Apple’s Third-Generation Foundation Models Reach for Google Cloud

Apple’s third-generation Foundation Models split across five systems, with the heaviest on Google Cloud and NVIDIA chips under Private Cloud Compute.

Published

June 12, 2026

Henry Fox

Apple’s third-generation Foundation Models landed at WWDC26 as a family of five models that split the work between on-device and server inference, and the most capable of the five runs on Google Cloud and NVIDIA hardware for the first time. The lineup underpins the refreshed Apple Intelligence stack, and it extends one piece of Apple’s 2024-era privacy architecture, Private Cloud Compute, to a new set of hardware.

Four of the five models still run on Apple silicon, on the device or in Apple’s own data centers. The fifth, AFM 3 Cloud Pro, is the heaviest one, and it is the only piece of Apple Intelligence that now lives on rented servers. Apple’s security team says the same five privacy rules from 2024 still apply, but they are now enforced on a different set of chips.

The Five Models and Where Each One Runs

Apple calls the new lineup a family of five foundation models “custom-built in collaboration with Google,” spanning on-device systems and Private Cloud Compute servers. The descriptions come from Apple’s own third-generation Foundation Models announcement, with the cloud Pro model living in a different part of the stack than the other four.

The lineup is built for elasticity. Two models live on the device so everyday work stays on the user’s hardware. Three live in Private Cloud Compute so the heaviest queries can run at server scale. Four of the five are tuned for Apple silicon, and only the top model is optimized for NVIDIA GPUs in Google Cloud.

Model	Job	Where it runs
AFM 3 Core	3B dense on-device language model, the everyday workhorse	On device (Apple silicon)
AFM 3 Core Advanced	20B sparse multimodal model, expressive voices and dictation	On device (top Apple silicon systems)
AFM 3 Cloud	Server workhorse for general text and reasoning	Private Cloud Compute (Apple silicon)
ADM 3 Cloud (Image)	Image generation, editing, Image Playground, Genmoji	Private Cloud Compute (Apple silicon)
AFM 3 Cloud Pro	Agentic tool use and complex reasoning	NVIDIA GPUs in Google Cloud, under PCC

Apple silicon handles everything that does not need the full weight of the largest model, and the most demanding jobs are the only ones that touch third-party hardware. The five PCC rules from 2024 still apply on Google Cloud, and Apple’s security team says the keys stay with Apple.
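
The split in the table can be sketched as a small routing function. This is only an illustration of the division of labor described above, not Apple’s dispatcher: the `Model` class, the `pick_model` function, and the task labels are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    on_device: bool       # True if the article places it on the user's device
    apple_silicon: bool   # False only for the model on NVIDIA GPUs in Google Cloud


# Rows transcribed from the table above.
LINEUP = {
    "core": Model("AFM 3 Core", True, True),
    "core_advanced": Model("AFM 3 Core Advanced", True, True),
    "cloud": Model("AFM 3 Cloud", False, True),
    "image": Model("ADM 3 Cloud (Image)", False, True),
    "cloud_pro": Model("AFM 3 Cloud Pro", False, False),
}


def pick_model(task: str, top_tier_device: bool) -> Model:
    """Hypothetical router; the task labels are illustrative, not Apple's."""
    if task == "voice":
        # Lighter devices fall back to AFM 3 Core for on-device work.
        return LINEUP["core_advanced" if top_tier_device else "core"]
    if task == "everyday_text":
        return LINEUP["core"]
    if task == "image":
        return LINEUP["image"]
    if task == "agentic":
        return LINEUP["cloud_pro"]
    # Default: the server workhorse for general text and reasoning.
    return LINEUP["cloud"]


if __name__ == "__main__":
    for task in ("everyday_text", "voice", "image", "agentic", "long_reasoning"):
        print(task, "->", pick_model(task, top_tier_device=False).name)
```

In this sketch only the `agentic` branch returns a model that is not on Apple silicon, which mirrors the claim that the most demanding jobs are the only ones that reach third-party hardware.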

Apple’s third-generation Foundation Models architecture and lineup

AFM 3 Cloud Pro Is the First Apple Model to Leave Apple’s Data Centers

When Apple introduced Private Cloud Compute in 2024, it ran exclusively on Apple silicon in Apple’s own data centers. The architecture was meant to deliver cloud AI under the same privacy guarantees users got on the device, with third-party researchers able to verify those guarantees. The third-generation lineup breaks that mold in one specific place.

AFM 3 Cloud Pro is the first piece of Apple Intelligence to run on hardware Apple does not own. Apple and Google worked together to extend PCC to NVIDIA GPUs in Google Cloud, and the move required new attestation plumbing, new transparency tooling, and new hardware roots of trust. The same five privacy rules from 2024 still apply, but they are now enforced on a different set of chips. The new set is NVIDIA Confidential Computing, Intel CPUs with TDX, and Google’s Titan security chip.

Apple’s security team is direct about who holds the keys. The company says Apple devices will only trust PCC software that is cryptographically approved by Apple, and that Apple retains complete control over the PCC software stack regardless of where the infrastructure is hosted. The keys stay with Apple, the binaries stay public, and the same depth of access through the Apple Security Bounty Program carries over to the new build.

Apple’s framing on its security research blog is that the move adds capacity, not exposure.

By the numbers

5 models in the AFM 3 family
4 of the 5 run on Apple silicon (on device or in Apple’s data centers)
1 model, AFM 3 Cloud Pro, runs on NVIDIA GPUs in Google Cloud
20 billion parameters in the on-device flagship, AFM 3 Core Advanced
1 to 4 billion parameters activated per request in AFM 3 Core Advanced

How a 20-Billion-Parameter Model Fits on a Phone

AFM 3 Core Advanced is Apple’s most powerful on-device model, and it is unusual for a model of its size. Most on-device large language models aimed at the general public sit in the low-single-digit billions of parameters. Apple pushed the on-device flagship to 20 billion parameters, a number that would normally swamp a phone’s memory.

The model gets there by storing its full weight in flash storage and pulling only a small subset of “experts” into fast memory per request. Apple calls the technique Instruction-Following Pruning and detailed it in a study published a year ago. The architecture is conceptually close to a Mixture of Experts design, but the routing happens once per prompt rather than per token, and a fixed set of always-active shared experts sits alongside the input-dependent ones.

The full 20B-parameter model lives in flash storage (NAND), not in active memory.
Per prompt, a small, fixed set of experts is selected and patched into DRAM with shared, always-active weights.
The selected experts are periodically reselected and updated during generation, not token by token.
Activation scales from 1 billion to 4 billion parameters depending on the task, giving the model elasticity without burning full DRAM on every request.
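
The steps above can be sketched in a few lines of Python. This is a toy model of per-prompt expert selection, not Apple’s implementation: the expert sizes, the expert count, and the hash-based scoring that stands in for a learned router are all assumptions, chosen only so that the totals match the 20 billion stored and 1 to 4 billion active parameters the article cites.

```python
import hashlib

# All sizes are hypothetical, chosen only so the totals match the article.
SHARED_PARAMS = 500_000_000   # always-active shared weights
EXPERT_PARAMS = 500_000_000   # one input-dependent expert
NUM_EXPERTS = 39              # 0.5B shared + 39 * 0.5B = 20B stored in flash
MAX_SELECTED = 7              # 0.5B shared + 7 * 0.5B = 4B active at most

TOTAL_PARAMS = SHARED_PARAMS + NUM_EXPERTS * EXPERT_PARAMS


def select_experts(prompt: str, k: int) -> list[int]:
    """Pick k experts once per prompt. A hash stands in for a learned router."""
    if not 1 <= k <= MAX_SELECTED:
        raise ValueError("k must be between 1 and MAX_SELECTED")

    def score(expert_id: int) -> bytes:
        return hashlib.sha256(f"{prompt}|{expert_id}".encode()).digest()

    return sorted(sorted(range(NUM_EXPERTS), key=score)[:k])


def active_params(k: int) -> int:
    """Parameters resident in fast memory: shared weights plus k experts."""
    return SHARED_PARAMS + k * EXPERT_PARAMS


class FastMemory:
    """Tracks which experts are resident, so only new ones are read from flash."""

    def __init__(self) -> None:
        self.resident: set[int] = set()

    def patch(self, experts: list[int]) -> set[int]:
        to_load = set(experts) - self.resident
        self.resident = set(experts)
        return to_load


if __name__ == "__main__":
    memory = FastMemory()
    chosen = select_experts("Read this message aloud", k=3)
    memory.patch(chosen)
    print(f"{active_params(len(chosen)) / 1e9:.1f}B of {TOTAL_PARAMS / 1e9:.0f}B active")
```

The point of the sketch is the ratio: the selection runs once per prompt, and the memory cost depends on how many experts are chosen, not on the 20 billion parameters sitting in flash.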

Apple says this is what makes the model usable on top Apple silicon systems. The activation stays small even when the total does not, so a phone can run heavy multimodal features without keeping the whole 20 billion parameters live. The same architecture is detailed in Apple’s third-generation Foundation Models announcement.

The Trust Stack Apple Built for the Rented Servers

The 2024 launch of Private Cloud Compute set five load-bearing rules for cloud AI, and the 2026 expansion keeps all five intact, though the implementation changes. On Google Cloud, the chain of trust now runs through three vendors, not one.

NVIDIA Confidential Computing seals a running job inside the GPU. Intel’s TDX does the same on the CPU, locking a slice of memory so the host operating system and operators stay shut out. Google’s Titan chip is the root of trust, the part that proves a server is the genuine, Apple-approved article before any work begins. Apple holds the signing keys across the entire stack, the company says, and treats every component from firmware through the host and guest operating systems as part of the verified trust base.
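
A minimal sketch of the idea behind that chain of trust: a client releases a request only if every layer of the server reports a measurement that matches an approved release. This is a simplification and not Apple’s protocol. The layer names, `measure`, and `node_is_trusted` are hypothetical, and a plain dictionary of hashes stands in for the signed, hardware-backed attestation records a real system would use.

```python
import hashlib
import hmac

LAYERS = ("firmware", "host_os", "guest_os", "gpu_runtime")  # hypothetical layer names


def measure(image: bytes) -> str:
    """A measurement here is just the SHA-256 of the layer's image."""
    return hashlib.sha256(image).hexdigest()


def node_is_trusted(reported: dict[str, str], approved: dict[str, str]) -> bool:
    """Fail closed: every layer must be present and match an approved measurement."""
    if set(reported) != set(LAYERS) or set(approved) != set(LAYERS):
        return False
    return all(hmac.compare_digest(reported[layer], approved[layer]) for layer in LAYERS)


# Stand-in for the published, approved release (real systems use signed records).
release = {layer: f"{layer}-build-1".encode() for layer in LAYERS}
approved = {layer: measure(image) for layer, image in release.items()}

good_node = dict(approved)
tampered_node = {**approved, "host_os": measure(b"host_os-with-debug-shell")}

print(node_is_trusted(good_node, approved))      # True
print(node_is_trusted(tampered_node, approved))  # False
```

The design choice worth noting is that a single mismatched or missing layer rejects the whole node, which matches the article’s description of every component being part of the verified trust base.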

The security team is also explicit about not relying on confidential computing alone. Apple says it does not lean solely on confidential computing technologies to mitigate attacks that exploit privileged access outside a confidential VM, including side-channel attacks, and treats every layer from firmware to application code as part of its trusted computing base, subject to verifiable transparency and no-privileged-access guarantees. The Private Cloud Compute rules, carried over from the Apple silicon build, are below.

Stateless computation: the server holds user data only long enough to answer, then wipes it
Enforceable guarantees: the promises live in hardware and software, not in policy
No privileged runtime access: Apple and Google staff stay locked out of a running job
Non-targetability: aiming an attack at one specific user’s request stays out of reach
Verifiable transparency: the code ships publicly so outside researchers can inspect it

Apple is also extending the auditing hooks. The company says it will publish all PCC binaries for public inspection. It will provide live research tooling. Security researchers will get access to live PCC nodes through the Apple Security Bounty Program, with the same depth of access it already offers for the Apple silicon build, and the technical details of the move are in Apple’s security write-up on expanding Private Cloud Compute.

What Apple Says the New Models Can Do

Apple ran side-by-side human evaluations of the new lineup against its 2025 predecessors, with in-house reviewers grading responses across instruction following, truthfulness, presentation, and image understanding. The numbers are Apple’s, not an outside benchmark, and Apple flags them as work-in-progress figures that will keep shifting through the beta.

For general text, AFM 3 Core was preferred over its 2025 baseline on 45.6 percent of prompts, against 23.3 percent for the prior model. The server-side AFM 3 Cloud was preferred on 64.7 percent of prompts versus 8.7 percent for the 2025 server model. Image understanding on AFM 3 Cloud came in at 37.8 percent preferred versus 9.6 percent for its predecessor.

Comparison	New model preferred	Prior model preferred
AFM 3 Core vs 2025, general text	45.6%	23.3%
AFM 3 Cloud vs 2025 server, general text	64.7%	8.7%
AFM 3 Cloud vs 2025, image understanding	37.8%	9.6%
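
The two columns in each row do not sum to 100 percent. The short calculation below works out the remainder and the new model’s share of the prompts where graders picked a side. Reading the remainder as ties or no preference is an assumption, since the article does not say how those prompts were scored, and `summarize` is a helper name introduced here.

```python
# (new model preferred %, prior model preferred %), copied from the table above.
RESULTS = {
    "AFM 3 Core vs 2025, general text": (45.6, 23.3),
    "AFM 3 Cloud vs 2025 server, general text": (64.7, 8.7),
    "AFM 3 Cloud vs 2025, image understanding": (37.8, 9.6),
}


def summarize(new_pct: float, prior_pct: float) -> tuple[float, float]:
    # Assumption: whatever is left over was graded as a tie or no preference.
    remainder = 100.0 - new_pct - prior_pct
    # Share of decisive prompts (one side preferred) that went to the new model.
    decisive_share = new_pct / (new_pct + prior_pct)
    return remainder, decisive_share


for label, (new_pct, prior_pct) in RESULTS.items():
    remainder, share = summarize(new_pct, prior_pct)
    print(f"{label}: {remainder:.1f}% neither, new model takes {share:.0%} of decisive prompts")
```

On these figures the new model takes roughly 66, 88, and 80 percent of the decisive prompts, while the image-understanding row leaves more than half of prompts with neither model preferred.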

On voices, AFM 3 Core Advanced scored 4.15 on a 5-point Mean Opinion Score (MOS) scale for overall quality and 4.24 on conversational text, compared with 3.82 for the current production text-to-speech system in conversational use. Apple notes that a 0.1 change on that scale is something customers tend to notice.

AFM 3 Cloud Pro, in turn, sits another step above AFM 3 Cloud, with a roughly 10 percent relative gain on text satisfaction, a 14 percent relative gain on image understanding, and a 14 percent edge on math.
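
To put the voice scores in context, the conversational gain can be expressed in units of the 0.1 step Apple describes as noticeable. The variable names are illustrative, and the figures are the ones quoted above.

```python
NOTICEABLE_STEP = 0.1             # per Apple, a change customers tend to notice

new_conversational = 4.24         # AFM 3 Core Advanced, conversational text
production_conversational = 3.82  # current production text-to-speech system

gain = new_conversational - production_conversational
steps = gain / NOTICEABLE_STEP
print(f"gain: {gain:.2f} points, about {steps:.1f} noticeable steps")
```

That is a gain of 0.42 points, or a little over four times the change Apple says customers tend to notice.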

What This Means for Siri and Everyday Use

The new lineup powers the refreshed Apple Intelligence. The on-device 20B model handles expressive voices and dictation, and ADM 3 Cloud (Image) drives the rebuilt Image Playground and a new Spatial Reframing feature in Photos. AFM 3 Cloud Pro is reserved for the heaviest work, including the agentic tool use and complex reasoning the new Siri is meant to handle.

Apple has also widened the Foundation Models framework on the developer side, with image input alongside text and a route to bring in custom model weights. The technical detail is in Apple’s WWDC26 session on what’s new in the Foundation Models framework.

The “Gemini inside Apple Intelligence” framing is also narrower than the rumors made it sound. The Gemini family was used to build and train the new models, not to power them at inference time. The recap of how Google fit into the new model training is blunt that the end result is pure Apple technology and code, with a user never touching Google code or Gemini agents when using Apple Intelligence.

The wider rollout context, including the dedicated Siri app and the Dynamic Island result panel, is covered in the earlier report on the Siri reboot and Gemini rollout. The EU regulatory dimension, where Brussels and Apple disagree on why the new Siri has not yet shipped in the European Union, is covered in the report on the EU’s view of the Siri AI delay.

Siri gets a real conversation history, awareness of on-screen content, and a dedicated app. Dictation improves on punctuation, casing, and capturing intent, and voices sound more natural, particularly on casual text read aloud. Photos gains Spatial Reframing, which can re-compose a shot after it has been taken.

The heavy lifting on the largest model happens on Google-hosted hardware, with the design wiping the data once the answer is sent. Apple says the PCC protections on Google Cloud will gradually ramp toward the complete set throughout the summer preview period.

Frequently Asked Questions

How is this different from the first-generation Apple Foundation Models from 2024?

The 2024 lineup was a 3-billion-parameter on-device model plus a larger server-based model, both running on Apple silicon. The third generation ships five models, with a 20-billion-parameter on-device flagship, three server-side models, and the heaviest one running on NVIDIA GPUs in Google Cloud under Private Cloud Compute. The sparse on-device architecture stores its full weight in flash and activates 1 to 4 billion parameters per request.

Which devices can run AFM 3 Core Advanced?

AFM 3 Core Advanced is unlocked by and optimized for Apple’s most capable Apple silicon systems, the company says. The sparse 20B model relies on the top of the Apple silicon lineup on the user’s device, so lighter devices fall back to AFM 3 Core for on-device work.

Does the Gemini partnership mean Apple Intelligence is now a Google product?

No. Apple built the third-generation Apple Foundation Models in collaboration with Google. The Gemini family was used for training and distillation, but the models themselves are pure Apple. Apple says the end result is Apple code, and a user never touches Google code or Gemini agents when using Apple Intelligence.

Is my data actually safe on Google’s servers under PCC?

Apple’s security team says yes. The five Private Cloud Compute rules from 2024 carry over to the Google Cloud build: stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency. Apple also says it will publish the binaries and give researchers live access to PCC nodes through the Apple Security Bounty Program.

When do users get to try AFM 3 Cloud Pro?

AFM 3 Cloud Pro is part of the third-generation Apple Foundation Models announced at WWDC26 on June 8, 2026. Apple says the PCC protections on Google Cloud will gradually ramp toward the complete set throughout the summer preview period. Broader technical detail is promised at the Confidential Computing Summit later this month.