Apple’s Third-Generation Foundation Models Reach for Google Cloud

Apple’s third-generation Foundation Models split across five systems, with the heaviest on Google Cloud and NVIDIA chips under Private Cloud Compute.

Apple’s third-generation Foundation Models landed at WWDC26 as a family of five models that split the work between on-device and server inference. The most capable of the five runs on Google Cloud and NVIDIA hardware, a first for Apple Intelligence. The new lineup is what the refreshed Apple Intelligence stack runs on, and for that one model it carries Apple’s 2024-era privacy architecture onto a new set of hardware.

Four of the five models still run on Apple silicon, on the device or in Apple’s own data centers. The fifth, AFM 3 Cloud Pro, is the heaviest one, and it is the only piece of Apple Intelligence that now lives on rented servers. Apple’s security team says the same five privacy rules from 2024 still apply, but they are now enforced on a different set of chips.

The Five Models and Where Each One Runs

Apple calls the new lineup a family of five foundation models “custom-built in collaboration with Google,” spanning on-device systems and Private Cloud Compute (PCC) servers. The descriptions come from Apple’s own third-generation Foundation Models announcement, which places the Cloud Pro model in a different part of the stack than the other four.

The lineup is built for elasticity. Two models live on the device so everyday work stays on the user’s hardware. Three live in Private Cloud Compute so the heaviest queries can run at server scale. Four of the five are tuned for Apple silicon, and only the top model is optimized for NVIDIA GPUs in Google Cloud.

| Model | Job | Where it runs |
| --- | --- | --- |
| AFM 3 Core | 3B dense on-device language model, the everyday workhorse | On device (Apple silicon) |
| AFM 3 Core Advanced | 20B sparse multimodal model, expressive voices and dictation | On device (top Apple silicon systems) |
| AFM 3 Cloud | Server workhorse for general text and reasoning | Private Cloud Compute (Apple silicon) |
| ADM 3 Cloud (Image) | Image generation, editing, Image Playground, Genmoji | Private Cloud Compute (Apple silicon) |
| AFM 3 Cloud Pro | Agentic tool use and complex reasoning | NVIDIA GPUs in Google Cloud, under PCC |
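
As a quick consistency check, the placement in the table above can be written down as data and the counts quoted in this article derived from it. This is only an illustrative sketch: the `Model` record, its field names, and the hardware labels are invented for the example and are not part of any Apple API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    on_device: bool   # True if it runs on the user's hardware, False if in PCC
    hardware: str     # "apple_silicon" or "nvidia_gpu" (labels invented here)


# Lineup as described in the article; the encoding is this sketch's own.
LINEUP = [
    Model("AFM 3 Core", True, "apple_silicon"),
    Model("AFM 3 Core Advanced", True, "apple_silicon"),
    Model("AFM 3 Cloud", False, "apple_silicon"),
    Model("ADM 3 Cloud (Image)", False, "apple_silicon"),
    Model("AFM 3 Cloud Pro", False, "nvidia_gpu"),
]

on_device = [m.name for m in LINEUP if m.on_device]
in_pcc = [m.name for m in LINEUP if not m.on_device]
on_apple_silicon = [m.name for m in LINEUP if m.hardware == "apple_silicon"]
third_party = [m.name for m in LINEUP if m.hardware != "apple_silicon"]

print(len(on_device), "on device,", len(in_pcc), "in PCC")
print(len(on_apple_silicon), "on Apple silicon; third-party hardware:", third_party)
```

The derived counts match the article's own summary: two models on the device, three in Private Cloud Compute, four on Apple silicon, and only AFM 3 Cloud Pro on third-party hardware.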

Apple silicon handles everything that does not need the full weight of the largest model, and the most demanding jobs are the only ones that touch third-party hardware. The five PCC rules from 2024 still apply on Google Cloud, and Apple’s security team says the keys stay with Apple.

AFM 3 Cloud Pro Is the First Apple Model to Leave Apple’s Data Centers

When Apple introduced Private Cloud Compute in 2024, it ran exclusively on Apple silicon in Apple’s own data centers. The architecture was meant to deliver cloud AI under the same privacy guarantees users got on the device, with third-party researchers able to verify those guarantees. The third-generation lineup breaks that mold in one specific place.

AFM 3 Cloud Pro is the first piece of Apple Intelligence to run on hardware Apple does not own. Apple and Google worked together to extend PCC to NVIDIA GPUs in Google Cloud, and the move required new attestation plumbing, new transparency tooling, and new hardware roots of trust. The same five privacy rules from 2024 still apply, but they are now enforced on a different set of chips: NVIDIA GPUs running Confidential Computing, Intel CPUs with TDX, and Google’s Titan security chip.

Apple’s security team is direct about who holds the keys. The company says Apple devices will only trust PCC software that is cryptographically approved by Apple, and that Apple retains complete control over the PCC software stack regardless of where the infrastructure is hosted. The keys stay with Apple, the binaries stay public, and the same depth of access through the Apple Security Bounty Program carries over to the new build.

Apple’s framing on its security research blog is that the move adds capacity, not exposure.

By the numbers

  • 5 models in the AFM 3 family
  • 4 of the 5 run on Apple silicon (on device or in Apple’s data centers)
  • 1 model, AFM 3 Cloud Pro, runs on NVIDIA GPUs in Google Cloud
  • 20 billion parameters in the on-device flagship, AFM 3 Core Advanced
  • 1 to 4 billion parameters activated per request in AFM 3 Core Advanced

How a 20-Billion-Parameter Model Fits on a Phone

AFM 3 Core Advanced is Apple’s most powerful on-device model, and it is unusual for a model of its size. Most on-device large language models aimed at the general public sit in the low-single-digit billions of parameters. Apple pushed the on-device flagship to 20 billion parameters, a number that would normally swamp a phone’s memory.

The model gets there by storing its full weights in flash storage and pulling only a small subset of “experts” into fast memory per request, using a technique Apple calls Instruction-Following Pruning, which was detailed in a study published a year ago. The architecture is conceptually close to a Mixture of Experts design, but the routing happens once per prompt rather than per token, and a fixed set of always-active shared experts sits alongside the input-dependent ones.

  1. The full 20B-parameter model lives in flash storage (NAND), not in active memory.
  2. Per prompt, a small, fixed set of experts is selected and patched into DRAM with shared, always-active weights.
  3. The selected experts are periodically reselected and updated during generation, not token by token.
  4. Activation scales from 1 billion to 4 billion parameters depending on the task, giving the model elasticity without burning full DRAM on every request.
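
The steps above can be sketched in a few lines of Python. This is a toy illustration of per-prompt expert selection, not Apple's implementation: the expert count, the per-expert sizes, the router scores, and the function names are all hypothetical, chosen only so the totals land on the 20 billion and 1-to-4 billion figures quoted in this article.

```python
import heapq


def select_experts(router_scores, k):
    """Pick the k highest-scoring routed experts, once for the whole prompt."""
    return sorted(heapq.nlargest(k, router_scores, key=router_scores.get))


def active_parameters(shared_params, expert_params, selected):
    """Parameters held in fast memory: shared weights plus the selected experts."""
    return shared_params + sum(expert_params[name] for name in selected)


# Hypothetical sizes: 0.5B always-active shared weights plus 39 routed
# experts of 0.5B each, all of which would sit in flash storage.
SHARED_PARAMS = 500_000_000
EXPERT_PARAMS = {f"expert_{i}": 500_000_000 for i in range(39)}
total = SHARED_PARAMS + sum(EXPERT_PARAMS.values())

# Made-up router scores for one prompt (a real router would compute these
# from the prompt itself).
scores = {name: (i * 7) % 39 for i, name in enumerate(EXPERT_PARAMS)}

light = active_parameters(SHARED_PARAMS, EXPERT_PARAMS, select_experts(scores, 1))
heavy = active_parameters(SHARED_PARAMS, EXPERT_PARAMS, select_experts(scores, 7))

print(f"total: {total / 1e9:.0f}B")          # total: 20B
print(f"light task: {light / 1e9:.0f}B active")  # light task: 1B active
print(f"heavy task: {heavy / 1e9:.0f}B active")  # heavy task: 4B active
```

The point of the sketch is the memory arithmetic: the selection is made per prompt rather than per token, so only the shared weights and the chosen experts need to be resident in DRAM while the rest of the model stays in flash.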

Apple says this is what makes the model usable on top Apple silicon systems. The activation stays small even when the total does not, so a phone can run heavy multimodal features without keeping the whole 20 billion parameters live. The same architecture is detailed in Apple’s third-generation Foundation Models announcement.

The Trust Stack Apple Built for the Rented Servers

The 2024 launch of Private Cloud Compute set five load-bearing rules for cloud AI, and the 2026 expansion keeps all five intact, though the implementation changes. On Google Cloud, the chain of trust now runs through three vendors, not one.

NVIDIA Confidential Computing seals a running job inside the GPU. Intel’s TDX does the same on the CPU, locking a slice of memory so the host operating system and operators stay shut out. Google’s Titan chip is the root of trust, the part that proves a server is the genuine, Apple-approved article before any work begins. Apple holds the signing keys across the entire stack, the company says, and treats every component from firmware through the host and guest operating systems as part of the verified trust base.

The security team is also explicit about not relying on confidential computing alone. Apple says it does not lean solely on confidential computing technologies to mitigate attacks that exploit privileged access outside a confidential VM, including side-channel attacks, and treats every layer from firmware to application code as part of its trusted computing base, subject to verifiable transparency and no-privileged-access guarantees. The Private Cloud Compute rules, carried over from the Apple silicon build, are below.

  • Stateless computation: the server holds user data only long enough to answer, then wipes it
  • Enforceable guarantees: the promises live in hardware and software, not in policy
  • No privileged runtime access: Apple and Google staff stay locked out of a running job
  • Non-targetability: aiming an attack at one specific user’s request stays out of reach
  • Verifiable transparency: the code ships publicly so outside researchers can inspect it

Apple is also extending the auditing hooks. The company says it will publish all PCC binaries for public inspection and provide live research tooling. Security researchers will get access to live PCC nodes through the Apple Security Bounty Program, with the same depth of access Apple already offers for the Apple silicon build. The technical details of the move are in Apple’s security write-up on expanding Private Cloud Compute.
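
To make the idea of verifiable transparency concrete, here is a minimal sketch of the kind of check a client could run before releasing a request: compare the software measurement a node reports against a public list of published builds. Everything in it is an assumption made for illustration. The function names, the build contents, and the use of a bare SHA-256 digest as the "measurement" are hypothetical, and the sketch leaves out the hardware-signed attestation and Apple-signed releases that the real system depends on.

```python
import hashlib


def measurement(binary: bytes) -> str:
    """SHA-256 digest standing in for a software measurement (an assumption)."""
    return hashlib.sha256(binary).hexdigest()


def should_send_request(attested_measurement: str, published_log: set) -> bool:
    """Release user data only if the node reports a publicly listed build."""
    return attested_measurement in published_log


# Hypothetical published builds; real entries would come from a public log.
release_a = b"pcc-build-a"
release_b = b"pcc-build-b"
published_log = {measurement(release_a), measurement(release_b)}

trusted = should_send_request(measurement(release_a), published_log)
untrusted = should_send_request(measurement(b"modified-build"), published_log)

print(trusted)    # True
print(untrusted)  # False
```

The design choice worth noting is that the decision is made on the client side: a node running software that was never published simply never receives the request.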

What Apple Says the New Models Can Do

Apple ran side-by-side human evaluations of the new lineup against its 2025 predecessors, with in-house reviewers grading responses across instruction following, truthfulness, presentation, and image understanding. The numbers are Apple’s, not an outside benchmark, and Apple flags them as work-in-progress figures that will keep shifting through the beta.

For general text, AFM 3 Core was preferred over its 2025 baseline on 45.6 percent of prompts, against 23.3 percent for the prior model. The server-side AFM 3 Cloud was preferred on 64.7 percent of prompts versus 8.7 percent for the 2025 server model. Image understanding on AFM 3 Cloud came in at 37.8 percent preferred versus 9.6 percent for its predecessor.

| Comparison | New model preferred | Prior model preferred |
| --- | --- | --- |
| AFM 3 Core vs 2025, general text | 45.6% | 23.3% |
| AFM 3 Cloud vs 2025 server, general text | 64.7% | 8.7% |
| AFM 3 Cloud vs 2025, image understanding | 37.8% | 9.6% |
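
The two percentages in each row do not add up to 100, so a short calculation helps read them. The sketch below uses only the figures reported above; the interpretation of the remainder as prompts where graders preferred neither model is an assumption, since the article does not say how those prompts were scored, and the `summarize` helper is made up for this example.

```python
# (new model preferred %, prior model preferred %) as reported above
RESULTS = {
    "AFM 3 Core, general text": (45.6, 23.3),
    "AFM 3 Cloud, general text": (64.7, 8.7),
    "AFM 3 Cloud, image understanding": (37.8, 9.6),
}


def summarize(new_pct, prior_pct):
    """Return (share where neither was preferred, new-to-prior preference ratio)."""
    remainder = round(100.0 - new_pct - prior_pct, 1)  # assumed: no preference
    ratio = round(new_pct / prior_pct, 1)
    return remainder, ratio


summary = {name: summarize(new, prior) for name, (new, prior) in RESULTS.items()}

for name, (remainder, ratio) in summary.items():
    print(f"{name}: {remainder}% neither, new preferred {ratio}x as often")
```

Read this way, the image-understanding row is the one where graders most often expressed no preference, even though the new model still wins clearly among the prompts where they did.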

On voices, AFM 3 Core Advanced scored 4.15 on a 5-point Mean Opinion Score (MOS) scale for overall quality, and 4.24 on conversational text, compared to 3.82 for the current production text-to-speech system in conversational use. Apple notes that a 0.1 change on that scale is something customers tend to notice, so the 0.42-point gap on conversational text is well past that threshold.

AFM 3 Cloud Pro then sits another step above AFM 3 Cloud, with a roughly 10 percent relative gain on text satisfaction, a 14 percent relative gain on image understanding, and a 14 percent edge on math.

What This Means for Siri and Everyday Use

The new lineup powers the refreshed Apple Intelligence. The on-device 20B model handles expressive voices and dictation, and ADM 3 Cloud (Image) drives the rebuilt Image Playground and a new Spatial Reframing feature in Photos. AFM 3 Cloud Pro is reserved for the heaviest work, including the agentic tool use and complex reasoning the new Siri is meant to handle.

Apple has also widened the Foundation Models framework on the developer side, with image input alongside text and a route to bring in custom model weights. The technical detail is in Apple’s WWDC26 session on what’s new in the Foundation Models framework.

The “Gemini inside Apple Intelligence” framing is also narrower than the rumors made it sound. The Gemini family was used to build and train the new models, not to power them at inference time. The recap of how Google fit into the new model training is blunt that the end result is pure Apple technology and code, and that a user never touches Google code or Gemini agents when using Apple Intelligence. The wider rollout context, including the dedicated Siri app and the Dynamic Island result panel, is covered in the earlier report on the Siri reboot and Gemini rollout. The EU regulatory dimension, where Brussels and Apple disagree on why the new Siri has not yet shipped in the European Union, is covered in the report on the EU’s view of the Siri AI delay.

Siri gets a real conversation history, awareness of on-screen content, and a dedicated app. Dictation improves on punctuation, casing, and capturing intent, and voices sound more natural, particularly on casual text read aloud. Photos gains Spatial Reframing, which can re-compose a shot after it has been taken. The heavy lifting on the largest model happens on Google-hosted hardware, with the design wiping the data once the answer is sent, and Apple says the PCC protections on Google Cloud will ramp gradually toward the complete set throughout the summer preview period.

Frequently Asked Questions

How is this different from the first-generation Apple Foundation Models from 2024?

The 2024 lineup was a 3-billion-parameter on-device model plus a larger server-based model, both running on Apple silicon. The third generation ships five models, with a 20-billion-parameter on-device flagship, three server-side models, and the heaviest one running on NVIDIA GPUs in Google Cloud under Private Cloud Compute. The sparse on-device architecture stores its full weights in flash and activates 1 to 4 billion parameters per request.

Which devices can run AFM 3 Core Advanced?

AFM 3 Core Advanced is unlocked by and optimized for Apple’s most capable Apple silicon systems, the company says. The sparse 20B model relies on the top of the Apple silicon lineup on the user’s device, so lighter devices fall back to AFM 3 Core for on-device work.

Does the Gemini partnership mean Apple Intelligence is now a Google product?

No. Apple built the third-generation Apple Foundation Models in collaboration with Google. The Gemini family was used for training and distillation, but the models themselves are pure Apple. Apple says the end result is Apple code, and a user never touches Google code or Gemini agents when using Apple Intelligence.

Is my data actually safe on Google’s servers under PCC?

Apple’s security team says yes. The five Private Cloud Compute rules from 2024 carry over to the Google Cloud build: stateless computation, enforceable guarantees, no privileged runtime access, non-targetability, and verifiable transparency. Apple also publishes the binaries and gives researchers live access to PCC nodes through the Apple Security Bounty Program.

When do users get to try AFM 3 Cloud Pro?

AFM 3 Cloud Pro is part of the third-generation Apple Foundation Models announced at WWDC26 on June 8, 2026. Apple says the PCC protections on Google Cloud will ramp gradually toward the complete set throughout the summer preview period. Broader technical detail is promised at the Confidential Computing Summit later this month.
