There’s a question that comes up whenever someone proposes an AI feature in a product meeting: “What happens when the user has no connection?” For most AI tools, the honest answer is: nothing works. Google AI Edge Gallery is betting that answer is about to change.
What it is
Google AI Edge Gallery is an open source app — available on Android and iOS — that lets you download and run language models directly on your device. No API calls. No cloud backend. No data leaving your phone. Once you’ve downloaded the model, you can turn on airplane mode and inference keeps working exactly the same.
The app launched at Google I/O 2025 alongside the Gemma 3n preview. In its first two months it hit 500,000 APK downloads — a number that says a lot about the community's appetite for on-device AI.
The Gemma 4 update
The headline addition in the latest release is full support for the Gemma 4 model family, which promises advanced reasoning, logic, and creative capabilities without sending any data to a server. The family comes in four sizes (a rough device-matching sketch follows the list):
- Gemma 4 1B — optimized for mobile phones, fast inference, lower accuracy
- Gemma 4 4B — the sweet spot for most modern Android devices with 8GB+ of RAM
- Gemma 4 12B — more powerful reasoning, requires a capable device
- Gemma 4 27B — near-frontier quality, requires laptop-class hardware
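To make the RAM guidance above concrete, here is a tiny Kotlin sketch of the kind of heuristic an app might use to pick a variant for a given device. Everything in it is my own illustration: the thresholds are ballpark assumptions, not official requirements.

```kotlin
// Illustrative only: a rough heuristic for matching a Gemma 4 variant to
// available device RAM. Thresholds are ballpark assumptions, not official guidance.
enum class GemmaVariant { GEMMA_1B, GEMMA_4B, GEMMA_12B, GEMMA_27B }

fun pickVariant(deviceRamGb: Int): GemmaVariant = when {
    deviceRamGb >= 32 -> GemmaVariant.GEMMA_27B // laptop-class hardware
    deviceRamGb >= 16 -> GemmaVariant.GEMMA_12B // flagship phones and tablets
    deviceRamGb >= 8  -> GemmaVariant.GEMMA_4B  // most modern Android devices
    else              -> GemmaVariant.GEMMA_1B  // budget phones, fastest inference
}

fun main() {
    println(pickVariant(8)) // GEMMA_4B
}
```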
The update also brings Thinking Mode — a feature that lets you watch the model's reasoning process in real time, currently exclusive to the Gemma 4 family.
Agent Skills: the interesting part
This is where things get genuinely new. Agent Skills is one of the first implementations of autonomous multi-step agentic workflows that run entirely on-device. Powered by Gemma 4, it lets you extend the base LLM with modular tools: Wikipedia for fact-grounding, interactive maps, visual summary cards, and more. You can load skills from the community via a URL, or build your own.
Think of it like function calling — but running completely on your hardware, with no internet dependency once the model is downloaded.
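Nothing below comes from the actual Agent Skills API, which the post doesn't document. It's a hypothetical Kotlin sketch of what a local function-calling loop looks like in principle: the model either answers or names a tool, the tool runs locally, and its output is fed back into the context. All types and names are invented for illustration.

```kotlin
// Hypothetical sketch of an on-device tool-dispatch loop. None of these
// types come from the real Agent Skills API.
interface Skill {
    val name: String
    fun invoke(args: String): String
}

class WikipediaSkill : Skill {
    override val name = "wikipedia_lookup"
    override fun invoke(args: String): String =
        "Stub: would query an offline Wikipedia index for '$args'"
}

// Pretend model step: the LLM either calls a tool or gives a final answer.
sealed class ModelStep {
    data class ToolCall(val skill: String, val args: String) : ModelStep()
    data class FinalAnswer(val text: String) : ModelStep()
}

fun agentLoop(
    prompt: String,
    step: (history: List<String>) -> ModelStep, // stands in for the on-device LLM
    skills: Map<String, Skill>,
    maxSteps: Int = 5,
): String {
    val history = mutableListOf(prompt)
    repeat(maxSteps) {
        when (val next = step(history)) {
            is ModelStep.FinalAnswer -> return next.text
            is ModelStep.ToolCall -> {
                val result = skills[next.skill]?.invoke(next.args)
                    ?: "Unknown skill: ${next.skill}"
                history += "[${next.skill}] $result" // feed tool output back in
            }
        }
    }
    return "Gave up after $maxSteps steps"
}

fun main() {
    var called = false
    val fakeModel = { history: List<String> ->
        if (!called) { called = true; ModelStep.ToolCall("wikipedia_lookup", "edge AI") }
        else ModelStep.FinalAnswer("Answer grounded in: ${history.last()}")
    }
    println(agentLoop("What is edge AI?", fakeModel, mapOf("wikipedia_lookup" to WikipediaSkill())))
}
```

The key point the sketch makes: once the model and the tools both live on the device, the whole loop closes without a network round trip.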
Model management and hardware reality
The app integrates with Hugging Face, letting you browse hundreds of models and download them directly. You can also import your own models if you already have them on the device.
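The post doesn't spell out how imported models are executed under the hood, but on Android the usual entry point for running a downloaded bundle like this is the MediaPipe LLM Inference API. A minimal Kotlin sketch, with a placeholder model path:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch: run a prompt against a locally stored model bundle via
// the MediaPipe LLM Inference API. The path is a placeholder for wherever
// the bundle actually lives on your device.
fun runLocalPrompt(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/gemma.task") // placeholder path
        .setMaxTokens(512)
        .build()

    val llm = LlmInference.createFromOptions(context, options)
    val response = llm.generateResponse(prompt) // blocking, fully on-device
    llm.close()
    return response
}
```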
Hardware matters. Devices with a dedicated NPU — like Qualcomm Snapdragon 8 Gen 2 or newer, or Google Tensor chips — run inference noticeably faster and with better battery efficiency than those relying on the CPU alone. As for the runtime, LiteRT-LM can run Gemma 4 E2B in less than 1.5 GB of memory on some devices, thanks to support for 2- and 4-bit weight quantization.
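That sub-1.5 GB figure holds up to back-of-the-envelope arithmetic, if (my assumption) E2B denotes roughly 2 billion effective parameters: at 4 bits per weight that's about 1 GB of weights, leaving headroom for the KV cache and runtime overhead.

```kotlin
// Back-of-the-envelope weight footprint under quantization.
// Assumes "E2B" ≈ 2 billion effective parameters (my assumption).
fun weightFootprintGb(params: Double, bitsPerWeight: Int): Double =
    params * bitsPerWeight / 8 / 1e9 // bits -> bytes -> GB (decimal)

fun main() {
    println("4-bit: %.2f GB".format(weightFootprintGb(2e9, 4))) // ~1.00 GB
    println("2-bit: %.2f GB".format(weightFootprintGb(2e9, 2))) // ~0.50 GB
}
```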
For devs who want to go beyond the app, the litert-lm CLI is available on Linux, macOS, and Raspberry Pi, and lets you experiment with Gemma 4 without writing any code.
Is edge AI ready for production?
The honest answer is: it depends on the use case and target device. For consumer apps targeting mid-to-high-end Android with Gemma 4 4B, you’re in genuinely usable territory. For anything requiring the full reasoning capability of a frontier model, you still need the cloud.
But the threshold is moving. The fact that Agent Skills — autonomous multi-step workflows — run completely on-device is a real milestone. And for use cases where privacy is non-negotiable or connectivity is unreliable, the value proposition is already there.
How to get started
- Android / iOS: Search for “Google AI Edge Gallery” in your app store
- GitHub: github.com/google-ai-edge/gallery (showcases on-device ML/GenAI use cases and lets you try models locally)
- LiteRT-LM CLI: to experiment from the terminal on Linux, macOS, or Raspberry Pi
