Llama 4 Scout 109B MoE

Meta's next-generation mixture-of-experts model — frontier-class capability at MoE efficiency, requiring 64GB+ RAM.

109B MoE parameters

64GB minimum RAM

Overview

What makes Llama 4 Scout 109B MoE notable

Llama 4 Scout is one of Meta's first mixture-of-experts (MoE) models in the Llama family. Although it has 109B total parameters spread across 16 experts, it activates only about 17B of them per token. That makes each inference step far cheaper in compute than it would be for a dense model of the same size.
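To illustrate what "activates only a fraction per token" means, here is a minimal sketch of generic top-k expert routing. It is not Llama 4's actual implementation: it is simplified (no shared expert, no batching), and the toy experts, router scores, and `k` value are hypothetical, chosen only to show that a router scores every expert but runs just the selected few.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical "experts": in a real MoE layer each is a feed-forward network;
# here each is a scalar function so the example stays self-contained.
Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, gate_scores: Sequence[float],
                experts: Sequence[Expert], k: int) -> Tuple[float, List[int]]:
    """Route x to the k highest-scoring experts and mix their outputs.

    Returns (output, indices of the experts that actually ran).
    """
    # Indices of the k largest router scores, best first.
    top = sorted(range(len(gate_scores)),
                 key=lambda i: gate_scores[i], reverse=True)[:k]
    # Renormalise gate weights over the selected experts only.
    weights = softmax([gate_scores[i] for i in top])
    # Only the selected experts are evaluated; the others cost no compute.
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out, top


# 16 toy experts (matching Scout's expert count); expert i multiplies by (i + 1).
experts = [lambda x, i=i: x * (i + 1) for i in range(16)]
scores = [0.1 * i for i in range(16)]  # hypothetical router scores

out, used = moe_forward(2.0, scores, experts, k=1)
print(used, out)  # prints: [15] 32.0
```

With `k=1`, fifteen of the sixteen experts are skipped entirely for this input, which is the source of the per-token compute savings. Note that all sixteen still have to exist in memory, because a different input may be routed to any of them.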

Thanks to the MoE architecture, it runs faster than its parameter count would suggest while still delivering frontier-class quality. On Apple Silicon with 64GB+ of unified memory, it handles GPT-4 Turbo-class reasoning, nuanced conversation, and complex creative tasks.

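The trade-off is hardware. Only a fraction of the weights is used for each token, but all 109B parameters must stay resident in memory, so Llama 4 Scout needs at least 64GB of RAM. In practice, that means a Mac Studio with M4 Max (128GB) or M3 Ultra. It's the right choice for power users who've invested in top-tier hardware and want the best local model available.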

Best use cases

What it excels at

✓ Frontier-class reasoning for complex, multi-step analysis
✓ Extended-context tasks requiring deep comprehension
✓ Sophisticated creative writing and ideation
✓ High-stakes professional document review
✓ Research synthesis across long, complex source material
✓ Complex coding tasks requiring architectural understanding

Compatibility

Hardware requirements

Mac model	RAM	Performance	Notes
Mac Studio M4 Max	128GB	Good	Q4/Q5 quantization; minimum recommended configuration
Mac Studio M3 Ultra	192GB+	Optimal	Q8 quantization (close to full-precision quality); room to run multiple models simultaneously
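Why the quantization level drives the RAM tier: because every expert must be resident, the weight footprint scales with the total 109B parameters, not the active subset. A rough sketch of the arithmetic, assuming nominal bit widths (real quantization formats carry some extra overhead per weight) and ignoring the KV cache, runtime buffers, and the operating system:

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bits per byte.
# Assumptions (not measurements): nominal bit widths, decimal gigabytes, and
# no allowance for the KV cache, runtime buffers, or the operating system.

TOTAL_PARAMS = 109e9  # Llama 4 Scout total parameters (all experts)

NOMINAL_BITS = {
    "Q4": 4,
    "Q5": 5,
    "Q8": 8,
    "BF16": 16,
}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return the size of the model weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


for name, bits in NOMINAL_BITS.items():
    print(f"{name:>4}: ~{weight_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights alone come to roughly 55GB at Q4, 68GB at Q5, 109GB at Q8, and 218GB at BF16, before any working memory is added. That is why Q4/Q5 pairs with a 128GB machine and Q8 with the larger M3 Ultra configurations.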

Speed

Approximate tokens/second

Mac Studio M4 Max (128GB): ~25 tok/s

Mac Studio M3 Ultra (192GB+): ~60 tok/s

Use case fit

Quality ratings

Chat: ★★★★★

Coding: ★★★★★

Reasoning: ★★★★★

Creative Writing: ★★★★★

Document Analysis: ★★★★★

Cost comparison

Without local AI, the equivalent capability costs:

Cloud equivalent

GPT-4 Turbo / Claude 3.5 Sonnet

~$200–330 per month

Local with Maai Machines

Llama 4 Scout 109B MoE

$0 per month in API fees

~$10/month electricity. One-time setup.
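The electricity figure depends on how hard the machine runs and on local power prices. A sketch of the arithmetic, where the average power draw, hours of use, and price per kWh are hypothetical placeholders to replace with your own numbers:

```python
def monthly_electricity_cost(avg_watts: float, hours_per_day: float,
                             price_per_kwh: float, days: int = 30) -> float:
    """Energy (kWh) = watts x hours / 1000; cost = energy x price per kWh."""
    kwh = avg_watts * hours_per_day * days / 1000
    return kwh * price_per_kwh


# Hypothetical inputs: 100 W average draw, running 24 h/day, $0.15 per kWh.
cost = monthly_electricity_cost(avg_watts=100, hours_per_day=24, price_per_kwh=0.15)
print(f"~${cost:.2f}/month")
```

With these placeholder inputs the machine uses 72 kWh in a 30-day month, or about $10.80. A lighter duty cycle or cheaper power lowers the figure; heavier sustained use or higher rates raise it.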

Run Llama 4 Scout 109B MoE on your own hardware.

Book a consultation. We'll configure this model — and the rest of your stack — in one day.
