Meta Introduces Llama 4 Scout and Maverick AI Models
  • 526 views
  • 2 min read

Meta has unveiled the Llama 4 family of AI models, introducing Llama 4 Scout and Llama 4 Maverick. These models represent a significant leap forward in open-weight generative AI, combining native multimodality with a Mixture of Experts (MoE) architecture for improved performance and efficiency.

Llama 4 Scout is a multimodal model with 17 billion active parameters and 16 experts. It is efficient enough to fit on a single NVIDIA H100 GPU and offers an industry-leading context window of 10 million tokens. This large context window lets Scout handle tasks such as multi-document summarization, parsing user activity for personalized recommendations, and reasoning over vast codebases. Meta claims that Llama 4 Scout outperforms models like Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a range of benchmarks. The model is pre-trained and post-trained with a 256K context length, giving it strong length-generalization capability. A key architectural innovation is the use of interleaved attention layers without positional embeddings, combined with inference-time temperature scaling of attention to further improve length generalization. Scout is also available on Workers AI.
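The idea behind inference-time temperature scaling is to sharpen attention for queries far beyond the training context, where softmax scores would otherwise flatten out. Meta has not published the exact schedule in this article, so the logarithmic schedule below, along with the `alpha` and `beta` parameters, is purely an illustrative assumption:

```python
import numpy as np

def scaled_attention(q, k, v, positions, alpha=8192.0, beta=0.1):
    """Sketch of inference-time attention temperature scaling.

    Each query is scaled by a factor that grows with its token position,
    keeping attention sharp at context lengths beyond the training length.
    The log schedule and constants here are illustrative assumptions, not
    Meta's published recipe.
    """
    # Scale is 1.0 for positions below alpha, then grows logarithmically.
    scale = 1.0 + beta * np.log(np.floor(positions / alpha) + 1.0)
    q = q * scale[:, None]                      # per-position query temperature
    logits = (q @ k.T) / np.sqrt(q.shape[-1])   # standard scaled dot-product
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)          # softmax over keys
    return w @ v

rng = np.random.default_rng(1)
n, d = 4, 8
pos = np.arange(n, dtype=float)
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out = scaled_attention(q, k, v, pos)
print(out.shape)  # (4, 8)
```

For positions below `alpha` the scale factor is exactly 1.0, so short-context behavior is unchanged; the temperature only kicks in for tokens past the threshold.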

Llama 4 Maverick, which also has 17 billion active parameters, uses a larger pool of 128 experts in its MoE architecture. The model is designed for a best-in-class performance-to-cost ratio and excels at image and text understanding across 12 languages. Meta positions Maverick as a workhorse for general assistant and chat applications, highlighting its precise image understanding and creative writing. Maverick beats GPT-4o and Gemini 2.0 Flash across several benchmarks and matches DeepSeek v3 on reasoning and coding while using fewer than half as many active parameters. An experimental chat version of Llama 4 Maverick achieves an Elo score of 1417 on LMArena.

Both Llama 4 Scout and Llama 4 Maverick are the first open-weight, natively multimodal models built using a Mixture of Experts (MoE) architecture. In MoE models, only a fraction of the total parameters are activated for each token, making them more compute-efficient for both training and inference. Llama 4 Maverick, for example, has 17 billion active parameters but 400 billion total parameters. The MoE layers use 128 routed experts and a shared expert.
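The routing described above can be sketched in a few lines: a router scores the experts for each token, only the top-scoring routed experts run, and a shared expert runs for every token. This is a minimal illustrative sketch (the dimensions, top-1 routing, and weight shapes are assumptions for clarity, not Llama 4's actual implementation):

```python
import numpy as np

def moe_layer(x, router_w, expert_ws, shared_w, top_k=1):
    """Minimal sketch of a routed MoE layer with one shared expert.

    x: (d,) token hidden state; router_w: (d, n_experts);
    expert_ws: (n_experts, d, d) routed expert weights; shared_w: (d, d).
    Only top_k routed experts execute per token, so per-token compute is
    a small fraction of the layer's total parameter count.
    """
    logits = x @ router_w
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over experts
    top = np.argsort(probs)[-top_k:]             # selected routed experts
    routed = sum(probs[i] * (x @ expert_ws[i]) for i in top)
    return routed + x @ shared_w                 # shared expert always runs

rng = np.random.default_rng(0)
d, n_experts = 8, 128                            # 128 routed experts, as in Maverick
x = rng.normal(size=d)
router = rng.normal(size=(d, n_experts))
experts = rng.normal(size=(n_experts, d, d))
shared = rng.normal(size=(d, d))
y = moe_layer(x, router, experts, shared)
print(y.shape)  # (8,)
```

With top-1 routing, only 1 of the 128 routed expert matrices is multiplied per token, which is why total parameters (400B for Maverick) can far exceed active parameters (17B).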

Meta has also previewed Llama 4 Behemoth, a larger model with 288 billion active parameters that is still in training. The company claims Behemoth outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM-focused benchmarks such as MATH-500 and GPQA Diamond. Behemoth is intended to serve as a teacher model for the other Llama 4 models. CEO Mark Zuckerberg has also said that a Llama 4 Reasoning model is coming in the next month.

The Llama 4 models are available on various platforms, including Meta AI (for WhatsApp, Messenger, and Instagram Direct), the Llama website, and Hugging Face. Amazon Web Services (AWS) has announced the availability of Llama 4 Scout and Llama 4 Maverick on Amazon SageMaker JumpStart, with availability as fully managed, serverless models in Amazon Bedrock coming soon. NVIDIA has optimized both Llama 4 Scout and Llama 4 Maverick for NVIDIA TensorRT-LLM and will package the Llama 4 models as NVIDIA NIM microservices for easy deployment on any GPU-accelerated infrastructure.

The models were trained on diverse datasets spanning text, images, and video, using techniques such as MetaP and FP8 precision to boost quality and training efficiency, and their pre-training data covers more than 200 languages.


Writer - Rohan Sharma
Rohan Sharma is a seasoned tech news writer with a keen knack for identifying and analyzing emerging technologies. He's highly sought-after in tech journalism due to his unique ability to distill complex technical information into concise and engaging narratives. Rohan consistently makes intricate topics accessible, providing readers with clear, insightful perspectives on the cutting edge of innovation.



© 2025 TechScoop360