Microsoft has introduced Fara-7B, a new, efficient language model built to operate computer applications directly. With only 7 billion parameters, this compact model performs tasks by visually perceiving a webpage and taking actions such as scrolling, typing, and clicking at predicted coordinates, much like a human user. Unlike traditional chat models that generate text-based responses, Fara-7B is a Computer Use Agent (CUA) that uses computer interfaces to complete tasks on behalf of users.
Key Features and Capabilities
Fara-7B distinguishes itself through its ability to run directly on devices, reducing latency and improving user privacy, as data remains local. This is in contrast to many existing systems that rely on large, multimodal models requiring server-side deployment. Fara-7B achieves state-of-the-art performance within its size class and is competitive with larger, more resource-intensive agentic systems. It operates without needing separate models to parse the screen or additional information like accessibility trees, using the same modalities as humans to interact with computers.
This model takes three inputs: a user goal in text, the current screenshot, and a history of actions and thoughts. Its output includes a "thinking" block and a "tool call" block which dictates the next action. The tool call specifies actions like clicking, typing, scrolling, visiting a URL, web searching, or going back in history.
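The input and output contract described above can be sketched as a small data structure plus a parser. This is only an illustration under stated assumptions: the `<tool_call>` tag wrapping, the JSON field names, and the action names are hypothetical choices for this example, not Fara-7B's documented output format.

```python
import json
import re
from dataclasses import dataclass, field

# Hypothetical action names mirroring the actions listed in the text.
ALLOWED_ACTIONS = {"click", "type", "scroll", "visit_url", "web_search", "history_back"}


@dataclass
class AgentStep:
    """One model call, holding the three inputs the article describes."""
    goal: str                                     # user goal in text
    screenshot_png: bytes                         # current screenshot
    history: list = field(default_factory=list)   # prior thoughts and actions


def parse_model_output(text: str) -> tuple[str, dict]:
    """Split a response into its thinking text and its tool-call dict.

    Assumption: the tool call is JSON wrapped in <tool_call>...</tool_call>,
    and everything before it is the thinking block.
    """
    match = re.search(r"<tool_call>(.*?)</tool_call>", text, re.DOTALL)
    if match is None:
        raise ValueError("no tool call found in model output")
    thinking = text[: match.start()].strip()
    call = json.loads(match.group(1))
    if call.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {call.get('action')!r}")
    return thinking, call


# A made-up response, used only to exercise the parser.
sample = (
    "The search box is at the top of the page, so I will click it.\n"
    '<tool_call>{"action": "click", "coordinate": [412, 88]}</tool_call>'
)
thinking, call = parse_model_output(sample)

# The parsed step is appended to the history that feeds the next model call.
step = AgentStep(goal="Find a flight", screenshot_png=b"")
step.history.append({"thinking": thinking, "tool_call": call})
```

Validating the action name before executing anything gives the calling harness a simple place to reject malformed or unexpected output.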
Training and Data Generation
Microsoft developed a novel synthetic data generation pipeline called FaraGen to train Fara-7B. This pipeline generates multi-step web tasks, drawing from real web pages and tasks sourced from human users. FaraGen uses a three-stage process involving task proposal, solving, and LLM-based verification on live websites across 70,000 domains. The system imitates human behavior, including retries, mistakes, scrolling, and searching. Each session is reviewed by three separate AI judges to ensure the steps make sense and the outputs match what’s visible on the page. After filtering, Microsoft retained 145,630 verified sessions containing over 1 million individual actions to train the model.
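The verification stage can be pictured as a filter over candidate sessions. The sketch below is a toy version, not FaraGen itself: the `Session` type, the three stand-in judge functions, and the rule that all three judges must approve are assumptions made for illustration (the article says only that three judges review each session). The final lines derive a rough average session length from the figures quoted above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Session:
    task: str
    actions: list  # recorded steps, as plain strings in this toy example


Judge = Callable[[Session], bool]


def filter_sessions(sessions: list, judges: list) -> list:
    """Keep only sessions every judge approves (unanimity is an assumption)."""
    return [s for s in sessions if all(judge(s) for judge in judges)]


# Toy stand-ins for the LLM-based judges; real judges would inspect
# screenshots and page content rather than simple string patterns.
def has_actions(s: Session) -> bool:
    return len(s.actions) > 0


def ends_with_answer(s: Session) -> bool:
    return bool(s.actions) and s.actions[-1].startswith("answer")


def within_step_budget(s: Session) -> bool:
    return len(s.actions) <= 50


sessions = [
    Session("compare prices", ["search", "click", "scroll", "answer: $19.99"]),
    Session("book a table", []),
    Session("find a job posting", ["search", "click"]),
]
kept = filter_sessions(sessions, [has_actions, ends_with_answer, within_step_budget])

# Scale check from the article's figures: "over 1 million" actions across
# 145,630 sessions implies an average of at least about 6.9 actions each.
verified_sessions = 145_630
min_total_actions = 1_000_000
min_avg_actions = min_total_actions / verified_sessions
```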
Performance and Benchmarks
Fara-7B performs strongly across a diverse set of benchmarks. It has been evaluated on WebVoyager, Online-Mind2Web, DeepShop, and a newly introduced benchmark called WebTailBench, which focuses on real-world tasks such as finding job postings and comparing prices across retailers. Fara-7B achieved 73.5% success on WebVoyager, 34.1% on Online-Mind2Web, 26.2% on DeepShop, and 38.4% on WebTailBench. Notably, it outperformed models like UI-TARS-1.5-7B and even larger models like GPT-4o on certain benchmarks. Microsoft estimates the cost of a full task with Fara-7B to be around 2.5 cents, compared to roughly 30 cents for larger-scale agents using GPT-4 or other reasoning models.
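To put the cost figures in perspective, the short calculation below works through the arithmetic. The per-task costs and success rates come from the paragraph above. The "cost per successful task" measure and the 1,000-task workload are illustrative assumptions, not numbers reported by Microsoft.

```python
# Success rates (percent) as quoted in the article.
success_rate = {
    "WebVoyager": 73.5,
    "Online-Mind2Web": 34.1,
    "DeepShop": 26.2,
    "WebTailBench": 38.4,
}

# Estimated cost per full task, in US cents, as quoted in the article.
FARA_CENTS_PER_TASK = 2.5
LARGE_AGENT_CENTS_PER_TASK = 30

# How many times cheaper a Fara-7B task is by these estimates.
cost_ratio = LARGE_AGENT_CENTS_PER_TASK / FARA_CENTS_PER_TASK  # 12.0


def cents_per_success(cents_per_task: float, success_pct: float) -> float:
    """Expected spend per *successful* task, assuming every attempt costs
    the same whether or not it succeeds (an assumption for illustration)."""
    return cents_per_task / (success_pct / 100)


fara_webvoyager = cents_per_success(FARA_CENTS_PER_TASK, success_rate["WebVoyager"])

# Hypothetical workload of 1,000 tasks, converted to dollars.
tasks = 1_000
fara_dollars = tasks * FARA_CENTS_PER_TASK / 100
large_dollars = tasks * LARGE_AGENT_CENTS_PER_TASK / 100

print(f"cost ratio: {cost_ratio:.0f}x")
print(f"Fara-7B cents per successful WebVoyager task: {fara_webvoyager:.2f}")
print(f"1,000 tasks: ${fara_dollars:.2f} vs ${large_dollars:.2f}")
```

By the quoted estimates, a Fara-7B task costs one twelfth as much, so the gap stays large even after dividing by the success rate.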
Availability and Responsible AI
Fara-7B is available on Microsoft Foundry and Hugging Face under an MIT license and is integrated with Magentic-UI, a research prototype from Microsoft Research AI Frontiers. A quantized and silicon-optimized version is also available for Copilot+ PCs running Windows 11. Microsoft has incorporated controls based on its Responsible AI Policy, with built-in mechanisms to identify critical points where user consent or data is required and to stop there. The model is trained to refuse or halt tasks involving illegal activities, impersonation, harassment, hate speech, scraping, spam, erotic content, or misinformation, as well as financial, medical, or legal actions. It also demonstrates a high refusal rate of 82% on certain tasks.
Potential Applications
Fara-7B is designed to automate everyday web tasks such as filling out forms, searching for information, booking travel, or managing accounts. It allows users to build and test agentic experiences beyond pure research. Microsoft recommends running Fara-7B in a sandboxed environment, monitoring its execution, and avoiding sensitive data or high-risk domains.