Microsoft’s Fara-7B: Small AI Model That Controls PCs Directly

Microsoft has released Fara-7B, its first agentic small language model built for computer use. The 7-billion-parameter model runs on your PC, interprets screenshots, and performs actions such as clicking and typing without relying on cloud servers. The Indian Express covered the launch, noting that it extends the Phi series; The Decoder, The Hindu, and AI Business also reported on it.

What Fara-7B Does

Fara-7B acts as a Computer Use Agent (CUA). It looks at a screenshot of your screen, then decides on mouse movements, scrolls, clicks, or keystrokes. It needs no extra models to parse the screen and no accessibility data; it works like a human looking at the interface. InfoWorld explains that it handles multi-step tasks entirely on-device, keeping data local.
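The screenshot-in, action-out cycle described above can be sketched as a simple loop. This is only an illustration of the general pattern, not Microsoft's implementation: the Action type, the run_agent function, and the three callbacks (take_screenshot, predict_action, execute) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    # Hypothetical action record; the real model's output format may differ.
    kind: str                  # e.g. "click", "type", "scroll", or "done"
    x: Optional[int] = None    # screen coordinates for pointer actions
    y: Optional[int] = None
    text: Optional[str] = None  # text for keyboard actions


def run_agent(
    take_screenshot: Callable[[], bytes],
    predict_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    task: str,
    max_steps: int = 20,
) -> List[Action]:
    """Observe the screen, ask the model for one action, apply it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()            # raw pixels only
        action = predict_action(task, screenshot, history)
        history.append(action)
        if action.kind == "done":                 # model signals completion
            break
        execute(action)                           # click, type, scroll, ...
    return history
```

In a real setup, predict_action would wrap a call to the locally running model and execute would drive the browser or operating system; here both are left as parameters so the loop itself stays easy to test.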

How It Works

The model processes around 124,000 input tokens and 1,100 output tokens per task, according to Microsoft’s numbers as reported by The Indian Express. Microsoft trained it using FaraGen, a synthetic data pipeline built on real websites across 70,000 domains. The pipeline produced 145,630 verified sessions containing over 1 million actions, including human-like retries and errors. Three AI judges checked each step for accuracy.
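The verification step can be pictured as a filter over candidate sessions. The coverage does not say how the three judges' verdicts were combined, so the sketch below assumes a session is kept only when every judge accepts every step. That unanimity rule, the function names, and the dictionary layout of a step are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List

Step = dict                      # hypothetical: one recorded action in a session
Judge = Callable[[Step], bool]   # hypothetical: returns True if the step looks valid


def session_is_verified(steps: Iterable[Step], judges: List[Judge]) -> bool:
    """Assumed rule: every judge must accept every step in the session."""
    return all(judge(step) for step in steps for judge in judges)


def filter_sessions(sessions: Iterable[List[Step]], judges: List[Judge]) -> List[List[Step]]:
    """Keep only the sessions that pass verification."""
    return [steps for steps in sessions if session_is_verified(steps, judges)]
```

A majority vote or per-judge specialization would fit the same interface; only the body of session_is_verified would change.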

Takes a pixel-level view of the screen
Predicts exact coordinates for actions
Runs on everyday PCs, with no need for massive hardware
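Because the model predicts exact screen coordinates, code that consumes its output should check that those coordinates actually fall on the screen before acting. The sketch below assumes the action arrives as a small JSON object with "x" and "y" fields; that format and the parse_click name are hypothetical, not the model's documented output.

```python
import json
from typing import Tuple


def parse_click(raw: str, width: int, height: int) -> Tuple[int, int]:
    """Parse a hypothetical JSON click action and validate its coordinates."""
    data = json.loads(raw)
    x, y = int(data["x"]), int(data["y"])
    # Reject points outside the screenshot the model was shown.
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"click ({x}, {y}) is outside a {width}x{height} screen")
    return x, y
```

Rejecting out-of-range points, rather than silently clamping them, makes a bad prediction visible instead of turning it into a click on the wrong element.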

Performance Numbers

In benchmarks, Fara-7B scored 73.5% on WebVoyager, 34.1% on Online-Mind2Web, 26.2% on DeepShop, and 38.4% on WebTailBench, which tests real-world tasks such as job applications and real estate searches. Microsoft says it beats GPT-4o as a CUA on WebVoyager and finishes tasks in fewer steps than other 7B models. IT Pro notes that these results hold across diverse web tasks.

Why It Matters

Fara-7B cuts costs: a full task runs about 2.5 cents, versus about 30 cents for agents built on GPT-4 or o3, roughly a twelvefold difference. It lowers latency, improves privacy by staying local, and runs on laptops for enterprise workflows such as moving data between apps. Analysts such as Pareekh Jain point out that it addresses common cloud AI problems: high compute bills, data leaks, and delays. Charlie Dai of Forrester sees it pushing edge AI toward sensitive automation. In addition, a “Critical Points” feature pauses for user approval before risky actions such as sending emails or making payments, and the model has an 82% refusal rate on unsafe requests.
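The “Critical Points” idea, pausing for approval before a risky action, can be illustrated with a small gate placed in front of the executor. This is a sketch of the general pattern only. In Fara-7B the model itself is reported to recognize such points, whereas this example uses a fixed, hypothetical list of risky action kinds, and every name in it is an assumption.

```python
from typing import Callable

# Hypothetical labels for actions that should never run without consent.
RISKY_KINDS = {"send_email", "submit_payment"}


def execute_with_approval(
    action: dict,
    execute: Callable[[dict], None],
    ask_user: Callable[[dict], bool],
) -> bool:
    """Run the action, but pause for user approval if it is risky.

    Returns True if the action ran and False if the user declined it.
    """
    if action["kind"] in RISKY_KINDS and not ask_user(action):
        return False           # user declined: do nothing
    execute(action)
    return True
```

Routine actions such as clicks and scrolls pass straight through, so the user is interrupted only at the steps that are costly or hard to undo.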

Where to Get It

Fara-7B is available now on Microsoft Foundry and Hugging Face under the MIT license. It integrates with Magentic-UI, and a quantized version is available for Copilot+ PCs running Windows 11. Microsoft calls it experimental and intended for community testing on web tasks such as form filling or bookings. Run it in a sandbox.