Anthropic has introduced a significant update to its Claude AI assistant: a new feature called ‘computer use,’ now available in public beta, that lets Claude control a computer. Claude simulates human interaction with the machine by “looking” at the screen, moving the cursor, clicking buttons, and typing text.
The computer use feature is available through the API, allowing developers to direct Claude to execute tasks that typically require human input, such as those demonstrated on a Mac in promotional videos.
This development places Anthropic ahead of other AI companies such as Microsoft, OpenAI, and Google. Their products, including Microsoft’s Copilot Vision and OpenAI’s desktop app for ChatGPT, can observe a computer’s screen but are not yet widely available as tools that actively perform tasks like clicking or typing. Similarly, Rabbit’s R1 AI, which promised comparable abilities, has yet to deliver on its potential.
Limitations and Early Stage Development
Despite its potential, Anthropic has cautioned that the computer use feature is still in an experimental phase, calling it “cumbersome and error-prone.” The company notes that Claude’s ability to control a computer is limited by its method of viewing the screen: it takes a series of screenshots rather than observing a continuous video feed, which may cause it to miss brief actions or notifications. Anthropic released this feature early to gather feedback from developers, expecting rapid improvement as more data is collected.
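The screenshot limitation described above can be illustrated with a small simulation. This is only a sketch: the event names, timings, and the two-second screenshot interval are hypothetical values chosen for illustration, and the code does not use or represent Anthropic's API. It simply shows why anything that appears and disappears between two screenshots is never observed.

```python
from dataclasses import dataclass


@dataclass
class ScreenEvent:
    name: str
    start: float  # seconds at which the event becomes visible
    end: float    # seconds at which it disappears


def screenshot_times(interval: float, duration: float) -> list[float]:
    """Return the capture times 0, interval, 2*interval, ... up to duration."""
    count = int(duration / interval) + 1
    return [i * interval for i in range(count)]


def observed_events(events, interval, duration):
    """Names of events that are on screen during at least one screenshot."""
    shots = screenshot_times(interval, duration)
    return [e.name for e in events
            if any(e.start <= t < e.end for t in shots)]


# Hypothetical on-screen events (all timings are illustrative assumptions).
events = [
    ScreenEvent("save dialog", 0.5, 6.0),
    ScreenEvent("toast notification", 2.2, 2.9),
    ScreenEvent("progress bar", 3.0, 4.5),
]

# Assumed: one screenshot every 2 seconds over a 10-second task.
seen = observed_events(events, interval=2.0, duration=10.0)
missed = [e.name for e in events if e.name not in seen]

print("seen:", seen)
print("missed:", missed)
```

With these assumed values, the short-lived notification falls entirely between the screenshots taken at 2 and 4 seconds and is missed, while the longer-lived dialog and progress bar are captured. Shortening the interval to 0.5 seconds catches all three, at the cost of processing four times as many images.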
Anthropic also detailed several limitations in Claude’s current capabilities. For example, Claude is not yet able to perform certain routine actions like dragging or zooming. Additionally, safeguards have been built into the system to prevent Claude from engaging with sensitive tasks, such as generating and posting social media content, registering web domains, interacting with government websites, or participating in election-related activities.
Advanced Model Performance in Benchmarks
The computer use functionality was launched alongside an upgraded version of Claude’s AI model, Claude 3.5 Sonnet, which has shown marked improvements on several industry benchmarks, including notable gains in coding tasks.
For instance, Claude 3.5 Sonnet improved performance on the SWE-bench Verified coding benchmark, raising its score from 33.4% to 49.0%, outperforming models like OpenAI o1-preview. It also performed better on TAU-bench, an agentic tool use benchmark, increasing scores from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more challenging airline domain. These improvements have been achieved without an increase in price or a decrease in processing speed for customers.
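To put these score changes on a common footing, the short calculation below derives the gain in percentage points and the relative improvement for each benchmark. The before and after scores are the ones quoted above; the derived figures are simple arithmetic on those numbers, not values reported by Anthropic, and the `gains` helper is a hypothetical name used only for this sketch.

```python
# (before, after) scores in percent, as quoted in the text above.
scores = {
    "SWE-bench Verified": (33.4, 49.0),
    "TAU-bench retail": (62.6, 69.2),
    "TAU-bench airline": (36.0, 46.0),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    points = round(after - before, 1)
    relative = round((after - before) / before * 100, 1)
    return points, relative


for name, (before, after) in scores.items():
    points, relative = gains(before, after)
    print(f"{name}: +{points} points, +{relative}% relative")
```

On these figures, the SWE-bench Verified gain is 15.6 percentage points (about 46.7% relative), the TAU-bench retail gain is 6.6 points (about 10.5% relative), and the airline gain is 10.0 points (about 27.8% relative).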
Anthropic’s latest developments also come on the heels of an agreement, announced in August, under which both Anthropic and OpenAI will cooperate with the US AI Safety Institute by sharing information about their work.