Complex, multi-stage tasks

Amazon unveils Nova Act, an AI agent for browser control

On Monday, Amazon announced Nova Act, a general-purpose AI agent that controls a web browser to carry out simple tasks. The company is also releasing the Nova Act SDK for developers.

Amazon is pressing ahead with agentic artificial intelligence. The newly unveiled Nova Act is designed to control a web browser independently and perform simple tasks. The technology comes from the company’s recently opened AGI lab in San Francisco and will also power key features of the upcoming Alexa+ update, a generative AI version of the voice assistant.

However, Amazon describes the version available now as a “research preview”, an indication that the technology is not yet fully developed. Developers can access the Nova Act SDK via a new website (nova.amazon.com), which also serves as a showcase for Amazon’s various Nova foundation models.

Vision: From simple tasks to complex workflows

“Our dream is for agents to perform wide-ranging, complex, multi-step tasks like organizing a wedding or handling complex IT tasks to increase business productivity,” Amazon explains in its blog post. However, the company admits that multi-step agents given high-level goals currently still require constant human supervision.

To overcome this limitation, the Nova Act SDK allows developers to break down complex workflows into reliable atomic commands (e.g. search, pay, answer questions about screen content). Developers can add more detailed instructions to these commands, call APIs and even use direct browser manipulation through Playwright to further improve reliability – for example, when entering passwords.
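
The decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Nova Act SDK's actual API: `BrowserAgent`, `RecordingAgent`, `instruct`, `type_directly` and `run_workflow` are hypothetical names, and the stand-in agent only logs calls instead of driving a browser. It shows natural-language commands and a deterministic typing step (the role the article assigns to Playwright for passwords) combined in one ordered workflow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


class BrowserAgent(Protocol):
    """Hypothetical agent interface for illustration; not the Nova Act SDK API."""

    def act(self, instruction: str) -> None: ...
    def type_text(self, text: str) -> None: ...


@dataclass
class RecordingAgent:
    """Stand-in that logs each call instead of driving a real browser."""

    log: List[str] = field(default_factory=list)

    def act(self, instruction: str) -> None:
        self.log.append(f"act: {instruction}")

    def type_text(self, text: str) -> None:
        # Record only the length so the secret never reaches the log.
        self.log.append(f"type: {len(text)} chars")


Step = Callable[[BrowserAgent], None]


def instruct(instruction: str) -> Step:
    """One atomic natural-language command for the agent to interpret."""
    return lambda agent: agent.act(instruction)


def type_directly(secret: str) -> Step:
    """Deterministic keystrokes that bypass the model, e.g. for a password."""
    return lambda agent: agent.type_text(secret)


def run_workflow(agent: BrowserAgent, steps: List[Step]) -> None:
    """Run the steps in order; an exception stops at the failing step."""
    for step in steps:
        step(agent)


if __name__ == "__main__":
    agent = RecordingAgent()
    run_workflow(agent, [
        instruct("search for a garden salad"),
        instruct("add the first result to the cart"),
        instruct("focus the password field"),
        type_directly("not-a-real-password"),
        instruct("pay with the default card"),
    ])
    print("\n".join(agent.log))
```

Keeping each step small means a failure can be traced to a single command, and sensitive input never has to pass through the model's prompt.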

Focus on reliability instead of mere benchmarks

Amazon emphasizes that Nova Act focuses on reliable building blocks that can be assembled into more complex workflows. Many agent benchmarks measure model performance on high-level tasks, where state-of-the-art models complete only 30% to 60% of browser tasks successfully. Amazon says it has instead concentrated on scoring over 90% in internal evaluations of capabilities that other models struggle with, such as date pickers, drop-down menus and pop-ups.

“Nova Act’s focus on reliability means that once you have things working, there’s no need to watch it perform each action,” the blog states. Headless mode lets developers expose the agent as an API that can be integrated into other products, or set it up to run asynchronously on any schedule. As an example, Amazon cites an agent that runs in the background and automatically orders a salad for dinner every Tuesday.
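
The scheduling side of the Tuesday example can be sketched with the Python standard library alone. This is a rough illustration, not something Amazon describes: the function name `next_weekly_run` and the 17:00 order time are assumptions, and the headless agent call itself is left out. The function only computes when the next weekly run is due.

```python
from datetime import datetime, time, timedelta

TUESDAY = 1  # datetime.weekday() counts Monday as 0


def next_weekly_run(now: datetime, weekday: int = TUESDAY,
                    at: time = time(17, 0)) -> datetime:
    """Return the first occurrence of `weekday` at `at` strictly after `now`.

    Works on naive local datetimes; time zones are out of scope here.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), at)
    if candidate <= now:
        # The slot for this week has already passed, so move to next week.
        candidate += timedelta(days=7)
    return candidate
```

A background job could sleep until the returned time and then start the headless run. In practice an existing scheduler such as cron would likely take over this part.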

Amazon claims that Nova Act performs better in internal tests than comparable agents from OpenAI and Anthropic. In the ScreenSpot Web Text test, which measures how well an AI agent interacts with text elements on screen, Nova Act scored 94%, while OpenAI’s CUA scored 88% and Anthropic’s Claude 3.7 Sonnet scored 90%.

First fruits of the AGI laboratory

Nova Act is the first public product from Amazon’s AGI lab, which is led by former OpenAI researchers David Luan and Pieter Abbeel. Both previously founded startups: Luan founded Adept, and Abbeel co-founded Covariant. Amazon recruited them last year to drive its AI agent efforts.
