A new study finally gives us a look into how AI agents perform at real work

FROM THE FRONTIER


Can you really use AI agents to be more efficient at work? Here’s what a new study says.

Upwork’s Human + Agent Productivity Index (HAPI). HAPI is one of the first data-driven evaluations of AI agent performance on real client work. The study tested agents built on three frontier models (Claude Sonnet 4, Gemini 2.5 Pro, and OpenAI GPT-5) across 300 Upwork projects, deliberately selecting simple tasks AI agents could reasonably handle.

Where AI agents do (and don’t) excel. Results from the test showed that AI agents perform best at tasks with objectively correct answers, like math or basic coding. But qualitative work like designing landing pages or crafting marketing copy? That’s where agents struggled — at least without human guidance.

For improved results, add human experts. AI agents performed much better when human experts were added to the mix. On average, job completion rates jumped 70% when humans and agents collaborated versus agents working alone. This pattern held consistently across different types of work.

Don’t quit on agents yet. Does this mean learning to use AI agents is a waste of time? Not at all. Upwork CTO Andrew Rabinovich points out that “Where a project might take a freelancer days to complete independently, the agent-plus-human approach can deliver results in hours through iterative cycles of automated work and expert refinement.”

Upwork’s results show that human expertise directing AI horsepower can help compress days’ worth of work into hours.

via Superhuman