Over the last eighteen months we’ve talked a lot about models, context, memory, MCPs, workflows, and agents. But there’s a much simpler question that almost always stayed hidden beneath the conversation:
Where does the agent actually run?
It’s one thing to ask a model to suggest code. It’s quite another to give it permission to execute commands, modify files, run tests, open pull requests, or interact with infrastructure.
The difference between the two is exactly the difference between an assistant and an operator.
And that’s where GitHub’s biggest announcement this week comes in: local and cloud sandboxes for Copilot.
It’s not a glamorous feature. It doesn’t improve benchmarks. It doesn’t add millions of context tokens.
But it’s probably one of the most important pieces to appear in the agentic ecosystem this year.
The problem was never the model
Models are already good enough for many engineering tasks.
They can:
- Write code
- Refactor modules
- Generate tests
- Review PRs
- Investigate bugs
- Propose architectures
The bottleneck stopped being intellectual capacity.
The problem became operational.
Because when an agent needs to work on a real repository, uncomfortable questions appear:
- Can it access the entire machine?
- Can it execute any command?
- Does it have access to secrets?
- Can it open external connections?
- Can it modify files outside the project?
- What if it gets stuck in a loop?
- What if it executes something destructive?
Most organizations solved this in a pretty basic way:
By not letting the agent do too much.
And that severely limits the value it can generate.
The agent paradox
We all want more autonomous agents.
But nobody wants to give them unrestricted access.
The more power an agent has:
- The more useful it is.
- The more risk it introduces.
And that tension gets worse as models improve.
An agent capable of modifying a hundred files is also an agent capable of breaking a hundred files.
An agent capable of deploying an application is also an agent capable of deploying something incorrect.
The industry had been hitting this limit for months.
We didn’t need smarter models.
We needed safer environments.
What a sandbox actually solves
The word “sandbox” sounds boring.
But it’s one of the most powerful ideas in all of software engineering.
The premise is simple:
The agent can do real things, but within defined limits.
Instead of running directly on your laptop or on critical infrastructure, the agent works within an isolated environment.
That environment can control:
- File system
- Network access
- Environment variables
- Credentials
- Available tools
- Execution time
- Resource consumption
In other words:
the sandbox separates autonomy from privileges.
And that changes the equation completely.
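To make a few of those controls concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how Copilot's sandboxes are implemented: the helper name `run_restricted`, the `ALLOWED_ENV` allowlist, and the `FAKE_API_TOKEN` variable are all hypothetical. It covers only three items from the list (working directory, environment variables, execution time) and assumes a POSIX system. Real isolation of the file system and network needs a container or VM.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical allowlist: only these variables are passed to the child process.
ALLOWED_ENV = ("PATH", "LANG")


def run_restricted(argv, workdir, timeout_s=10.0):
    """Run argv in workdir with a scrubbed environment and a time limit.

    Returns the CompletedProcess, or None if the time limit was exceeded.
    """
    # Build the environment from the allowlist, so secrets held by the
    # parent process are not inherited by the child.
    env = {key: os.environ[key] for key in ALLOWED_ENV if key in os.environ}
    try:
        return subprocess.run(
            argv,
            cwd=workdir,          # the child starts in the scratch directory
            env=env,              # scrubbed environment
            capture_output=True,
            text=True,
            timeout=timeout_s,    # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # A pretend secret in the parent environment.
    os.environ["FAKE_API_TOKEN"] = "not-a-real-secret"
    with tempfile.TemporaryDirectory() as scratch:
        probe = "import os; print('FAKE_API_TOKEN' in os.environ)"
        result = run_restricted([sys.executable, "-c", probe], scratch)
        print(result.stdout.strip())  # prints False: the child never saw the token
```

The point of the sketch is the shape of the idea: the child process still does real work, but what it can see and how long it can run are decided by the caller.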
The jump from copilot to operator
Until now, most AI workflows worked like this:
- The agent proposes.
- The human reviews.
- The human executes.
With a well-designed sandbox, a different workflow becomes possible:
- The agent proposes.
- The agent executes.
- The human validates results.
It seems like a small difference.
It isn’t.
It’s the step that turns AI from an assistive tool into an operational one.
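One way to picture the "agent executes" step is a policy gate that sits between the proposal and the execution. The sketch below is a hypothetical example, not a description of any vendor's policy engine: `review_command`, `ALLOWED_PROGRAMS`, and `BLOCKED_GIT_SUBCOMMANDS` are invented names, and the rules are deliberately tiny. It assumes the approved command is later run as an argument list, never through a shell.

```python
import shlex
from dataclasses import dataclass

# Hypothetical policy: programs the agent may run without asking first.
ALLOWED_PROGRAMS = frozenset({"pytest", "ruff", "git"})
# Hypothetical deny rules: git subcommands that reach outside the sandbox.
BLOCKED_GIT_SUBCOMMANDS = frozenset({"push", "remote"})


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def review_command(command_line: str) -> Decision:
    """Decide whether a proposed command may run unattended.

    Only the program name and, for git, the subcommand are checked.
    This assumes the command is executed as an argv list with no shell,
    so tokens such as '&&' are plain arguments and cannot chain commands.
    """
    try:
        argv = shlex.split(command_line)
    except ValueError:
        # shlex raises ValueError on input such as an unclosed quote.
        return Decision(False, "unparseable command")
    if not argv:
        return Decision(False, "empty command")
    program = argv[0]
    if program not in ALLOWED_PROGRAMS:
        return Decision(False, f"{program} is not on the allowlist")
    if program == "git" and len(argv) > 1 and argv[1] in BLOCKED_GIT_SUBCOMMANDS:
        return Decision(False, f"git {argv[1]} needs a human")
    return Decision(True, "ok")


if __name__ == "__main__":
    for proposal in ("pytest -q", "git push origin main", "rm -rf build"):
        print(proposal, "->", review_command(proposal))
```

Anything the gate rejects falls back to the older flow, where the human executes. Anything it accepts runs inside the sandbox, and the human only validates the result.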
Why this matters for teams
Most teams don’t struggle to generate code.
They struggle to get repetitive work done.
Dependency audits
An agent could:
- Create an isolated environment
- Update packages
- Run the full suite
- Detect regressions
- Generate a PR
All without touching your machine.
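The steps above can be sketched as a small pipeline that runs against a throwaway copy of the repository. This is a simplified illustration with a hypothetical helper, `run_in_disposable_copy`, and the steps are placeholders rather than real package-manager commands. Note the limit of the sketch: the copy protects only your working tree. The commands still run on the host, so a real setup would run the same loop inside a container or a remote sandbox.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_disposable_copy(repo: Path, steps: list[list[str]]) -> list[tuple[list[str], int]]:
    """Copy repo into a temp dir, run steps there in order, stop at the first failure.

    Returns (argv, exit_code) for every step that actually ran.
    """
    results = []
    with tempfile.TemporaryDirectory() as scratch:
        workdir = Path(scratch) / "repo"
        shutil.copytree(repo, workdir)  # the original tree is never modified
        for argv in steps:
            completed = subprocess.run(argv, cwd=workdir, capture_output=True, text=True)
            results.append((argv, completed.returncode))
            if completed.returncode != 0:
                break  # a failing step (for example a regression) ends the run
    # The temp dir, including every change the steps made, is deleted here.
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        repo = Path(src)
        (repo / "requirements.txt").write_text("example==1.0\n")
        # Placeholder steps standing in for "update packages" and "run the suite".
        steps = [
            [sys.executable, "-c", "open('requirements.txt', 'w').write('example==2.0\\n')"],
            [sys.executable, "-c", "import sys; sys.exit(0)"],
        ]
        for argv, code in run_in_disposable_copy(repo, steps):
            print("exit", code)
        print((repo / "requirements.txt").read_text().strip())  # still example==1.0
```

In a real workflow the last step would export a diff or push a branch before the copy is discarded, which is what becomes the PR.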
Coverage expansion
An agent could:
- Analyze areas without tests
- Generate incremental coverage
- Run validations
- Open changes ready for review
Without access to secrets or production systems.
Large refactors
Refactors are perfect candidates for agents:
- Lots of mechanical work
- Little creative value
- High volume of changes
But they’re also dangerous.
A sandbox lets you run that work in a disposable environment and validate the result before merging.
The real parallel with Claude Code
What’s interesting is that GitHub isn’t tackling a new problem.
It’s arriving at the same place other agentic tools were already pushing toward.
When Anthropic introduced:
- Isolated worktrees
- Subagents
- Agent View
- Dynamic Workflows
the goal wasn’t just coordination.
It was to allow multiple agents to work without stepping on each other.
The difference is that GitHub is bringing that idea closer to the enterprise operating model:
- Sandboxes
- Policies
- Governance
- Centralized control
Two different approaches to solving the same problem:
how to let agents work without losing control.
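On the worktree side of that comparison, the mechanics are easy to picture if you assume "isolated worktrees" means ordinary git worktrees, which give each agent its own checkout and branch over one shared repository. The sketch below only builds the `git worktree add` command. The function name `worktree_command` and the `agent/<name>` branch and directory naming scheme are hypothetical choices for illustration, not something any tool prescribes.

```python
from pathlib import Path


def worktree_command(repo_root: Path, agent_name: str, base_ref: str = "HEAD") -> list[str]:
    """Build the `git worktree add` invocation for one agent's private checkout."""
    # Resolve to an absolute path so the worktree location does not depend
    # on the directory that `git -C` switches into.
    repo_root = repo_root.resolve()
    # Hypothetical naming scheme: one branch and one sibling directory per agent.
    branch = f"agent/{agent_name}"
    path = repo_root.parent / f"{repo_root.name}-{agent_name}"
    return ["git", "-C", str(repo_root), "worktree", "add", "-b", branch, str(path), base_ref]


if __name__ == "__main__":
    # Each command could be passed to subprocess.run(cmd, check=True).
    for agent in ("tests", "refactor"):
        print(" ".join(worktree_command(Path("."), agent)))
```

Two agents working in separate worktrees can edit the same files without overwriting each other. Conflicts surface later, at merge time, where a human or a policy can deal with them.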
What comes next
I think we’re entering a new stage of the market.
The conversation during 2024 and much of 2025 was:
Which model is better?
The conversation in 2026 is starting to be different:
Which runtime is better?
Because once models reach a certain level of capability, the difference is no longer just in intelligence.
It’s in:
- Observability
- Isolation
- Governance
- Memory
- Tools
- Orchestration
- Security
In other words:
infrastructure for agents.
And sandboxes are a central piece of that infrastructure.
The takeaway
The story from this week isn’t that GitHub added a new feature to Copilot.
The story is that the industry is finally starting to build the mechanisms that make it possible to trust autonomous agents.
Models have already proven they can generate code.
What we’re still building are the systems that let them act.
The next winners probably won’t be those with the smartest model.
They’ll be those who build the best environment for that model to work safely.
And in that race, sandboxes are far more important than their boring name suggests.
