For much of 2024 and early 2025, the conversation about AI agents was dominated by capabilities: better models, more tools, longer context, more autonomous workflows.
Now the conversation is changing.
The question is no longer just:
“What can the agent do?”
The question that platform and security teams are starting to ask is much more important:
“What happens when the agent does something it shouldn’t?”
That shift in focus is driving a new operational architecture that’s appearing increasingly in enterprise tools, open source frameworks, and security documentation:
containment-first agents
The idea is simple:
assume from the design that the agent will eventually make mistakes, go off track, hallucinate, or abuse permissions.
And build the system around that reality.
The problem: agents are no longer chatbots
A traditional chatbot produces text.
A modern agent:
- executes commands
- modifies files
- calls APIs
- interacts with infrastructure
- uses credentials
- deploys code
- runs tests
- operates browsers
- makes multi-step decisions
That makes it an operational entity.
And operational entities need boundaries.
Most recent incidents around agents have the same underlying pattern:
- overly broad permissions
- insufficient isolation
- unrestricted tools
- contaminated context
- unaudited execution
The problem isn’t necessarily that the model is “bad”.
The problem is that we’re connecting probabilistic systems to real execution surfaces.
Why “prompt engineering” isn’t enough
For a long time, many organizations treated agent security as a prompting problem:
DO NOT MODIFY PRODUCTION
DO NOT DELETE FILES
DO NOT EXECUTE DANGEROUS COMMANDS
That works until it doesn’t.
Because prompts aren’t security boundaries.
They’re probabilistic instructions.
And when:
- context changes
- a tool responds differently
- the agent enters loops
- prompt injection appears
- reasoning drifts
…soft instructions stop being enough.
The industry is starting to accept an uncomfortable reality:
agents need real technical restrictions, not just textual rules.
What “containment-first” means
Containment-first means designing the operational environment assuming the agent can behave incorrectly.
That completely changes the architecture.
Instead of:
Model → Tools → Production
the pattern becomes something more like:
Model
↓
Policy Layer
↓
Sandbox / Isolated Runtime
↓
Limited Tools
↓
Human Approval (when applicable)
↓
Real Systems
Security stops depending on “hoping the agent understands”.
It starts depending on:
- isolation
- least privilege
- observability
- recovery
- execution limits
Exactly like in traditional distributed infrastructure.
The most important pattern: least privilege
Most current agents have too much access.
Common examples:
- full filesystem access
- complete git repo access
- reused cloud tokens
- shared credentials
- unrestricted web browsing
- unsandboxed shells
Containment-first inverts this logic.
The agent receives:
- only necessary tools
- only necessary directories
- only necessary permissions
- only for necessary duration
No different from how we already operate:
- containers
- IAM roles
- Kubernetes service accounts
- temporary credentials
The difference is that the entity consuming those permissions is now a probabilistic system.
Sandboxes: the new standard runtime
One of the most visible changes in 2026 is the accelerated adoption of isolated environments for agents.
More and more tools:
- execute code inside ephemeral containers
- isolate filesystem
- block networking by default
- separate credentials
- limit persistence
Because an agent that:
- generates code
- executes it
- observes results
- iterates automatically
…without a sandbox, is basically AI-assisted RCE.
And that completely changes the risk model.
The prompt injection problem changes everything
Agentic systems introduce a new attack surface:
indirect prompt injection
Example:
- agent navigates a page
- page contains hidden instructions
- model interprets them as valid context
- agent executes unintended actions
This isn’t theoretical.
It’s exactly the kind of attack that naturally emerges when:
- models consume external content
- tools have execution capability
- context isn’t isolated
Containment-first assumes it will eventually happen.
That’s why:
- sensitive tools require approval
- external outputs are sanitized
- critical capabilities are separated
- agents don’t get unrestricted access
Observability: the missing piece
Another strong pattern:
execution tracing
Teams want to know:
- what the agent decided
- what tool it used
- what context it saw
- what outputs it produced
- what commands it executed
- what failed
- why it took an action
This is driving:
- structured logs
- tool tracing
- session replay
- context snapshots
- decision audit
In other words:
AI observability starts to look a lot like distributed systems observability.
Agents are already being designed as distributed jobs
OpenAI, Anthropic, LangGraph, OpenHands and other ecosystems are converging on similar patterns:
- retries
- checkpoints
- state persistence
- lifecycle APIs
- resumable execution
- workflow durability
That’s no accident.
The industry is discovering that agents aren’t “chat UX”.
They’re operational runtimes.
And operational runtimes need:
- control
- recovery
- limits
- isolation
The most common mistake today
Many teams still deploy agents like this:
LLM + tools + production access
without:
- sandboxes
- permission limits
- policy layer
- human approval
- serious observability
That works in demos.
It doesn’t necessarily survive contact with production.
What the most mature teams are doing
The patterns appearing most often today:
1. Tool gating
The agent can:
- read files
but not:
- write
- execute
- deploy
without explicit approval.
2. Ephemeral sandboxes
Each session:
- new container
- isolated filesystem
- limited networking
- automatic cleanup
3. Temporary credentials
Nothing persistent.
Tokens:
- scoped
- rotated
- revocable
4. Runtime policies
Explicit layers of rules:
- what tools exist
- when they can execute
- what commands are forbidden
- what paths are valid
5. Human-in-the-loop
Sensitive actions:
- require approval
- are audited
- can be reverted
The LATAM dimension
For teams in Latin America, containment-first has an important advantage:
it reduces operational risk without requiring giant organizations.
Many regional teams:
- operate lean
- have less margin for incidents
- work with shared infrastructure
- manage tight cloud budgets
In that context:
- an agent with excessive permissions
- a runaway loop
- an incorrect deployment
- an accidental leak
…can have disproportionate impact.
Containment-first allows adopting AI automation without assuming levels of risk that are operationally hard to absorb.
The cultural shift
There’s an interesting cultural transition happening.
For years:
- more autonomy
- less friction
- more access
- more tools
were seen as natural progress for agents.
Now a different philosophy is starting to emerge:
the best agents aren’t the freest ones.
They’re the most governable ones.
What teams should do todayBefore deploying agents in real workflows:
Review permissions
- What can the agent actually touch?
- What happens if it makes a mistake?
Isolate execution
- Containers
- Ephemeral VMs
- Temporary filesystems
Add observability
- logs
- tracing
- replay
- audit
Introduce policy layers
- explicit limits
- technical rules
- not just prompts
Design rollback
- fast rollback
- checkpoints
- recovery workflows
Verdict
Containment-first agents will probably become the dominant security pattern for operational AI.
Not because models are getting worse.
But because agents are shifting from being passive assistants to operating real systems.
And any system that:
- executes actions
- uses tools
- modifies infrastructure
- makes autonomous decisions
…eventually needs the same properties we demand from the rest of our infrastructure:
- isolation
- observability
- control
- recovery
- governance
The industry is finally starting to treat agents for what they really are:
