Strix: The AI Agent That Hacks Your App Before Real Hackers Do

You’re shipping code faster than ever. Claude Code, Cursor, Copilot — AI is writing an increasingly large portion of your code every week. That’s great for speed. Not so much for security.

AI-generated code has a known problem: it’s fluent, confident, and sometimes dangerously wrong. SQL injection patterns, broken authentication flows, exposed endpoints — the kind of things that would have been caught in a serious code review, except your review process isn’t keeping pace with your deployment cadence.

Strix is how you close that gap.

What is Strix?

Strix is an open-source agentic security platform with over 21K stars on GitHub. It doesn’t scan your code statically and hand you a list of warnings — it runs your application, acts like a real attacker, and validates findings through actual proof-of-concept exploits.

Think of it as a penetration tester running automatically on every pull request and never complaining about weekend work.

It supports any LLM provider: OpenAI, Anthropic, Vertex AI, Bedrock, Azure, or a local model via Ollama. You bring your API key; Strix brings the attack playbook.

The workflow

Strix operates in three modes depending on what you’re testing:

  • Local codebase: point it at a directory, and it reads your source code and hunts for vulnerabilities before you even deploy
  • GitHub repository: pass it a URL, and it clones and analyzes the repo
  • Live web app: black-box evaluation against a running URL, with or without credentials

# Installation
pipx install strix-agent

# Configure your LLM (example with Anthropic)
export STRIX_LLM="anthropic/claude-opus-4-6"
export LLM_API_KEY="your-anthropic-key"

# Run against your local project
strix --target ./my-app

# Or against your staging environment
strix --target https://staging.my-app.com \
  --instruction "Focus on authentication and privilege escalation"
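
Because every run depends on the two environment variables exported above, a small pre-flight check can save a confusing failure later. This is a minimal sketch, not part of Strix: check_strix_env is a hypothetical helper name, and it only verifies that the variables are non-empty, not that the key is valid.

```shell
# Hypothetical helper (not shipped with Strix): fail early if the LLM
# settings used by the commands above are missing from the environment.
check_strix_env() {
  rc=0
  if [ -z "${STRIX_LLM:-}" ]; then
    echo "missing: STRIX_LLM" >&2
    rc=1
  fi
  if [ -z "${LLM_API_KEY:-}" ]; then
    echo "missing: LLM_API_KEY" >&2
    rc=1
  fi
  # 0 when both variables are set, 1 otherwise
  return "$rc"
}
```

One way to use it is to chain it in front of the scan: check_strix_env && strix --target ./my-app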

On the first run, Strix pulls the Docker image it uses as a sandbox, so exploits execute in isolation from your machine. Results land in strix_runs/<run-name> as structured reports with description, impact, evidence, and remediation steps.
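
Since each scan writes to its own strix_runs/<run-name> directory, it can help to locate the most recent one from a script. The sketch below assumes only what is stated above, that every run is a direct subdirectory of strix_runs. latest_run_dir is a hypothetical helper, and it makes no assumption about the file names inside a run.

```shell
# Hypothetical helper: print the most recently modified run directory.
# Assumes runs are direct subdirectories of the base directory.
latest_run_dir() {
  base="${1:-strix_runs}"
  [ -d "$base" ] || return 1
  # ls -t sorts newest first; the trailing slash in the glob matches directories only
  newest=$(ls -td "$base"/*/ 2>/dev/null | head -n 1)
  [ -n "$newest" ] || return 1
  # Strip the trailing slash left over from the glob
  printf '%s\n' "${newest%/}"
}
```

For example, ls "$(latest_run_dir)" lists whatever the latest scan produced.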

The part that actually changes your workflow: CI/CD integration

This is where Strix earns its place. Add one GitHub Actions workflow, and every PR triggers a security scan before merge:

name: strix-penetration-test
on:
  pull_request:

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Install Strix
        run: curl -sSL https://strix.ai/install | bash
      - name: Run Strix
        env:
          STRIX_LLM: ${{ secrets.STRIX_LLM }}
          LLM_API_KEY: ${{ secrets.LLM_API_KEY }}
        run: strix -n -t ./ --scan-mode quick

The -n flag runs in non-interactive mode (perfect for CI). --scan-mode quick does a fast, high-impact sweep — not exhaustive, but enough to catch critical issues before merge. Strix exits with a non-zero code when it finds vulnerabilities, which means your pipeline fails and the PR stays blocked. You don’t need any code changes beyond the YAML.
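
The same exit-code behavior can be reused outside GitHub Actions, for instance in a local pre-push script. The sketch below is a generic wrapper, not a Strix feature: run_security_gate is a hypothetical name, and it relies only on the convention described above that a non-zero exit means the scan should block.

```shell
# Hypothetical wrapper: run any scan command and turn its exit code
# into a pass/block message while preserving the code for the caller.
run_security_gate() {
  # "$@" is the full scan command, e.g. strix -n -t ./ --scan-mode quick
  if "$@"; then
    echo "security gate: passed"
    return 0
  else
    rc=$?
    echo "security gate: blocked (scan exited with $rc)" >&2
    return "$rc"
  fi
}
```

Called as run_security_gate strix -n -t ./ --scan-mode quick, it blocks on any non-zero exit. Note that this sketch cannot tell a finding apart from a tool crash or a missing API key, so a blocked result is worth reading before assuming a vulnerability.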

For deeper evaluations, such as before a release, switch quick to a deeper scan mode (deep, at the time of writing; strix --help lists the accepted values) and run it manually or on a schedule.
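
If one workflow handles both pull requests and scheduled runs, the mode can be picked from the triggering event. This is a sketch under assumptions: scan_mode_for_event is a hypothetical helper, the event names are the values GitHub Actions puts in GITHUB_EVENT_NAME, and the deeper mode's name is passed in rather than hard-coded, because the accepted values depend on your Strix version.

```shell
# Hypothetical helper: choose a scan mode from the GitHub Actions event name.
#   $1 = event name (e.g. the value of GITHUB_EVENT_NAME)
#   $2 = name of the deeper scan mode supported by your Strix version
scan_mode_for_event() {
  event="${1:-}"
  deep_mode="${2:-}"
  case "$event" in
    schedule|workflow_dispatch)
      # Scheduled and manual runs get the deeper mode when one was supplied
      if [ -n "$deep_mode" ]; then
        echo "$deep_mode"
      else
        echo "quick"
      fi
      ;;
    *)
      # Pull requests and anything unrecognized stay on the fast sweep
      echo "quick"
      ;;
  esac
}
```

In a workflow step this could look like strix -n -t ./ --scan-mode "$(scan_mode_for_event "$GITHUB_EVENT_NAME" "$DEEP_MODE")", where DEEP_MODE is a variable you define yourself.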

What it finds

Strix includes a library of security skills covering the OWASP Top 10 and more: SQL injection, RCE, broken access control, SSRF, authentication bypasses, and others. Version 0.6.0 added chained reasoning through multi-step investigations: agents preserve their thinking across steps, so they can follow exploit chains the way a real attacker would instead of checking individual patterns in isolation.

LLM-based deduplication means you don’t get 40 variations of the same finding. Reports map cleanly to how a real pentest report reads: description, impact, evidence, remediation — ready to convert into a ticket or compliance document.

The context: why it matters now

The security community has been warning about AI-generated code for over a year. The concern isn’t that AI writes bad code on purpose; it’s that it writes plausible-looking but flawed code at scale: a misinterpreted requirement, a missing validation, a trust assumption that shouldn’t be there.

Strix doesn’t replace security review, but it gives any development team — even those without a dedicated AppSec engineer — an automated attacker running on every change. That’s a real shift.

Getting started


Do you already have a security tool integrated into your pipeline? Or is security still something you review “when there’s time”? Let us know in the comments.