Vyrox Security

AI security copilot for teams without a dedicated SOC. Ingests EDR alerts, triages them through a deterministic heuristics engine and an LLM fallback, and routes the verdicts that matter to a human approver in Discord. Every containment action runs through a small Rust proxy that the customer can read and audit.

License: MIT (proxy)


What this repository is

vyrox-docs is the public engineering documentation for the Vyrox Security platform. It carries the architecture, the API contracts, the threat model, the audit-log specification, and the contributor guides. Sales copy, pricing, customer rosters, and SLA contract language live elsewhere.

If you found this repo looking for the source code that touches your endpoints, you want vyrox-proxy. That is the Rust binary that receives signed containment instructions from the rest of the platform and calls the EDR vendor's API. It is MIT licensed and small enough to read in an afternoon.

What Vyrox actually does

Pipeline in five steps:

  1. Your EDR posts alerts to a Vyrox webhook over HTTPS. Each payload is authenticated per tenant with HMAC-SHA256 or a vendor-specific bearer token. CrowdStrike, SentinelOne, Microsoft Defender, and a customer field-mapped generic adapter are all supported today.
  2. Ingestion verifies the signature, normalises the vendor payload into a single NormalizedAlert schema, and pushes it onto a per-tenant Redis queue.
  3. The worker pulls the alert and runs it through the heuristics engine (deterministic regex-and-weight pattern matching with Noisy OR aggregation). The result is one of CRITICAL, HIGH, MEDIUM, LOW, or BENIGN, plus a confidence score.
  4. Anything in the ambiguous confidence band (above 0.25 and below 0.75) goes to an LLM that must answer in a strict JSON schema. The LLM never executes anything; it only writes verdict fields. A Pydantic validator catches malformed responses and falls back to a conservative MEDIUM verdict at 0.5 confidence.
  5. CRITICAL and HIGH verdicts land in the tenant's Discord channel as an embed with Approve, Deny, and Investigate buttons. Approve generates an ActionRequest, signs it, and sends it to the Rust proxy. The proxy verifies the signature, checks a thirty-second replay window, dedupes on request ID, writes an audit entry, then either dry-runs or calls the EDR vendor's API.
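Step 3 names Noisy OR aggregation. In its textbook form, each matched pattern counts as an independent piece of evidence with weight w. The combined confidence is 1 minus the product of (1 - w) over the matches. The sketch below assumes that textbook form. The patterns and weights are invented for illustration, and it does not show how the private engine maps a score to CRITICAL through BENIGN.

```python
from __future__ import annotations

import re
from math import prod

# Hypothetical patterns and weights, for illustration only. The real pattern
# set, weights, and MITRE mapping live in the private vyrox-heuristics repo.
PATTERNS: list[tuple[re.Pattern[str], float]] = [
    (re.compile(r"sekurlsa::logonpasswords", re.IGNORECASE), 0.9),
    (re.compile(r"\blsass\.exe\b", re.IGNORECASE), 0.6),
    (re.compile(r"-enc(odedcommand)?\s", re.IGNORECASE), 0.5),
]


def noisy_or(weights: list[float]) -> float:
    """Combine independent evidence: 1 - product of (1 - w) over the matched weights."""
    return 1.0 - prod(1.0 - w for w in weights)


def score(command_line: str) -> float:
    """Noisy-OR confidence over every pattern that matches the command line."""
    matched = [weight for pattern, weight in PATTERNS if pattern.search(command_line)]
    return noisy_or(matched)
```

A single match on the credential-dumping pattern scores 0.9. Adding lsass.exe to the same command line raises the score to 1 - 0.1 * 0.4 = 0.96. Each extra match can only push the score up, and it never exceeds 1. That is why Noisy OR suits stacking weak signals.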

Six rules hold across the whole pipeline. They are documented in ARCHITECTURE.md and enforced by tests. The shortest version:

  • Every database query carries tenant_id.
  • Every state change writes an audit entry before the response goes back.
  • HMAC verification happens before any payload is parsed.
  • The LLM cannot trigger containment. Only a human button click can.
  • Local development sets DRY_RUN=true by default so the proxy refuses to call real EDR APIs.
  • LLM JSON output is never passed to exec, eval, subprocess, SQL, or file operations. Only to Pydantic-validated verdict fields.

What is public, what is not

Open-core. The execution surface that touches customer infrastructure is open. The detection intelligence and the operational configuration are not.

| Component | Repo | Visibility | Why |
|---|---|---|---|
| Rust containment proxy | vyrox-proxy | Public, MIT | Customers should be able to read the code that isolates their hosts. |
| Engineering docs | vyrox-docs (this repo) | Public | Threat model, API contracts, contributor guides. |
| Alert simulator | vyrox-simulator | Public, MIT | Lets anyone replay a signed alert against a local stack. |
| Core monorepo | vyrox | Private | Ingestion, worker, Discord bot. The pipeline shape is documented here; the implementation is not. |
| Heuristics engine | vyrox-heuristics | Private | Pattern weights, MITRE technique mapping, false-positive baselines. The detection moat. |
| Adversarial playbook | vyrox-adversarial-playbook | Private | Red-team TTPs we test against. |
| Infrastructure | vyrox-deploy | Private | Provider-specific configs and secrets. |
| Partner CRM | vyrox-design-partners | Private | GTM, contracts, prospect roster. |

If you want to contribute, you can do it against vyrox-proxy, vyrox-simulator, or this docs repo without ever touching the private side. The contribution guide is in CONTRIBUTING.md.

Documents in this repo

Read in this order if you are new:

  1. QUICKSTART.md walks you from git clone to a signed alert hitting a local proxy. About ten minutes, no production credentials required.
  2. ARCHITECTURE.md is the system reference. Pipeline stages, multi-tenancy, audit chain, the six critical rules, the container boundary diagram, the decisions behind each component.
  3. THREAT_MODEL.md lists the assets, the threats, the mitigations, and the things explicitly out of scope. If you are evaluating Vyrox for a regulated workload, start here.
  4. API_REFERENCE.md documents every public endpoint: the four ingestion webhooks, the proxy's /execute and /audit/export, request and response shapes, error codes, signing rules.
  5. AUDIT_CHAIN.md is the wire spec for the SHA-256 hash-chained audit log. Independent verifiers can reproduce the chain from the JSONL stream alone.
  6. ADAPTERS.md is for contributors adding a new EDR vendor. Four rules to follow, one factory method to write, one test file to copy.
  7. SECURITY.md is the disclosure policy. Email address, PGP key, scope, SLA on triage, what we do not call a vulnerability.
  8. ROADMAP.md is the public roadmap by capability. No revenue targets, no customer counts.
  9. CONTRIBUTING.md and CODE_OF_CONDUCT.md cover how to send a patch and what behaviour is expected.

Status

Alpha. The pipeline is wired end to end and runs against synthetic alerts in CI on every push. The next milestone is ten pilot integrations. Two recent audits, recorded in todo.md (a private file), drove the P0 fixes and the P0.5 follow-ups that are already merged. At the time of writing: 89 Python tests, 17 Rust tests, and clean lints across the workspace.

What "alpha" means in practice:

  • The on-disk audit format is stable. Field names will not change without a documented migration. AUDIT_CHAIN.md is the contract.
  • The HMAC signing format is stable. The Python sign function returns sha256=<hex>, and the Rust proxy strips the prefix before its constant-time comparison.
  • The ingestion webhook URL shape is stable. The four routes documented in API_REFERENCE.md are the ones we will keep.
  • Anything else can change: internal data models, the LLM provider, the worker concurrency model. Once release tagging is in place, breaking changes will be noted in the CHANGELOG.

Security contact

sec.vyrox@proton.me, PGP key at vyrox.dev/.well-known/pgp-key.txt. Acknowledgement within forty-eight hours. Full policy in SECURITY.md. Please do not file vulnerabilities as public GitHub issues.

License

vyrox-proxy and vyrox-simulator are MIT licensed.

vyrox-docs, vyrox-landing, vyrox-heuristics, vyrox-deploy, vyrox-design-partners, and the vyrox monorepo are proprietary.


Vyrox Security, Inc. (hello@vyrox.dev)

Quickstart

This walks an OSS contributor from git clone to a signed alert hitting a local proxy. About ten minutes. No customer-side credentials. No EDR account. Nothing leaves your machine.

If you are an operator integrating a real EDR, see the design partner playbook; your company contact has the link. The public docs cover the open path only.

What you need

  • git
  • cargo (Rust 1.75+ recommended; whatever the proxy's Cargo.toml pins is fine)
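  • bash, openssl, curl. Standard on macOS and most Linux distributions. Step 4 also uses uuidgen (part of util-linux; on Debian and Ubuntu it ships in the uuid-runtime package).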
  • About a hundred megabytes of disk for the Rust build cache.

You do not need Python, Node, Docker, or a Discord account.

Step 1: Clone the open components

Three repositories. Each clones into its own directory.

git clone https://github.com/vyrox-security/vyrox-proxy.git
git clone https://github.com/vyrox-security/vyrox-simulator.git
git clone https://github.com/vyrox-security/vyrox-docs.git

The docs repo is this one. The other two are MIT licensed.

Step 2: Build the proxy

cd vyrox-proxy
cargo build

First build pulls the dependency tree (about ninety crates). Future builds are quick. The final binary is at target/debug/vyrox-proxy.

Step 3: Run the proxy with DRY_RUN

The proxy refuses to start without an HMAC secret. Generate one for local use only; do not reuse it anywhere else.

export VYROX_HMAC_SECRET=$(openssl rand -hex 32)
export AUDIT_LOG_PATH=./local-audit
export DRY_RUN=true
export BIND_ADDR=127.0.0.1:3000

mkdir -p "$AUDIT_LOG_PATH"
./target/debug/vyrox-proxy

The proxy listens on 127.0.0.1:3000. DRY_RUN=true is also the default, so even if you forget to set it, the proxy will not call any EDR API.

Check it is alive in another shell:

curl -s http://127.0.0.1:3000/health
# {"status":"ok"}

Step 4: Fire a signed execution request

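The proxy accepts POST /execute with an HMAC-SHA256 signed body. Run the next block in your second shell. Export VYROX_HMAC_SECRET there with the same value the proxy booted with (echo it in the first shell and copy it across). Smallest valid request: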

SECRET="$VYROX_HMAC_SECRET"
TS=$(date +%s)
BODY=$(cat <<EOF
{"request_id":"$(uuidgen | tr A-Z a-z)","tenant_id":"local-test","alert_id":"alt-1","action_type":"HOST_ISOLATION","host":"workstation-01","approved_by":"local-test","approved_at":$TS}
EOF
)
SIG="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | sed 's/^.*= //')"

curl -s -X POST http://127.0.0.1:3000/execute \
  -H "Content-Type: application/json" \
  -H "X-Vyrox-Signature: $SIG" \
  --data-binary "$BODY"
# {"status":"dry_run","dry_run":true}

The proxy verifies your signature, writes an audit entry, then short-circuits because DRY_RUN=true. Look at the audit file:

ls local-audit/
# audit-2026-05-23.jsonl

cat local-audit/audit-*.jsonl

You will see one JSONL entry with dry_run: true, a hash, and a previous_hash of sixty-four zeros (the genesis sentinel). The format spec is in AUDIT_CHAIN.md.

Step 5: Run the alert simulator

The simulator generates signed payloads for a Vyrox ingestion endpoint. There is no public Vyrox ingestion service to send them to, but the simulator's --dry-run mode prints the signed payload so you can see the wire format:

cd ../vyrox-simulator

./simulate.sh mimikatz --dry-run
# Prints the signed payload to stdout.

If you have a private vyrox stack running (worker plus ingestion plus the bot), point the simulator at it:

VYROX_URL=http://localhost:8001/webhook \
  VYROX_HMAC_SECRET=$(grep '^CROWDSTRIKE_WEBHOOK_SECRET=' ../vyrox/.env | cut -d= -f2-) \
  ./simulate.sh mimikatz

For the open path, --dry-run is enough to see how an alert payload looks before it hits ingestion.

Step 6: Read the docs

You now have a running proxy and a signed-payload generator. The next thing to do depends on what you came for.

Troubleshooting

401 Unauthorized

The proxy rejected your signature. Two common causes:

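  • The shell ate a \n somewhere, and the bytes you signed are not the bytes you sent. Use --data-binary (not -d) on the curl command and keep "$BODY" quoted wherever you expand it. Leave the heredoc delimiter unquoted: <<'EOF' would stop $TS and the uuidgen call from expanding.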
  • You signed with a different secret than the proxy is using. Check that VYROX_HMAC_SECRET in the signing shell holds the value the proxy was started with, or restart the proxy from the shell you sign in.

410 Gone

Your timestamp is outside the thirty-second replay window. Refresh TS=$(date +%s) and regenerate the body and signature.

Proxy refuses to start

The proxy panics on boot if VYROX_HMAC_SECRET is unset. Set it before launch. The proxy also panics if you set one of TLS_CERT_PATH and TLS_KEY_PATH but not the other; either set both (for TLS) or neither (for plain HTTP behind a reverse proxy).

Audit file is empty

You probably hit 401 before any audit write. The proxy writes audit entries only after the HMAC check passes. If you see a request in the logs but no audit entry, that is the reason.

What is not in the open path

The full Vyrox stack contains four more components: the ingestion service, the worker, the Discord bot, and the heuristics engine (a library imported by the worker). Those live in private repositories. The pipeline shape is documented in ARCHITECTURE.md so a reader can understand the whole system; the implementations are not public.

A contributor adding a new EDR adapter does not need the private side. The adapter recipe in ADAPTERS.md covers what you write, the contracts you must respect, and the tests you must ship. A reviewer with private access merges your PR; you do not need the private code on disk.

Next steps

  • Read CONTRIBUTING.md for the patch workflow, test conventions, and reviewer expectations.
  • Read ARCHITECTURE.md for the system overview and the six critical rules.
  • Read AUDIT_CHAIN.md if you want to write a verifier or a compliance pipeline against the audit log.

Contributing

This document tells you what we will merge, what we will not, and what the review will check.

We accept contributions to three public repositories: vyrox-proxy, vyrox-simulator, and this docs repo. The private monorepo is not open to outside contributors today; once a public adapter or feature needs private-side wiring, a Vyrox maintainer takes it from there.

Before you start

If your patch is more than a hundred lines or changes a contract, open an issue or a draft PR first. Five minutes of "is this the shape you want?" saves a week of "we cannot merge this because it is the wrong abstraction".

If you found a vulnerability, do not open a PR. See SECURITY.md for the disclosure path.

Workflow

Standard GitHub flow on every repo:

fork ─▶ feature branch ─▶ commits ─▶ PR against main ─▶ review ─▶ squash merge

Branch names: feat/<thing>, fix/<thing>, docs/<thing>, chore/<thing>, test/<thing>. The prefix matches the conventional commit type below.

Commit messages: Conventional Commits.

feat(adapters): add acme webhook adapter
fix(proxy): release nonce claim on audit-write failure
docs(threat-model): document A8 worker-to-bot HMAC

Multi-line bodies are welcome. Wrap at 72.

If your PR has more than one logical change, split it. Reviewers can hold one shape in their head at a time. Two unrelated changes in one PR usually means one gets merged and the other gets nitpicked forever.

What we will merge

  • Bug fixes with a regression test that would have caught the bug.
  • New EDR adapters following the contract in ADAPTERS.md, with the five required failure-mode tests.
  • Documentation corrections backed by source. Quote the file and the line you read.
  • Test coverage on existing code, especially around the six critical rules in ARCHITECTURE.md.
  • Performance improvements with benchmarks attached. We do not merge "should be faster" without numbers.
  • Refactors that reduce surface area. We do not merge refactors that add surface area; those start as design discussions.

What we will not merge

  • Marketing copy presented as architecture fact. "Best in class" belongs on a landing page, not in the docs.
  • Security guidance that weakens controls. If your PR makes the HMAC check optional, removes the replay window, or short-circuits the audit write, the answer is no, even if the test suite passes.
  • Auto-generated docs that drift from the code. The OpenAPI spec is not the source of truth; the route handlers are.
  • Code style PRs that touch hundreds of files. Run ruff or cargo fmt in your own branch; do not ship a workspace-wide reformat.
  • Adapter PRs that violate the four rules. See ADAPTERS.md for what each rule is, in concrete terms.
  • Changes that introduce a hard dependency on a paid SaaS provider without an open-source alternative documented as the default.

Testing

Every PR runs the full test suite in CI. Local commands:

For vyrox-proxy:

cargo test
cargo clippy -- -D warnings
cargo fmt --check

For vyrox-simulator:

./run-tests.sh        # if present, otherwise read scenarios/ and run a few
shellcheck simulate.sh scenarios/*.sh

For vyrox-docs:

markdownlint-cli2 "**/*.md"          # if installed

We do not ship a private-side test harness in the public docs. If your PR touches a contract documented here, write the test against the public surface. The reviewer will run the matching private-side test before merging.

Code style

  • Plain prose in docs. No em-dashes. No AI tells. Builder voice. Concrete file paths and function names where they help the reader.
  • Rust: cargo fmt defaults, cargo clippy -- -D warnings clean.
  • Python (private side): ruff defaults, mypy --strict clean, type hints on every public function.
  • Shell: bash with set -euo pipefail. POSIX-compatible flags where practical (the simulator runs on macOS and Linux).

Tests follow the production code style. A test that would not pass review for its prose does not get merged just because it is a test.

Reviewer expectations

A reviewer on a public PR will:

  • Read every line of the diff. We do not LGTM blocks of code we did not read.
  • Verify the test suite covers the failure mode the change was supposed to fix. A bug fix without a regression test is sent back.
  • Check the cross-references. If your PR changes a contract documented here, the docs change ships in the same PR.
  • Push back on scope. If the PR is doing two things, the reviewer asks you to split it.

A reviewer on a private-side PR (Vyrox staff only) does the same plus the rule-1-through-6 checklist from ARCHITECTURE.md.

Documentation discipline

Three rules.

  • Document what is, not what should be. If the code does X, the docs say X. Aspirational docs lead a new contributor to look at the code, find it disagrees, and lose trust in everything else.
  • Quote the file when you make a claim. "The proxy verifies HMAC in constant time (hmac::verify_signature in vyrox-proxy/src/hmac.rs:140)." A reader who wants to confirm has exactly one place to look.
  • Update the docs in the same PR as the code. Doc drift is a one-way ratchet. We close it on every PR or it grows forever.

Cross-references

Code of conduct

See CODE_OF_CONDUCT.md. Short version: be direct, be technical, be respectful. We do not have time for the opposite.

Code of Conduct

Professional Expectations

This project documents security software. Discussions should remain technical, respectful, and evidence-based.

Expected Behavior

  • Be precise and professional
  • Focus on code, docs, and decisions, not people
  • Provide reproducible references when making claims

Unacceptable Behavior

  • Harassment, abuse, or personal attacks
  • Intentionally misleading security advice
  • Posting secrets or sensitive tenant information
  • Spam and repeated low-effort noise

Enforcement

Maintainers may edit, lock, remove, or restrict participation that harms project quality or safety.

Security concerns: security@vyrox.security

Architecture

This document is the engineering reference for the Vyrox platform. It is written for the person who is about to read or modify the code, or who is evaluating Vyrox for a regulated workload and needs to know exactly what the system does. It does not describe what the product will become. It describes what runs in CI today.

If you are looking for setup steps, see QUICKSTART.md. If you are looking for the threat model, see THREAT_MODEL.md. If you want the on-disk audit format, see AUDIT_CHAIN.md.

Pipeline at a glance

   EDR vendors                                   Vyrox platform
                                                                                                   
  CrowdStrike Falcon  ─┐                                                                           
  SentinelOne          ├─▶  POST /webhook/{vendor}  ─▶  Ingestion (FastAPI)                        
  Defender Graph       │      HMAC or bearer auth          │                                       
  Generic JSON        ─┘      per-tenant secret            ▼
                                                  NormalizedAlert  ─▶ Redis  LPUSH/RPOP            
                                                                          │   vyrox:alerts:{tid}   
                                                                          ▼                        
                                              Worker (asyncio)                                     
                                                                                                   
                                              1. Cache lookup     (24h TTL by alert fingerprint)   
                                              2. Heuristics       (Noisy OR, <5ms)                 
                                                                  ├─ confidence ≥ 0.75 ▶ accept  
                                                                  ├─ confidence ≤ 0.25 ▶ BENIGN  
                                                                  └─ otherwise ▶ LLM             
                                              3. LLM fallback     (primary + 2 fallback models)    
                                                                  + Pydantic schema validation     
                                                                  + per-tenant daily token budget   
                                              4. Persist          (SQLite, tenant-scoped tables)   
                                              5. Notify           (signed HTTP to bot)             
                                                                                                   
                                              Discord bot (FastAPI)                                
                                              ├─ /interactions  (Ed25519 verified)                 
                                              ├─ /webhook       (HMAC verified)                    
                                              └─ approval flow  ▶ Rust proxy                       
                                                                                                   
                                              Rust proxy                                           
                                              ├─ HMAC verify       (constant time)                 
                                              ├─ replay window     (±30s)                          
                                              ├─ nonce dedup       (DashMap, 10min retention)      
                                              ├─ audit append      (hash-chained JSONL)            
                                              └─ EDR API call      (or DRY_RUN short-circuit)      

The four services (ingestion, worker, Discord bot, Rust proxy) are independent processes; the heuristics engine is a library imported by the worker. The services communicate over HTTP and Redis and share no in-process state. The one shared resource is the SQLite database, which the worker and the Discord bot both use in the current pilot deployment. A migration to Postgres is tracked in todo.md (private) and is planned to land before the tenant count reaches twenty-five.

Components

| Component | Language | Process | What it owns |
|---|---|---|---|
| Ingestion | Python, FastAPI | uvicorn ingestion.main:app | Webhook auth, vendor payload normalisation, Redis enqueue |
| Worker | Python, asyncio | python -m worker.main | Triage pipeline, persistence, Discord notification |
| Discord bot | Python, FastAPI | uvicorn discord_bot.main:app | Interaction handling, approval flow, signing toward the proxy |
| Containment proxy | Rust, Axum | vyrox-proxy | HMAC verify, replay window, nonce dedup, audit, EDR API call |
| Heuristics engine | Python | imported by the worker | Pattern matching, Noisy OR aggregation |

The heuristics engine is private. The shape of its API (HeuristicsEngine.score(alert: dict) -> HeuristicResult) is documented here because callers depend on it. The pattern weights and the MITRE technique mapping are not.
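Callers depend only on that call shape. Below is a minimal sketch of the contract as a typing.Protocol. The two fields on HeuristicResult are an assumption, based on the verdict-plus-confidence pair the pipeline describes. StubEngine is a hypothetical test double, not the private engine.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Literal, Protocol

Verdict = Literal["CRITICAL", "HIGH", "MEDIUM", "LOW", "BENIGN"]


@dataclass(frozen=True)
class HeuristicResult:
    # Assumed fields: one of the documented verdict labels plus a 0..1 confidence.
    verdict: Verdict
    confidence: float


class HeuristicsEngine(Protocol):
    # The documented call shape: score(alert: dict) -> HeuristicResult.
    def score(self, alert: dict[str, Any]) -> HeuristicResult: ...


class StubEngine:
    """Hypothetical test double; satisfies the protocol structurally, no inheritance needed."""

    def score(self, alert: dict[str, Any]) -> HeuristicResult:
        return HeuristicResult(verdict="LOW", confidence=0.1)


def run(engine: HeuristicsEngine, alert: dict[str, Any]) -> HeuristicResult:
    # Worker-side call site: typed against the protocol, not the private class.
    return engine.score(alert)
```

Typing the worker against a protocol lets public tests exercise the call site with a stub. No private weights ever need to be on disk.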

Critical rules

These six rules are enforced by tests and reviewed in every PR. Violating one is a blocking issue, not a stylistic choice.

Rule 1: Tenant isolation

Every database query carries a tenant_id filter. Every Redis key is namespaced vyrox:alerts:{tenant_id}. There is no shared bucket and no fallback tenant.

The previous default-tenant fallback was removed on 2026-05-21 after the first audit caught it. The replacement contract: if a payload arrives without the vendor's tenant identifier (customer_id, accountId, tenantId), the ingestion route returns HTTP 400 and the EDR retries. The function is resolve_tenant_id in ingestion/main.py. It raises MissingTenantIdentifier on a missing or empty value.
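Here is a minimal sketch of that contract. The function and exception names come from the paragraph above. The vendor-to-field mapping is an assumption: the doc lists the three identifier fields but not which vendor uses which. The real function may also read nested fields.

```python
from __future__ import annotations

from typing import Any


class MissingTenantIdentifier(Exception):
    """Raised when a payload carries no usable tenant identifier (the route maps it to HTTP 400)."""


# Assumed mapping from vendor route to the payload field holding the tenant ID.
TENANT_FIELD_BY_VENDOR = {
    "crowdstrike": "customer_id",
    "sentinelone": "accountId",
    "defender": "tenantId",
}


def resolve_tenant_id(vendor: str, payload: dict[str, Any]) -> str:
    field = TENANT_FIELD_BY_VENDOR[vendor]
    value = payload.get(field)
    # No default-tenant fallback: missing, empty, blank, or non-string values are rejected.
    if not isinstance(value, str) or not value.strip():
        raise MissingTenantIdentifier(f"{vendor} payload has no {field}")
    return value
```

The route handler translates the exception into an HTTP 400 response. No code path ever substitutes a shared tenant.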

The schema invariant is checked at boot. shared/db.py:_assert_tenant_id_present walks every table in _TENANT_SCOPED_TABLES (alerts, actions, verdict_cache, token_usage) and refuses to start the service if any of them is missing the tenant_id column. The check uses PRAGMA table_info, runs once at startup, and raises SchemaIntegrityError loudly enough that the deploy fails.

Rule 2: Audit before response

Every state-changing operation writes an audit entry before the response goes back to the caller. The audit log is append-only JSONL. Each entry carries previous_hash (the hash value of the prior entry) and hash (the SHA-256 of previous_hash || canonical_json(entry)). The first entry of the very first log file links to a genesis sentinel of sixty-four zeros.
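A minimal sketch of the chaining rule and a matching verifier. It rests on three assumptions, each to be checked against AUDIT_CHAIN.md:

- canonical_json means sorted keys with compact separators (the same pinning Rule 3 uses for signed bodies).
- The || operator concatenates the hex text of previous_hash with the JSON text.
- The hashed entry excludes its own previous_hash and hash fields.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any

GENESIS_HASH = "0" * 64  # sixty-four zeros


def canonical_json(entry: dict[str, Any]) -> str:
    # Assumption: sorted keys, no insignificant whitespace.
    return json.dumps(entry, sort_keys=True, separators=(",", ":"))


def chain_hash(previous_hash: str, entry: dict[str, Any]) -> str:
    # SHA-256 over previous_hash (hex text) followed by the canonical JSON.
    return hashlib.sha256((previous_hash + canonical_json(entry)).encode("utf-8")).hexdigest()


def append(log: list[dict[str, Any]], entry: dict[str, Any]) -> None:
    previous = log[-1]["hash"] if log else GENESIS_HASH
    log.append({**entry, "previous_hash": previous, "hash": chain_hash(previous, entry)})


def verify(log: list[dict[str, Any]]) -> bool:
    expected_previous = GENESIS_HASH
    for record in log:
        body = {k: v for k, v in record.items() if k not in ("previous_hash", "hash")}
        if record["previous_hash"] != expected_previous:
            return False
        if record["hash"] != chain_hash(expected_previous, body):
            return False
        expected_previous = record["hash"]
    return True
```

Editing any field of any entry changes that entry's recomputed hash, so verify returns False. An attacker would have to rewrite every later entry to hide the change.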

The chain survives process restarts. AuditWriter.__init__ in shared/audit.py reads the last hash from today's log file before accepting the first write. The Rust proxy uses the same approach in audit::ChainState::from_file. Both implementations agree on the wire format. The independent specification is in AUDIT_CHAIN.md.

Audit writes are durable. _sync_write in shared/audit.py flushes and calls os.fsync after every entry; the Rust side calls flush followed by sync_data. A power cut between the write and the OS writeback does not lose entries.
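A sketch of the two file-level behaviours. On startup the writer resumes the chain from the last line of the file. On each append it flushes and then calls os.fsync. ChainedJsonlWriter is a hypothetical name and is not the shared/audit.py AuditWriter. It assumes one JSONL file, a hash field on every line, and the hash rule described above, with canonical JSON taken to mean sorted keys and compact separators. Day rollover is not handled here.

```python
from __future__ import annotations

import hashlib
import json
import os
from pathlib import Path
from typing import Any

GENESIS_HASH = "0" * 64


class ChainedJsonlWriter:
    """Hypothetical append-only writer; not the shared/audit.py AuditWriter."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.last_hash = self._read_last_hash()

    def _read_last_hash(self) -> str:
        # Resume the chain after a restart: the last non-empty line wins.
        if not self.path.exists():
            return GENESIS_HASH
        last = GENESIS_HASH
        with self.path.open("r", encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    last = json.loads(line)["hash"]
        return last

    def write(self, entry: dict[str, Any]) -> str:
        body = json.dumps(entry, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((self.last_hash + body).encode("utf-8")).hexdigest()
        record = {**entry, "previous_hash": self.last_hash, "hash": digest}
        line = json.dumps(record, sort_keys=True, separators=(",", ":")) + "\n"
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(line)
            fh.flush()             # move Python's buffer into the OS
            os.fsync(fh.fileno())  # ask the OS to commit the bytes to stable storage
        self.last_hash = digest
        return digest
```

The resume step reads the whole file, which is fine for a sketch. A production writer would seek to the end and read only the tail.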

Rule 3: HMAC before processing

Every webhook payload is verified before any parser touches its bytes. The verification uses hmac.compare_digest on the Python side and the subtle::ConstantTimeEq trait on the Rust side. Both run in time proportional to the MAC length, not to where the first byte mismatch appears.

The wire format on the Python side: sign(payload: str, secret: str) returns f"sha256={hex_digest}". The Rust verifier strips the "sha256=" prefix before comparing. The round-trip is locked by tests/test_p0_regressions.py::test_hmac_python_sign_uses_sha256_prefix.
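A minimal Python sketch of both halves of the round trip. sign follows the documented signature and output format. verify is a hypothetical Python mirror of what the Rust verifier is described as doing, so the round trip can be exercised in one place. It works on raw bytes, which is the point of this rule: nothing parses the body before the MAC check returns. The key is the secret string's bytes, the same as openssl dgst -hmac "$SECRET" in the Quickstart.

```python
import hashlib
import hmac

PREFIX = "sha256="


def sign(payload: str, secret: str) -> str:
    # Documented wire format: "sha256=" followed by the lowercase hex digest.
    digest = hmac.new(secret.encode("utf-8"), payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{PREFIX}{digest}"


def verify(payload: bytes, header: str, secret: str) -> bool:
    # Strip the prefix, recompute over the raw bytes, compare in constant time.
    if not header.startswith(PREFIX):
        return False
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    # Compare bytes so a non-ASCII header cannot raise inside compare_digest.
    return hmac.compare_digest(header[len(PREFIX):].encode("utf-8"), expected.encode("ascii"))
```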

For requests carrying JSON bodies that travel between Vyrox services (the worker calling the bot, the bot calling the proxy), the body is serialised with separators=(",", ":") and sort_keys=True. Without that pinning, Python's default json.dumps and Rust's serde_json disagree on whitespace and key order, which produces a different MAC on the verifier side and a silent 401.
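The effect is easy to show in isolation. Two dicts with the same content but a different insertion order serialise differently under json.dumps defaults. Once keys and separators are pinned, they serialise identically, and so does their MAC. The field values and secret below are placeholders.

```python
import hashlib
import hmac
import json
from typing import Any


def canonical_body(obj: dict[str, Any]) -> bytes:
    # Pin key order and whitespace so every signer produces identical bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def mac(body: bytes, secret: bytes) -> str:
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


a = {"tenant_id": "t1", "alert_id": "alt-1", "host": "ws-01"}
b = {"host": "ws-01", "alert_id": "alt-1", "tenant_id": "t1"}  # same data, different insertion order

# Defaults keep insertion order and add ", " / ": " separators, so these differ.
default_a = json.dumps(a).encode("utf-8")
default_b = json.dumps(b).encode("utf-8")
```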

Rule 4: No autonomous containment

The LLM cannot trigger a containment action. The heuristics engine cannot trigger a containment action. The worker cannot trigger a containment action.

The only code path that calls the Rust proxy is the Discord bot's approval handler in discord_bot/handlers/approvals.py, which runs in response to a Discord button click. The button click itself is authenticated end to end: Discord signs the interaction with Ed25519, the bot verifies the signature against the application's public key in discord_bot/security.py, the handler then signs an ActionRequest with the shared HMAC secret, and the proxy verifies that signature before doing anything else.

The static invariant is enforced by a test: tests/test_p0_regressions.py::test_worker_triage_never_invokes_proxy greps the worker modules at import time and at source level for any reference to discord_bot.proxy_client.execute_action. If the worker ever imports that symbol, the test fails. The check covers both eager imports and lazy imports inside functions.
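The source-level half of that check can be sketched with the standard ast module instead of a text grep. This illustrates the technique and is not the body of test_worker_triage_never_invokes_proxy; the import-time half is not shown. An AST walk ignores comments and strings. Because it visits every node in the tree, it also catches lazy imports inside function bodies.

```python
import ast

FORBIDDEN_MODULE = "discord_bot.proxy_client"
FORBIDDEN_NAME = "execute_action"


def references_proxy(source: str) -> bool:
    """True if the source imports or references the proxy client's execute_action."""
    for node in ast.walk(ast.parse(source)):
        # from discord_bot.proxy_client import execute_action (module level or inside a function)
        if isinstance(node, ast.ImportFrom) and node.module == FORBIDDEN_MODULE:
            if any(alias.name in (FORBIDDEN_NAME, "*") for alias in node.names):
                return True
        # import discord_bot.proxy_client [as x]: conservatively treated as a violation
        if isinstance(node, ast.Import):
            if any(alias.name == FORBIDDEN_MODULE for alias in node.names):
                return True
        # dotted use such as discord_bot.proxy_client.execute_action(...)
        if isinstance(node, ast.Attribute) and node.attr == FORBIDDEN_NAME:
            if ast.unparse(node.value) == FORBIDDEN_MODULE:
                return True
    return False
```

In a test, run this over the source of each worker module and assert that it returns False.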

Rule 5: DRY_RUN by default

The Rust proxy's dry_run flag is true by default. Production has to opt in to real execution by setting DRY_RUN=false in the environment. The check happens before the EDR client is even constructed, so mis-configuration cannot accidentally call the vendor's API.

// vyrox-proxy/src/main.rs
let response = if state.dry_run {
    info!(/* ... */, "DRY_RUN: skipping EDR call");
    ExecuteResponse { status: "dry_run".to_string(), dry_run: true }
} else {
    state.edr.dispatch(payload.action_type, &payload.host).await
};

The audit entry written on a DRY_RUN action looks identical to a real action except for the dry_run: true field. That is intentional. An operator looking at the audit log can tell the difference, and a compliance review on the JSONL stream sees the same chain integrity either way.

Rule 6: LLM output never directly executed

The LLM returns a JSON object with five fixed fields: verdict, confidence, reasoning, mitre_techniques, suggested_action. The triage_with_llm function in worker/llm.py runs the parsed object through _parse_triage_json, which checks every field against a fixed allow-list (verdict in {CRITICAL, HIGH, MEDIUM, LOW, BENIGN}, confidence clamped to [0, 1], suggested_action in the action allow-list). A response that fails validation produces a conservative MEDIUM verdict at 0.5 confidence, not a partial commit.

The validated object never touches exec, eval, subprocess, the filesystem, or SQL. It only sets fields on a TriageResult. The Pydantic model itself is frozen so even a downstream caller cannot mutate fields after the fact.
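Below is a stdlib sketch of that validation. Pydantic is the documented tool. A frozen dataclass stands in for the Pydantic model here so the example runs without third-party packages. parse_triage_json is a hypothetical stand-in for _parse_triage_json, and the action allow-list is invented. The point it shows: raw LLM text only ever becomes fields on an immutable result or the fixed fallback.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

VERDICTS = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "BENIGN"}
# Hypothetical action allow-list; the real list is not published in these docs.
ACTIONS = {"HOST_ISOLATION", "NONE"}


@dataclass(frozen=True)
class TriageResult:
    # frozen=True: assigning to a field after construction raises FrozenInstanceError.
    verdict: str
    confidence: float
    reasoning: str
    mitre_techniques: tuple[str, ...]
    suggested_action: str


FALLBACK = TriageResult("MEDIUM", 0.5, "LLM output failed validation", (), "NONE")


def parse_triage_json(raw: str) -> TriageResult:
    """Validate raw LLM text against fixed allow-lists; any failure returns FALLBACK."""
    try:
        data = json.loads(raw)
        if data["verdict"] not in VERDICTS or data["suggested_action"] not in ACTIONS:
            return FALLBACK
        if not isinstance(data["mitre_techniques"], list):
            return FALLBACK
        # Clamp confidence into [0, 1] rather than rejecting out-of-range values.
        confidence = min(1.0, max(0.0, float(data["confidence"])))
        return TriageResult(
            verdict=data["verdict"],
            confidence=confidence,
            reasoning=str(data["reasoning"]),
            mitre_techniques=tuple(str(t) for t in data["mitre_techniques"]),
            suggested_action=data["suggested_action"],
        )
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing fields, wrong types, or unhashable values.
        return FALLBACK
```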

Multi-tenancy

Tenant isolation is a property of the data layer, not a runtime check in business logic.

| Surface | How tenants are separated |
|---|---|
| Redis queue | Key namespace: vyrox:alerts:{tenant_id} |
| SQLite tables | Every row carries tenant_id; queries filter on it |
| Discord channels | DiscordGuild.tenant_id maps Discord server to tenant |
| Webhook secrets | Looked up per tenant in tenant_credentials.webhook_secret_encrypted |
| Audit log | Each entry carries tenant_id; export endpoints filter server-side |
| Token budget | Daily ledger keyed on tenant_id and date |
| Verdict cache | Cache key (tenant_id, fingerprint) |

The webhook routes resolve the tenant from the vendor payload's own identifier field (customer_id, accountId, tenantId), then look up that tenant's secret in tenant_credentials before verifying the signature. A payload that authenticates with the wrong tenant's secret fails the HMAC check and returns 401. A payload with no identifier returns 400. There is no path where an unmatched payload lands on a shared queue.

Cross-tenant access from inside the Discord bot is blocked by discord_bot/main.py:312. The custom_id of every approval button embeds the alert's tenant ID. Before calling the approval handler, the bot checks that the alert tenant matches the Discord guild's tenant. A mismatch returns "This action is not valid for this server" without contacting the proxy.
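A minimal sketch of that guard. The "<action>:<tenant_id>:<alert_id>" layout of custom_id is a hypothetical format chosen for illustration, since the real encoding is not documented here. ButtonAction and authorise are likewise illustrative names. The check compares the tenant embedded in the button against the tenant mapped to the guild the click came from.

```python
from __future__ import annotations

from dataclasses import dataclass

ALLOWED_ACTIONS = {"approve", "deny", "investigate"}


@dataclass(frozen=True)
class ButtonAction:
    action: str
    tenant_id: str
    alert_id: str


def parse_custom_id(custom_id: str) -> ButtonAction | None:
    # Hypothetical layout "<action>:<tenant_id>:<alert_id>".
    parts = custom_id.split(":", 2)
    if len(parts) != 3 or parts[0] not in ALLOWED_ACTIONS:
        return None
    return ButtonAction(*parts)


def authorise(custom_id: str, guild_tenant_id: str) -> ButtonAction | None:
    """Return the parsed action only if the button's tenant matches the guild's tenant."""
    parsed = parse_custom_id(custom_id)
    if parsed is None or parsed.tenant_id != guild_tenant_id:
        return None  # caller replies "This action is not valid for this server"
    return parsed
```

A None result short-circuits before the approval handler runs, so the proxy is never contacted.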

Two-stage triage

Triage runs in worker/triage.py::triage. Five stages, three early returns.

                                                        ┌───────────────────────┐
    NormalizedAlert ──▶ verdict cache ──▶ cache hit ──▶ │ return cached verdict │
                              │                         └───────────────────────┘
                              │ cache miss
                              ▼
                      heuristics engine
                              │
          ┌───────────────────┼─────────────────────────┐
          ▼                   ▼                         ▼
  confidence ≥ 0.75   confidence ≤ 0.25     0.25 < confidence < 0.75
  accept heuristic    return BENIGN               LLM fallback
  verdict                                               │
                                                        ▼
                                               token budget check
                                                        │
                                  ┌─────────────────────┼─────────────────────┐
                                  ▼                     ▼                     ▼
                          budget exhausted        primary model        primary 429/5xx
                            MEDIUM / 0.5         parse + return               │
                                                                              ▼
                                                                      fallback model 1
                                                                       parse + return
                                                                              │
                                                                              ▼
                                                                      fallback model 2
                                                                       parse + return
                                                                              │
                                                                              ▼
                                                                      all rate limited
                                                                        MEDIUM / 0.5

The two-stage design solves three problems at once: determinism and explainability for the roughly eighty percent of alerts that are obvious; low cost, because the LLM is reserved for the ambiguous middle band; and a conservative default verdict for every failure mode, so the queue never jams during a provider outage. This doc does not name the LLM provider because the choice is operational. The model chain is configured through environment variables (LLM_PRIMARY_MODEL, LLM_FALLBACK_MODEL_1, LLM_FALLBACK_MODEL_2).
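The routing in the diagram can be sketched in Python as below. This is an illustration, not worker/triage.py: Verdict, ProviderUnavailable, the cache keyed by alert ID, and passing the heuristics and model calls in as callables are all assumptions. Two further choices are assumptions too: the BENIGN verdict carries over the heuristic confidence, and conservative fallback verdicts are not cached, so a provider outage does not pin MEDIUM/0.5 onto an alert after the provider recovers.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Verdict:
    severity: str      # CRITICAL, HIGH, MEDIUM, LOW or BENIGN
    confidence: float  # 0.0 .. 1.0


CONSERVATIVE = Verdict("MEDIUM", 0.5)  # default for every failure mode
ACCEPT_AT = 0.75                       # heuristic verdict accepted at or above this
BENIGN_AT = 0.25                       # heuristic confidence at or below this is BENIGN


class ProviderUnavailable(Exception):
    """Hypothetical: raised by a model call on HTTP 429 or 5xx."""


def triage(
    alert_id: str,
    cache: dict[str, Verdict],
    heuristics: Callable[[str], Verdict],
    budget_left: int,
    model_chain: Sequence[Callable[[str], Verdict]],
) -> Verdict:
    if alert_id in cache:                           # verdict cache hit
        return cache[alert_id]
    verdict = heuristics(alert_id)                  # deterministic first stage
    if verdict.confidence >= ACCEPT_AT:
        result = verdict
    elif verdict.confidence <= BENIGN_AT:
        result = Verdict("BENIGN", verdict.confidence)
    elif budget_left <= 0:                          # token budget exhausted
        result = CONSERVATIVE
    else:
        result = CONSERVATIVE                       # kept if every model is rate limited
        for call_model in model_chain:              # primary, fallback 1, fallback 2
            try:
                result = call_model(alert_id)
                break
            except ProviderUnavailable:
                continue
    if result is not CONSERVATIVE:
        cache[alert_id] = result
    return result
```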

Approval flow

   Discord button click
            │
            ▼
   bot /interactions   ◀──── Ed25519 verify against settings.discord_public_key
            │
            ▼
   custom_id parse  ──▶ approve / deny / investigate
            │
            ▼  (approve only)
   AlertRecord lookup by alert_id + tenant_id
            │
            ▼
   Idempotency check  ──▶ if status already executed/executing/approved → no-op
            │
            ▼
   Mark alert "executing"
   Persist ActionRecord "approved"
   Audit "approve.requested"      ◀──── written before any outbound call
            │
            ▼
   proxy_client.execute_action()
   body signed with vyrox_hmac_secret (deterministic JSON)
            │
            ▼
   Rust proxy /execute
   ├─ HMAC verify
   ├─ replay window check (±30s)
   ├─ nonce.claim_or_replay(request_id)
   ├─ audit::append_audit  ◀──── written before EDR call
   └─ edr.dispatch (or DRY_RUN short-circuit)
            │
            ▼
   ActionRecord.status = "executed" or "dry_run"
   Alert.status        = "executed"
   Audit "approve.executed"

Idempotency is enforced in three layers. The bot checks AlertRecord.status before generating a request ID, so a double-click returns "already approved". The proxy keeps a per-request-ID nonce store with ten-minute retention, so a network retry replays the cached response instead of calling the EDR twice. The audit log gets one entry per state transition, so replayed requests are not logged twice.
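A minimal, single-process sketch of the bot and proxy layers, written in Python for illustration (the real proxy is Rust). NonceStore, handle, and bot_should_execute are hypothetical names; HMAC verification and the audit write are omitted. A production store would also need to claim the request ID atomically before dispatch, so two concurrent retries cannot both execute; this sketch records the ID only after execute() returns.

```python
from __future__ import annotations

import time
from typing import Callable

REPLAY_WINDOW_S = 30          # timestamps more than ±30 s from now are rejected
NONCE_RETENTION_S = 10 * 60   # request IDs are remembered for ten minutes
IN_FLIGHT_OR_DONE = {"approved", "executing", "executed"}


def bot_should_execute(alert_status: str) -> bool:
    """Bot layer: a second Approve click on an alert already in flight is a no-op."""
    return alert_status not in IN_FLIGHT_OR_DONE


class NonceStore:
    """Proxy layer: in-memory claim-or-replay store keyed by request ID."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._seen: dict[str, tuple[float, dict]] = {}

    def _evict(self, now: float) -> None:
        expired = [rid for rid, (seen_at, _) in self._seen.items()
                   if now - seen_at > NONCE_RETENTION_S]
        for rid in expired:
            del self._seen[rid]

    def handle(self, request_id: str, sent_at: float, execute: Callable[[], dict]) -> dict:
        now = self._clock()
        if abs(now - sent_at) > REPLAY_WINDOW_S:
            return {"status": 401}                 # stale or future-dated: generic rejection
        self._evict(now)
        if request_id in self._seen:
            return self._seen[request_id][1]       # replay: cached response, no second EDR call
        response = execute()                       # audit append + EDR dispatch go here
        self._seen[request_id] = (now, response)
        return response
```

The order matches the proxy steps in the diagram: the timestamp window is checked before the nonce store, so a captured request older than thirty seconds is rejected even if its ID has already been evicted.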

Configuration

All configuration is read at startup from environment variables through shared/config.py::Settings. The settings class uses pydantic_settings so a missing required field raises a ValidationError before the service serves traffic.

The full env contract is in .env.example in the private monorepo. The fields that an OSS contributor needs to know about:

Variable | Component | Purpose
--- | --- | ---
VYROX_HMAC_SECRET | all | Sixty four hex characters. Signs Python ↔ Python and Python ↔ Rust traffic.
REDIS_URL | ingestion, worker | redis:// or rediss:// URL. The legacy Upstash REST variables are still accepted for backward compatibility but new deployments should set this.
OPENCODE_ZEN_API_KEY | worker | LLM provider key. Empty falls back to the legacy OPENROUTER_API_KEY during the migration window.
DISCORD_BOT_TOKEN | bot | Discord application token.
DISCORD_PUBLIC_KEY | bot | Application public key for interaction Ed25519 verification. Empty skips verification (local dev only).
CROWDSTRIKE_WEBHOOK_SECRET | ingestion | Vendor-default HMAC secret. Per-tenant secrets stored in tenant_credentials override this.
SENTINELONE_WEBHOOK_SECRET | ingestion | Vendor-default bearer token.
DEFENDER_WEBHOOK_SECRET | ingestion | Defender Graph clientState value used as bearer.
AUDIT_LOG_PATH | all writers | Directory for daily JSONL files. The hash chain depends on this surviving restart.
VYROX_PROXY_URL | bot | Base URL of the Rust proxy.
DRY_RUN | proxy | true by default. Production opts in to real EDR calls.
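The real Settings class uses pydantic_settings. As a dependency-free illustration of the same fail-at-startup behaviour, the sketch below validates two of the fields above with the standard library. CoreSettings, ConfigError, and load_core_settings are hypothetical names. Treating REDIS_URL as optional (because the legacy Upstash variables are still accepted) is an assumption, and the secret is kept as the hex string because this doc does not say whether it is decoded before use as an HMAC key.

```python
from __future__ import annotations

import os
import re
from dataclasses import dataclass
from typing import Mapping, Optional

HEX64 = re.compile(r"[0-9a-fA-F]{64}")


class ConfigError(ValueError):
    """Raised at startup, before the service serves any traffic."""


@dataclass(frozen=True)
class CoreSettings:
    vyrox_hmac_secret: str    # kept as the 64-character hex string
    redis_url: Optional[str]  # None when only the legacy variables are set


def load_core_settings(env: Mapping[str, str] = os.environ) -> CoreSettings:
    secret = env.get("VYROX_HMAC_SECRET", "")
    if not HEX64.fullmatch(secret):
        # Name the variable, never echo its value.
        raise ConfigError("VYROX_HMAC_SECRET must be sixty four hex characters")
    redis_url = env.get("REDIS_URL") or None
    if redis_url is not None and not redis_url.startswith(("redis://", "rediss://")):
        raise ConfigError("REDIS_URL must start with redis:// or rediss://")
    return CoreSettings(vyrox_hmac_secret=secret, redis_url=redis_url)
```

A new secret in the required shape can be generated with secrets.token_hex(32), which returns sixty four hex characters.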

What is in the private side

Reading the public docs without seeing the private code is intentional. The boundary makes contribution clear.

The private monorepo holds the implementation of the pipeline above. File names mirror the layout described here (ingestion/, worker/, discord_bot/, shared/, playbook/, migrations/, tests/). The Python tests covering the public contracts have public-safe names (test_p0_regressions.py, test_p05_blockers.py). Anyone with access can map a private fix to a public contract in seconds.

The detection patterns, the LLM prompts, and the operational configs stay private. Those are the layer that creates the business; the proxy and the contracts are the layer that creates the trust. The split is deliberate.

Operating commitments

We do not publish hard SLA percentages in this repo. The reason is simple: numbers we cannot defend across every pilot today belong in negotiated contracts, not in OSS docs.

What we can commit publicly:

  • The audit log is customer-owned. We do not lose it, we do not modify it, and we provide export at any time. The format is the contract, not our retention policy.
  • Containment proceeds only after a human in Discord clicks Approve. There is no autonomous containment path.
  • Webhook authentication failures and proxy signature failures both return generic 401 responses. We never tell a caller which part of the credential was wrong.

Per-customer SLAs that involve uptime targets and triage latency live in signed contracts.

Decisions worth knowing

A short list, written for the reader who is asking "why this and not that".

Rust for the proxy. The proxy is the only Vyrox process that can cause side effects on the customer side. We wanted four properties in one binary: memory safety without a garbage collector, a small static binary, a constant-time HMAC implementation in the ecosystem, and no runtime dependency on a vendor library. Rust gave us all of them. The proxy is intentionally small: about a thousand lines of code including tests, split across the main, hmac, audit, nonce, edr, and actions modules.

SQLite for the pilot. SQLite in WAL mode with a single writer process handles pilot scale (ten tenants, low hundreds of alerts per day per tenant). Write contention starts to bite at around twenty-five tenants, which is the trigger for the Postgres migration. The schema is already SQLModel-compatible, so the migration is a SQL dump plus a connection-string change, not a rewrite.
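For readers unfamiliar with WAL mode, a minimal sketch of opening the pilot database with it through Python's sqlite3 module. The function name and the five-second lock timeout are assumptions, not values taken from the Vyrox code.

```python
import sqlite3


def open_pilot_db(path: str) -> sqlite3.Connection:
    # timeout: wait up to 5 s for a lock held by the writer instead of failing at once.
    conn = sqlite3.connect(path, timeout=5.0)
    # WAL lets readers keep querying while the single writer appends.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL mode (got {mode!r})")
    return conn
```

WAL is a persistent journal mode: once set, it is stored in the database file, so later connections open in WAL mode without repeating the pragma.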

Discord as the operator UI. The first ten pilots use Discord exclusively. The bot handles onboarding, alert review, approval, and slash commands for stats and audit export. The cost is one extra infrastructure provider; the benefit is that a customer's first five-minute experience is "I added your bot to my server and a synthetic alert appeared." A web dashboard ships when a prospect refuses Discord or when customer count reaches eleven, whichever comes first.

Two-stage triage. A pure LLM design is slow, expensive at scale, and not auditable without careful prompt engineering. A pure rules design misses anything novel. The split lets us run the heuristics for free, run the LLM only on the ambiguous middle band, fall back to a conservative MEDIUM on any failure, and keep the LLM output strictly inside a Pydantic schema before it touches anything else.

Human in the loop for execution. Auto-isolating hosts on false positives is the kind of incident that loses you the customer. Until we have a year of per-tenant false-positive data, every CRITICAL and HIGH containment is gated on a human Approve click. LOW auto-approval is opt-in per tenant and logged identically to manual approvals.

Cross-references

Threat model

This document is the asset-by-asset, attacker-by-attacker view of the Vyrox platform. It is the document a regulated workload's security review will ask for, and it is the document that drives every test in tests/test_p0_regressions.py and tests/test_p05_blockers.py.

The format is STRIDE-aligned but pragmatic. We list each asset, the attackers we consider in scope for that asset, the threats they could plausibly carry out, the mitigations that defend against those threats, and the residual risks we have accepted.

Trust boundaries

   public internet                                   private network
                                                                              
   EDR vendors  ──── webhooks ────────▶  ingestion   ────▶  Redis             
                                              │                                
                                              └────────▶  SQLite              
                                                              ▲                
   Discord  ──── interactions ──────────▶  bot   ────────────┘                
                                              │                                
                                              └──── HMAC-signed ──▶  Rust proxy
                                                                          │    
                                                                          ▼    
                                                                    EDR vendor 
                                                                    APIs       

Boundaries that matter:

  1. EDR vendor → ingestion webhook. The vendor is honest; the network between the vendor and us is not. Mitigations: HMAC-SHA256 or bearer token per route, per-tenant secrets stored in tenant_credentials.
  2. Discord → bot /interactions. Discord is honest; anyone else who knows the bot URL is not. Mitigations: Ed25519 signature verification with the application public key.
  3. Worker → bot /webhook. The worker is honest; anyone on the same network as the bot is not. Mitigations: HMAC-SHA256 over deterministic-JSON body using the shared VYROX_HMAC_SECRET.
  4. Bot → Rust proxy. Same model. Mitigations: HMAC-SHA256 plus a thirty second replay window plus per-request-ID nonce dedup.
  5. Customer → bot slash commands. Customer-side users are not all equal. Mitigations: Discord-side RBAC via role IDs; the bot rejects approval clicks from users without the configured admin role.

Assets

A1: Customer audit log

The append-only JSONL audit log per tenant. Contains a record of every alert triaged, every Discord approval click, every proxy execution, every action result. The log is the authoritative incident-response artifact and the SOC 2 evidence sample.

Threat | Mitigation | Residual
--- | --- | ---
Modify a past entry to hide an executed containment | SHA-256 hash chain over the full payload. Any single-byte change breaks the chain at the modified entry and every entry after it. Operators verify with the standalone script in AUDIT_CHAIN.md. | An attacker who controls the host can truncate the log to a prior good entry. We detect truncation only on restart by comparing last_hash between processes. Tracked as "tamper detection on truncation" in the roadmap.
Read another tenant's audit log | Every audit entry carries tenant_id. The Rust proxy /audit/export endpoint filters server-side on tenant_id and requires an HMAC-signed timestamp header on the request. | An operator with shell access on the proxy host can read everyone's log. Out of scope; treat shell access as a P0 incident.
Lose entries on power loss between write and OS flush | _sync_write in shared/audit.py calls flush + os.fsync. The Rust side calls flush + sync_data. Both flush to physical storage before returning. | A disk failure between fsync and the next read can still lose the entry. Mitigate at the filesystem layer (RAID, snapshots).

A2: HMAC shared secret (VYROX_HMAC_SECRET)

A thirty two byte secret encoded as sixty four hex characters. Signs every Python-to-Python and Python-to-Rust call.

Threat | Mitigation | Residual
--- | --- | ---
Recover the secret from a timing channel during HMAC compare | hmac.compare_digest in Python, subtle::ConstantTimeEq in Rust. Both run in time proportional to MAC length, not to where the first byte mismatches. Locked by tests in tests/test_crypto.py and vyrox-proxy/src/hmac.rs::tests. | An attacker who can co-locate on the same CPU might measure cache timing in theory. Not feasible against an HMAC compare in practice.
Leak the secret in logs or error responses | Settings module never logs the secret. HMAC failures return a generic 401 with detail "invalid signature". The Rust proxy uses tracing with the secret field marked private. | Misconfigured external log aggregator could capture an env dump. Mitigate at the deployment layer.
Use the same secret to forge a request after a key rotation | A rotation invalidates all signed requests immediately. The bot regenerates the request ID on every retry, so any cached old-secret payload becomes useless after the rotation. | Operator must coordinate rotation between the worker, bot, and proxy. Documented in the runbook.

A3: Per-tenant webhook secrets

Each onboarded tenant has its own webhook secret in tenant_credentials.webhook_secret_encrypted. The column is named "encrypted" but stores raw bytes during the pilot. Encryption at rest ships when the encryption module lands.

Threat | Mitigation | Residual
--- | --- | ---
Tenant A spoofs a payload as tenant B | The route resolves the tenant from the payload first, then looks up that tenant's secret. The signature must match the per-tenant secret, not the global one. A wrong-tenant signature fails the HMAC compare. | A misconfigured route that uses the global fallback secret for a tenant who should have their own is detectable in the audit log (the lookup logs at WARN). Tracked as I-8 in the roadmap.
Read another tenant's secret from the DB | All DB queries filter by tenant_id. There is no read path that returns all tenant_credentials rows. The schema preflight at startup refuses to start the service if the table is missing the tenant_id column. | A direct SQL session has access to everything. Restrict shell access.

A4: Discord application token

DISCORD_BOT_TOKEN lets the bot post messages, fetch member rosters, and react to interactions. A compromised token lets an attacker delete the bot, post arbitrary messages, or impersonate the bot inside customer servers.

Threat | Mitigation | Residual
--- | --- | ---
Token leaked via env dump | Token is not logged. Settings module masks the value in __repr__. Production deployments use a secret manager rather than .env files on disk. | Misconfigured CI could leak the token in a build log. Use the secret-injection feature of the CI provider, not echo.
Attacker uses the token to forge an interaction reply | Outbound calls to Discord use the token. Inbound interactions are verified against the application public key (Ed25519). A leaked token does not let an attacker forge interactions back to us, only push messages out as the bot. | Cannot prevent an attacker from impersonating the bot in customer servers until we detect and rotate the token. Rotation runbook ships before customer #5.

A5: Containment proxy ability to call EDR vendors

The Rust proxy can call CrowdStrike, SentinelOne, or Defender APIs to isolate hosts, kill processes, and quarantine network access. The blast radius of a compromised proxy is the whole tenant fleet.

Threat | Mitigation | Residual
--- | --- | ---
Forge an ActionRequest without the shared secret | HMAC verification before any parse. Constant-time compare. Replay window of thirty seconds. Nonce dedup on request_id. All four together leave the attacker no path. | Compromising the host running the proxy bypasses everything. Treat as a P0 host-level incident.
Trick the proxy into calling a wrong host | The proxy treats the host field as opaque and passes it through to the EDR API. The EDR vendor checks that the host belongs to the calling tenant. A wrong host either fails the EDR API or affects the same tenant. | An attacker with the right HMAC and the right tenant can isolate a host belonging to that tenant. They have already passed authentication; this is not a privilege escalation.
Re-execute a containment via the replay window | Nonce store records every request_id for ten minutes. A replayed request with the same ID returns the cached response and never calls the EDR. A replayed request with a different ID fails the thirty second timestamp check. | An attacker who controls the wire could ship a fresh request_id within thirty seconds, but they would also need to ship a valid signature. They cannot do that without the secret.
DRY_RUN=false in development by accident | DRY_RUN=true is the default. Production opts in. The bool parser accepts the common spellings (true, 1, yes, on) and warns on anything else. | An operator who explicitly turns off DRY_RUN can still cause real EDR calls. Documented. The expected production setting is DRY_RUN=false plus a vendor API token; the absence of the token also short-circuits the call.
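The proxy's parser is Rust; the Python sketch below only illustrates the behaviour the table describes. Two details are assumptions: that false, 0, no, and off turn dry-run off, and that an unrecognised value keeps dry-run on (the safe side) after logging a warning. parse_dry_run is a hypothetical name.

```python
from __future__ import annotations

import logging

log = logging.getLogger("vyrox.config")

TRUE_SPELLINGS = {"true", "1", "yes", "on"}
FALSE_SPELLINGS = {"false", "0", "no", "off"}


def parse_dry_run(raw: str | None) -> bool:
    if raw is None or not raw.strip():
        return True                        # unset: dry-run is the default
    value = raw.strip().lower()
    if value in TRUE_SPELLINGS:
        return True
    if value in FALSE_SPELLINGS:
        return False                       # explicit opt-in to real EDR calls
    log.warning("unrecognised DRY_RUN value %r; keeping dry-run on", raw)
    return True
```

Falling back to True on a typo means a misspelt DRY_RUN=flase still results in no real EDR call.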

A6: LLM provider trust boundary

We send process command lines, hostnames, and user account names to a third-party LLM router. The router is a trust boundary we do not control. Some customers will require an opt-out.

Threat | Mitigation | Residual
--- | --- | ---
LLM provider logs payloads and is compromised | Tenants can opt out of LLM triage at the contract level. With LLM disabled, the worker returns MEDIUM/0.5 from worker.llm._conservative_fallback on the ambiguous middle band. Heuristics still handle the high and low confidence ends. | Cannot prevent the provider from seeing data when LLM is enabled. Documented in the pilot agreement.
LLM prompt-injection attack from inside a malicious file path | The prompt is a fixed template; vendor data only appears in the value slots. The response goes through Pydantic validation before any field is used. A response that does not match the schema falls back to MEDIUM/0.5 and writes a llm_call_parse_error audit event. | A model that returns a perfectly-formed but wrong verdict still passes validation. Mitigate with heuristics overrides for known false-positive patterns.
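The real validator is a Pydantic model. The stdlib sketch below shows the same shape of check: parse strictly, reject anything outside the schema, and fall back to MEDIUM/0.5 with an llm_call_parse_error audit event. The field names verdict and confidence, the rejection of extra keys, and the audit_events list standing in for the audit writer are assumptions.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

SEVERITIES = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "BENIGN"}


@dataclass(frozen=True)
class LlmVerdict:
    severity: str
    confidence: float


FALLBACK = LlmVerdict("MEDIUM", 0.5)


def parse_llm_response(raw: str, audit_events: list[dict]) -> LlmVerdict:
    try:
        data = json.loads(raw)                       # JSONDecodeError is a ValueError
        if not isinstance(data, dict) or set(data) != {"verdict", "confidence"}:
            raise ValueError("response does not match the schema")
        verdict, confidence = data["verdict"], data["confidence"]
        if verdict not in SEVERITIES:                # unhashable values raise TypeError
            raise ValueError(f"unknown verdict {verdict!r}")
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            raise ValueError("confidence must be a number")
        if not 0.0 <= confidence <= 1.0:             # also rejects NaN
            raise ValueError("confidence out of range")
        return LlmVerdict(verdict, float(confidence))
    except (ValueError, TypeError) as exc:
        audit_events.append({"event": "llm_call_parse_error", "error": str(exc)})
        return FALLBACK
```

Rejecting extra keys means a response that smuggles in something like an "action" field is treated as malformed rather than partially trusted.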

A7: Discord interaction endpoint

The bot's /interactions route is publicly reachable on the internet because that is the contract with Discord. Anyone who finds the URL can attempt to forge interactions; without Discord's private signing key, those forgeries fail Ed25519 verification.

Threat | Mitigation | Residual
--- | --- | ---
Forge an approve_<alert_id>_<tenant_id> button click | Every interaction POST carries X-Signature-Ed25519 and X-Signature-Timestamp. The bot verifies the Ed25519 signature against settings.discord_public_key before any handler runs. A bad signature returns 401. Locked by tests/test_p05_blockers.py::test_discord_signature_*. | The verifier is bypassed when DISCORD_PUBLIC_KEY is empty. Local dev only; production refuses to set up Discord without the key.
Replay a captured legitimate interaction | Discord publishes guidance against this and uses a short timestamp window. The Vyrox approval handler also checks AlertRecord.status and ignores clicks on alerts that are already executed, executing, or approved. A replayed click on an alert that already fired is a no-op. | A replay of a click on a brand-new alert within a few seconds is theoretically possible if Discord's window is open. The cost is one duplicate audit entry of approve.duplicate_click_ignored, not a double execution.

A8: Bot /webhook from worker

The worker posts alert notifications to the bot's /webhook route. That route was unauthenticated until 2026-05-23 (Fix B in the audit); it is now HMAC-protected.

Threat | Mitigation | Residual
--- | --- | ---
Anyone with the bot URL posts a fake alert embed into a tenant channel | Worker signs the body with VYROX_HMAC_SECRET. Bot verifies before parsing. Locked by tests/test_discord_bot.py::test_webhook_post_rejects_unsigned. | Compromise of VYROX_HMAC_SECRET lets an attacker do this; same blast radius as the proxy compromise above.

Out of scope

We do not consider the following in this threat model. They are real risks; they are out of scope because they live above or below our control surface.

  • A malicious EDR vendor sending fabricated alerts. We trust the EDR vendor as the source of truth on what events happened on the customer's hosts.
  • A malicious tenant with the correct credentials self-isolating their own hosts. They authenticated; that is the contract.
  • Physical attacks on the deployment host, the customer's endpoints, or our developer workstations.
  • Side-channels at the silicon level. Spectre-class attacks, rowhammer, power analysis. Out of scope for a security tool that runs on top of Linux on commodity cloud hardware.
  • Discord platform availability. We design for the platform being up; the backup notification path (email plus PagerDuty for CRITICAL alerts) handles platform outages.

What changed and when

The threat model is versioned implicitly through the commit history of this file. Material changes:

  • 2026-05-21: First end-to-end audit. Drove the original eight P0 blockers in todo.md (private), all of which shipped.
  • 2026-05-23: Second audit. Drove eight P0.5 blockers: Discord Ed25519, bot webhook HMAC, proxy audit-export auth, Rust audit chain, Python audit chain across boots, real /onboard flow, env example sync, Redis configuration. All shipped.

The audits themselves are private. The fixes are public and the tests that lock them are documented in ARCHITECTURE.md.

Audit chain specification

This document is the wire-level specification for the Vyrox audit log format. It is targeted at customers who want to verify their own log files independently, compliance teams reviewing SOC 2 evidence samples, and contributors writing new code that reads or writes audit entries.

The format is shared between the Python side (shared/audit.py in the private monorepo) and the Rust side (vyrox-proxy/src/audit.rs, public). Both write to the same files and extend the same hash chain, so a single verifier program can read both streams. The hash inputs differ in two details, described under "Hash computation" below.

File layout

One JSONL file per UTC day. File name: audit-YYYY-MM-DD.jsonl. Files are append-only on disk; the kernel honours the O_APPEND flag so concurrent writers cannot stomp each other.

A new file rolls over at the next UTC day. The hash chain continues across files. The first entry of a new day's file uses the hash of the last entry of the previous day's file as its previous_hash. The very first entry of the very first file uses the genesis sentinel hash (sixty four ASCII zeros).

audit-2026-05-22.jsonl
audit-2026-05-23.jsonl   <- previous_hash of entry 0 == hash of last entry in 2026-05-22 file
audit-2026-05-24.jsonl   <- chain continues

Entry shape

Every entry is a single JSON object on its own line. Field order on disk varies because we use serde_json::to_string (Rust) and json.dumps(..., sort_keys=True) (Python); verifiers must not depend on a specific order in the on-disk JSON. The hash computation, by contrast, is order-dependent and uses canonical JSON. See "Hash computation" below.

Rust proxy entries (containment actions)

{
  "timestamp": 1700000000,
  "tenant_id": "acme-corp",
  "action_type": "HOST_ISOLATION",
  "host": "workstation-01",
  "approved_by": "jane.smith#1234",
  "dry_run": false,
  "previous_hash": "0000000000000000000000000000000000000000000000000000000000000000",
  "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
Field | Type | Notes
--- | --- | ---
timestamp | integer | Unix epoch seconds, UTC. Capture time on the writer host.
tenant_id | string | Multi-tenant scope. Required.
action_type | string | One of HOST_ISOLATION, PROCESS_KILL, NETWORK_QUARANTINE. Stored as Debug format of the Rust enum.
host | string | Vendor-side host identifier. Opaque to the audit log.
approved_by | string | Discord username including discriminator.
dry_run | bool | true when DRY_RUN was active and no real EDR call was made.
previous_hash | string | 64 lowercase hex characters. Genesis sentinel for the first entry of the very first file.
hash | string | 64 lowercase hex characters. SHA-256 of previous_hash, a literal pipe separator, and the canonical payload (see "Hash computation" below).

Python pipeline entries (everything else)

Python writes audit entries for ingestion events, triage decisions, notification attempts, Discord interactions, and any other state change. The wrapper shape is fixed; the inner entry dict is free-form per event.

{
  "timestamp": "2026-05-23T14:32:00+00:00",
  "entry": {
    "event": "triage_persisted",
    "alert_id": "alt_abc123",
    "tenant_id": "acme-corp",
    "verdict": "CRITICAL",
    "confidence": 0.92
  },
  "previous_hash": "0000000000000000000000000000000000000000000000000000000000000000",
  "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
Field | Type | Notes
--- | --- | ---
timestamp | string | ISO 8601 UTC. Format produced by Python's datetime.now(timezone.utc).isoformat().
entry | object | Free-form event payload. Conventions are documented per event below.
previous_hash | string | Same as Rust.
hash | string | Same as Rust.

The Python and Rust streams interleave at the JSONL layer; they share a single chain. A verifier reads one stream of lines, ignores whether the inner shape is the Rust action format or the Python wrapped format, and computes the next expected hash from the on-disk previous_hash plus the rest of the entry.

Hash computation

The chain is a SHA-256 hash chain over canonical-JSON entries.

For Rust entries the canonical payload is the entry without the hash field. The order is alphabetical by key. Whitespace is absent. The canonical form for the example above is:

{"action_type":"HOST_ISOLATION","approved_by":"jane.smith#1234","dry_run":false,"host":"workstation-01","previous_hash":"0000...0000","tenant_id":"acme-corp","timestamp":1700000000}

The hash is:

hash = SHA-256( previous_hash_bytes || "|" || canonical_payload_bytes )

The separator | is one literal pipe character. It marks the boundary between the linkage hash and the payload inside a single SHA-256 input.
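A worked example of this formula for the Rust entry shown earlier, written in Python. It rebuilds the canonical payload and hashes previous_hash, the pipe, and the payload. Passing ensure_ascii=False is an assumption made to match serde_json, which writes non-ASCII characters unescaped; it makes no difference for this ASCII-only entry. Note that the hash shown in both example entries above, e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, is the SHA-256 of empty input, so it is a placeholder rather than the real hash of either entry.

```python
import hashlib
import json

example = {
    "timestamp": 1700000000,
    "tenant_id": "acme-corp",
    "action_type": "HOST_ISOLATION",
    "host": "workstation-01",
    "approved_by": "jane.smith#1234",
    "dry_run": False,
    "previous_hash": "0" * 64,
}


def canonical(payload: dict) -> str:
    # Keys sorted alphabetically, no whitespace between tokens.
    return json.dumps(payload, separators=(",", ":"), sort_keys=True, ensure_ascii=False)


def rust_entry_hash(entry: dict) -> str:
    body = {k: v for k, v in entry.items() if k != "hash"}
    data = body["previous_hash"].encode("utf-8") + b"|" + canonical(body).encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

canonical(example) reproduces the one-line canonical form shown above, with the abbreviated previous_hash written out in full.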

For Python entries the canonical payload is the wrapper object with sort_keys=True. The reference implementation in shared/audit.py uses json.dumps(entry, sort_keys=True) directly:

entry_str = json.dumps(entry, sort_keys=True)
new_hash = hashlib.sha256(f"{self._last_hash}{entry_str}".encode()).hexdigest()

Note that the Python and Rust hash inputs differ in two details that verifiers must respect:

  1. The Rust side uses | as a separator between previous_hash and the canonical payload. The Python side does not.
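  2. The Rust canonical payload is the entry without hash, serialised with no whitespace. The Python canonical payload is the wrapper object excluding hash, serialised by json.dumps(..., sort_keys=True) with its default separators, so it contains a space after each separating comma and colon. sort_keys=True sorts keys at every nesting level, including inside the nested entry object.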

We are aware the two formats are not byte-identical at the hash-input layer. The on-disk wire format (the JSONL itself) is safe to interleave because a verifier dispatches on the presence of the entry field. A future v2 of the format will unify the hash input. Until then, the chain can still be recomputed from the files alone, and an external verifier can use the same dispatch logic.

Genesis hash

0000000000000000000000000000000000000000000000000000000000000000

Sixty four ASCII zeros. Used as the previous_hash of the first entry in a brand new audit directory. The Python side defines it as AuditWriter._GENESIS_HASH. The Rust side defines it as audit::GENESIS_HASH.

Verifying a chain (Python reference)

A complete verifier in about fifty lines. It reads a directory of audit-YYYY-MM-DD.jsonl files in date order, walks every entry, and recomputes each hash. It returns the line number and reason for the first entry whose previous_hash or recomputed hash does not match, or None if the whole chain is intact.

#!/usr/bin/env python3
"""Audit chain verifier: reads a Vyrox audit log directory and checks the hash chain."""
from __future__ import annotations

import hashlib
import json
import sys
from pathlib import Path

GENESIS = "0" * 64


def recompute(prev_hash: str, entry: dict) -> str:
    payload = {k: v for k, v in entry.items() if k != "hash"}
    # Dispatch on shape: Rust action entry vs Python wrapped entry.
    if "action_type" in entry and "entry" not in entry:
        # Rust: compact, key-sorted JSON; serde_json leaves non-ASCII unescaped.
        canonical = json.dumps(
            payload, separators=(",", ":"), sort_keys=True, ensure_ascii=False
        )
        h = hashlib.sha256()
        h.update(prev_hash.encode("utf-8"))
        h.update(b"|")
        h.update(canonical.encode("utf-8"))
        return h.hexdigest()
    # Python: no separator; json.dumps defaults with sort_keys=True.
    return hashlib.sha256(
        f"{prev_hash}{json.dumps(payload['entry'], sort_keys=True)}".encode("utf-8")
    ).hexdigest()


def verify(audit_dir: Path) -> tuple[int, str] | None:
    """Return (line number, reason) for the first broken entry, or None if intact."""
    prev = GENESIS
    # audit-YYYY-MM-DD.jsonl names sort lexically in date order.
    for f in sorted(audit_dir.glob("audit-*.jsonl")):
        lines = f.read_text(encoding="utf-8").splitlines()
        for line_no, raw in enumerate(lines, start=1):
            if not raw.strip():
                continue
            entry = json.loads(raw)
            if entry["previous_hash"] != prev:
                return line_no, f"previous_hash mismatch in {f.name}"
            expected = recompute(prev, entry)
            if expected != entry["hash"]:
                return line_no, f"hash mismatch in {f.name}: expected {expected}, got {entry['hash']}"
            prev = entry["hash"]
    return None


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: verify_audit.py /path/to/audit-dir")
    bad = verify(Path(sys.argv[1]))
    if bad:
        print(f"FAIL line {bad[0]}: {bad[1]}")
        sys.exit(1)
    print("OK: chain intact")

Save as verify_audit.py, run with python verify_audit.py /path/to/audit-dir.

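The verifier exits non-zero on the first mismatch and prints the line number, the file name, and the reason. Customers running their own compliance pipeline should run it nightly from CI against the full audit directory: the chain starts at the genesis entry, so the verifier needs every file, not just the previous day's.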

Chain continuity across restarts

The chain survives process restart. On boot:

  • Python: AuditWriter.__init__ calls _sync_read_last_hash against today's log file. If the file exists, it reads the last line, parses it as JSON, and uses the hash value as the seed. If the file is missing, empty, or unparseable, the seed is the genesis sentinel.
  • Rust: audit::ChainState::from_file does the same. It calls read_audit_logs (which silently skips malformed lines) and uses the hash of the last well-formed entry as the seed.
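A minimal sketch of the Python-side seeding rule described above. The function name is hypothetical, and skipping blank lines (for example a trailing newline) before taking the last line is an assumption. Unlike the Rust side, this rule does not walk back to the last well-formed entry; an unparseable last line yields the genesis sentinel.

```python
from __future__ import annotations

import json
from pathlib import Path

GENESIS_HASH = "0" * 64


def read_last_hash(path: Path) -> str:
    """Chain seed: the hash of the file's last entry, else the genesis sentinel."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return GENESIS_HASH                          # missing file
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return GENESIS_HASH                          # empty file
    try:
        value = json.loads(lines[-1])["hash"]
    except (ValueError, KeyError, TypeError):
        return GENESIS_HASH                          # unparseable last line
    return value if isinstance(value, str) else GENESIS_HASH
```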

The continuity is enforced by tests in both implementations:

  • Python: tests/test_p05_blockers.py::test_audit_chain_survives_process_restart
  • Rust: vyrox-proxy/src/audit.rs::tests::chain_survives_restart

A break in continuity (an entry whose previous_hash does not match the previous entry's hash) is detectable by the verifier above. There is no path in the production code that writes an entry whose previous_hash is not the last in-memory hash.

Tamper detection in practice

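A single-byte modification anywhere in an entry makes that entry's recomputed hash differ from its stored hash. Rewriting the stored hash to match only moves the break to the next entry, whose previous_hash no longer matches, so hiding an edit means rewriting every later entry as well. The verifier reports the first break by line number. The modified entry stays on disk; the verifier only flags where the chain stops linking.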

Truncation (deleting trailing entries from a file) is not detectable by the chain alone. The hash chain only proves that the entries you have are linked. It does not prove that there are no missing entries at the end. Mitigation: customers run the verifier nightly and store the last-seen hash from the previous run; a missing tail entry surfaces as a chain that ends earlier than the previous nightly run recorded.
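A minimal sketch of that nightly check, meant to run after the chain verifier passes. It assumes the last-seen hash is kept in a small text file; in practice it belongs somewhere the audit host cannot write. chain_hashes and check_against_anchor are hypothetical names.

```python
import json
from pathlib import Path


def chain_hashes(audit_dir: Path) -> list:
    """Every entry hash in chain order across the daily files."""
    hashes = []
    for f in sorted(audit_dir.glob("audit-*.jsonl")):
        for raw in f.read_text(encoding="utf-8").splitlines():
            if raw.strip():
                hashes.append(json.loads(raw)["hash"])
    return hashes


def check_against_anchor(audit_dir: Path, anchor_path: Path) -> bool:
    """Return False if the last hash recorded by the previous run has disappeared."""
    hashes = chain_hashes(audit_dir)
    if anchor_path.exists():
        anchor = anchor_path.read_text(encoding="utf-8").strip()
        if anchor and anchor not in set(hashes):
            return False                  # tail truncated: keep the old anchor for review
    if hashes:
        anchor_path.write_text(hashes[-1], encoding="utf-8")
    return True
```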

Truncation caused by a writer that died mid-write is detectable on restart. The writer's __init__ reads the last hash from disk; if that on-disk value is older than the last in-memory value before the crash, the writer resumes from the on-disk value and later writes link from there. Because both implementations fsync after every entry, the lost window is at most the entry being written when the process died.

Durability properties

  • Append-only on disk. Both implementations open with the O_APPEND flag. Concurrent writers serialise at the kernel level.
  • Fsync after every entry. Python uses os.fsync(fileno). Rust uses tokio::fs::File::sync_data. A power loss between write and OS flush does not lose the entry.
  • No buffering above the OS layer. Neither implementation holds pending entries in user-space memory after the write returns.
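These durability properties, plus the one-file-per-UTC-day layout, can be sketched with low-level os calls: open with O_APPEND, write the whole line in one call, then fsync before returning. This is an illustration, not shared/audit.py (which, per the threat model, calls flush plus os.fsync on a file object). The function names and the 0o600 file mode are assumptions, and computing the hash chain is left out.

```python
from __future__ import annotations

import os
from datetime import datetime, timezone
from pathlib import Path


def audit_file_for(audit_dir: Path, when: datetime) -> Path:
    # One file per UTC day: audit-YYYY-MM-DD.jsonl
    return audit_dir / f"audit-{when.astimezone(timezone.utc):%Y-%m-%d}.jsonl"


def append_entry(audit_dir: Path, line: str, when: datetime | None = None) -> Path:
    when = when or datetime.now(timezone.utc)
    path = audit_file_for(audit_dir, when)
    data = (line.rstrip("\n") + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        written = os.write(fd, data)      # one write() per entry, appended at EOF
        if written != len(data):
            raise OSError("short write to audit log")
        os.fsync(fd)                      # on stable storage before returning
    finally:
        os.close(fd)
    return path
```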

File rotation and retention

The platform does not rotate or delete audit files. Files accumulate in the configured AUDIT_LOG_PATH directory forever. Customers are free to copy files to long-term storage; the chain stays intact as long as the copy preserves byte content.

If you want to compress old files for storage, use a streaming codec that preserves the original byte stream (gzip is fine). Decompressing the file back to the original bytes and running the verifier produces the same result as verifying the live file.
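Because gzip is lossless, a compressed file can be streamed back through the same line reader the verifier uses. A minimal sketch, assuming finished daily files and a `.gz` suffix convention (both hypothetical here); it only compresses and yields the original lines byte for byte.

```python
from __future__ import annotations

import gzip
import shutil
from pathlib import Path
from typing import Iterator


def compress_preserving_bytes(src: Path, dst: Path) -> None:
    """gzip a finished audit file; decompressing yields the identical byte stream."""
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


def iter_lines(path: Path) -> Iterator[bytes]:
    """Yield raw JSONL lines, unchanged, from a live file or a .gz copy of it."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        yield from fh
```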

Field stability

The on-disk format is part of the public API. Adding new fields to the entry is non-breaking as long as verifiers ignore unknown fields. Renaming or removing fields is breaking.

Tracked future changes (none committed):

  • Unify the Rust and Python canonical-payload computation so a single verifier function covers both shapes without dispatch.
  • Add a schema_version field so verifiers can short-circuit on a known-incompatible chain.

Both will be announced in CHANGELOG.md at least thirty days before they ship.

Cross-references

API reference

Every public HTTP surface exposed by the Vyrox platform. The table lists eight endpoints across two services, the ingestion service and the Rust proxy:

| Service | Method | Path | Auth |
| --- | --- | --- | --- |
| Ingestion | POST | /webhook/crowdstrike | HMAC-SHA256 over body |
| Ingestion | POST | /webhook/sentinelone | Bearer token |
| Ingestion | POST | /webhook/defender | Bearer token (Microsoft clientState) |
| Ingestion | POST | /webhook/generic/{tenant_id} | HMAC-SHA256 over body |
| Ingestion | GET | /health | none |
| Rust proxy | POST | /execute | HMAC-SHA256 over body |
| Rust proxy | GET | /audit/export?tenant_id={id} | HMAC over tenant_id:timestamp |
| Rust proxy | GET | /health | none |

The Discord bot exposes /interactions and /webhook. Those are not documented here because they speak the Discord protocol (Ed25519) or are internal-only (worker to bot, HMAC signed). If you need to call into the bot, you are inside the Vyrox monorepo and there is no public contract.

Authentication primitives

HMAC-SHA256 over a request body

Used by the CrowdStrike webhook, the generic webhook, and the proxy /execute. The signing function in shared/crypto.py::sign(payload, secret) returns f"sha256={hex_digest}". The verifier on the receiving side strips the sha256= prefix and compares against its own computed digest with hmac.compare_digest (Python) or subtle::ConstantTimeEq (Rust).

Two rules for any caller: sign the exact bytes you put on the wire, and, if your body is JSON, pin the encoding so the byte stream is deterministic:

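import json
from shared.crypto import sign

# Serialise once with a pinned encoding, sign that string, and send exactly
# that string as the request body. Never re-serialise after signing.
body = json.dumps(payload, separators=(",", ":"), sort_keys=True)
signature = sign(body, secret)  # "sha256=..."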

The separators and sort_keys parameters matter. Python's json.dumps defaults to ", " and ": " separators and keeps dictionary insertion order, so without them Python and Rust can serialise the same dictionary into different byte streams, and the signature will mismatch even though the values are identical.
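For readers implementing the signer in their own tooling, a sketch of the sign/verify pair described above follows. It is not the contents of shared/crypto.py; it assumes the secret is a UTF-8 string. Comparing bytes rather than str matters because hmac.compare_digest raises TypeError on non-ASCII str input, and the header value is attacker-controlled.

```python
from __future__ import annotations

import hashlib
import hmac

PREFIX = "sha256="


def sign(payload: str | bytes, secret: str) -> str:
    """Return "sha256=<hex_digest>" computed over the exact payload bytes."""
    data = payload.encode("utf-8") if isinstance(payload, str) else payload
    digest = hmac.new(secret.encode("utf-8"), data, hashlib.sha256).hexdigest()
    return PREFIX + digest


def verify(payload: str | bytes, header: str, secret: str) -> bool:
    """Strip the prefix and compare digests in constant time."""
    if not header.startswith(PREFIX):
        return False
    received = header[len(PREFIX):].encode("utf-8")
    expected = sign(payload, secret)[len(PREFIX):].encode("ascii")
    # Compare bytes: compare_digest rejects non-ASCII str with TypeError.
    return hmac.compare_digest(received, expected)
```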

Bearer token

Used by the SentinelOne and Defender webhooks. The header is Authorization: Bearer <secret> and the receiver constant-time compares with hmac.compare_digest. The secret is per tenant; resolution happens after an "untrusted preview parse" of the body, identical to the HMAC routes (see _resolve_tenant_webhook_secret in ingestion/main.py).

Replay window

Both signed proxy routes, /execute and /audit/export, enforce a thirty-second replay window. The timestamp is part of the signed message (approved_at in the /execute body, X-Vyrox-Timestamp for /audit/export). Requests older than thirty seconds, or more than thirty seconds in the future, return HTTP 410.

The window is symmetric on purpose. A client whose clock is ahead of ours by minutes cannot pre-sign requests for later use.
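The symmetric check reduces to an absolute difference. A minimal sketch; the function name is hypothetical, a request for which it returns False maps to HTTP 410, and the `now` parameter exists only to make the check testable.

```python
from __future__ import annotations

import time

REPLAY_WINDOW_SECONDS = 30


def within_replay_window(signed_ts: int, now: float | None = None) -> bool:
    """Accept timestamps at most 30 s in the past or 30 s in the future."""
    current = time.time() if now is None else now
    # abs() makes the window symmetric: pre-dated and post-dated requests are treated alike.
    return abs(current - signed_ts) <= REPLAY_WINDOW_SECONDS
```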

Common patterns

Per-tenant authentication

Every ingestion route resolves the tenant before verifying the signature. The flow is the same regardless of vendor:

  1. Read the raw body bytes.
  2. Parse the body as JSON. This parse is untrusted; the result is used only to extract the vendor's tenant identifier (customer_id, accountId, tenantId).
  3. Look up the per-tenant secret in tenant_credentials. Fall back to the environment-configured default secret for that vendor if the tenant has not been onboarded yet.
  4. Verify the signature or bearer token against that secret.
  5. Only after verification succeeds, promote the parsed body to the trusted payload and run the adapter.

A payload with no tenant identifier returns HTTP 400 with {"detail": "missing tenant identifier"}. There is no default-tenant fallback. This was the SEV-1 risk removed on 2026-05-21.
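The resolution steps above can be sketched as follows. The per-vendor field names come from this section; the dictionaries standing in for the tenant_credentials table and the environment defaults are hypothetical, and this is an illustrative reimplementation rather than the code in ingestion/main.py.

```python
from __future__ import annotations

from typing import Any, Mapping

# Vendor -> tenant identifier field on the body. The generic adapter is absent:
# its tenant_id comes from the URL path instead.
TENANT_FIELD = {"crowdstrike": "customer_id", "sentinelone": "accountId", "defender": "tenantId"}


class MissingTenantIdentifier(Exception):
    """No tenant identifier on the untrusted preview; the route answers HTTP 400."""


def resolve_tenant_id(vendor: str, preview: Any) -> str:
    """Step 2: read the tenant identifier from the untrusted preview parse."""
    value = preview.get(TENANT_FIELD[vendor]) if isinstance(preview, dict) else None
    if not isinstance(value, str) or not value.strip():
        raise MissingTenantIdentifier(vendor)  # never fall back to a shared tenant
    return value


def resolve_secret(
    tenant_id: str,
    vendor: str,
    tenant_secrets: Mapping[tuple[str, str], str],
    default_secrets: Mapping[str, str],
) -> str | None:
    """Step 3: per-tenant secret first, vendor default only for tenants not yet onboarded."""
    return tenant_secrets.get((tenant_id, vendor)) or default_secrets.get(vendor)
```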

Error envelope

Every endpoint returns a JSON body on error. The shape is consistent:

{ "detail": "<short, generic message>" }

We do not leak which part of the credential was wrong, which field was missing, or what the expected signature would have been. Errors are the same for every failure of the same class. Use the audit log if you need to debug an authentication failure; the log records the request correlation ID, the resolved tenant, and the failure kind.

Status codes

| Code | Meaning |
| --- | --- |
| 200 OK | Used only by /health and the proxy /audit/export. |
| 202 Accepted | Webhook payload was authenticated and queued for triage. |
| 400 Bad Request | Missing tenant identifier on a webhook payload. |
| 401 Unauthorized | Authentication failed. Generic message; no specifics. |
| 410 Gone | Timestamp outside the thirty-second replay window. |
| 422 Unprocessable Entity | Authenticated payload could not be normalised. |
| 503 Service Unavailable | Redis is unreachable. Retry-After: 5 header set. |

Ingestion endpoints

The ingestion service runs on port 8001 by default. All four webhook routes return HTTP 202 with {"status": "queued", "alert_id": "<uuid>"} on success.

POST /webhook/crowdstrike

Receives CrowdStrike Falcon detection events.

Headers:

Content-Type: application/json
X-Vyrox-Signature: sha256=<hex_digest>

The signature is computed over the raw body using the tenant's HMAC secret. The tenant identifier is the customer_id field on the payload.

Body shape (minimal):

{
  "detect_id": "evt:1234567890:abc123",
  "customer_id": "acme-corp",
  "timestamp": 1704067200,
  "severity": "high",
  "tactic": "TA0004",
  "technique": "T1059",
  "sensor": {
    "hostname": "workstation-01",
    "agent_id": "12345678-1234-1234-1234-123456789abc"
  },
  "process": {
    "file_name": "cmd.exe",
    "command_line": "powershell -enc JABjAGwA...",
    "user_name": "CORP\\jsmith",
    "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
  }
}
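A client-side sketch of producing such a request with only the standard library: serialise once, sign those exact bytes, and send them unchanged. The function name is hypothetical; the header name and sha256= prefix come from this section. The Request object is built but not sent.

```python
import hashlib
import hmac
import json
import urllib.request


def build_crowdstrike_request(url: str, payload: dict, secret: str) -> urllib.request.Request:
    """Serialise once, sign those bytes, and attach them unchanged as the body."""
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-Vyrox-Signature": f"sha256={digest}"},
    )
```

Sending it is `urllib.request.urlopen(req)`; a successful delivery returns HTTP 202 with the queued alert_id.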

Fields the normaliser reads (ingestion/models.py::_from_crowdstrike):

| Field | Required | Notes |
| --- | --- | --- |
| detect_id | yes | Vendor-side identifier, stored as raw_id for dedup. |
| customer_id | yes | Resolves the tenant. A missing or empty value returns HTTP 400. |
| timestamp | no | Defaults to time.time() at receive time. |
| severity | no | Uppercased and stored as vendor_severity. |
| tactic | no | MITRE tactic name. |
| technique | no | MITRE technique ID. |
| sensor.hostname | no | The affected endpoint. |
| process.file_name | no | The executable name. |
| process.command_line | no | Full command line. Triage weights this heavily. |
| process.user_name | no | User context. Domain format such as CORP\jsmith (escaped as CORP\\jsmith inside JSON) is fine. |
| process.sha256 | no | File hash. |

POST /webhook/sentinelone

Receives SentinelOne threat events.

Headers:

Content-Type: application/json
Authorization: Bearer <tenant_secret>

The bearer token is constant-time compared against the per-tenant secret. The tenant identifier is accountId on the body.

Body shape (minimal):

{
  "id": "thrt_1234567890abc",
  "accountId": "acme-corp",
  "createdAt": 1704067200,
  "severity": "high",
  "mitreTactic": "TA0004",
  "mitreTechnique": "T1059",
  "agentRealtimeInfo": {
    "computerName": "workstation-01",
    "agentId": "1234567890abc"
  },
  "fileName": "powershell.exe",
  "commandLine": "powershell -enc JABjAGwA...",
  "fileContentHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}

POST /webhook/defender

Receives Microsoft Defender for Endpoint alerts via the Microsoft Graph Security API webhook subscription.

Headers:

Content-Type: application/json
Authorization: Bearer <clientState>

The bearer value is the clientState you chose at subscription time. The tenant identifier is tenantId on the body (the Azure AD tenant the alert came from).

Body shape (alertV2 subset that the normaliser reads):

{
  "id": "abc123",
  "tenantId": "11111111-2222-3333-4444-555555555555",
  "createdDateTime": "2026-05-23T14:32:00Z",
  "severity": "high",
  "category": "CredentialAccess",
  "mitreTechniques": ["T1003"],
  "evidence": [
    {
      "deviceDnsName": "workstation-01.acme.local",
      "userAccount": {
        "userPrincipalName": "jsmith@acme.com"
      },
      "imageFile": {
        "fileName": "lsass-dumper.exe"
      },
      "processCommandLine": "lsass-dumper.exe -o creds.dmp",
      "fileDetails": {
        "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
      }
    }
  ]
}

The Defender evidence array is heterogeneous. The normaliser (ingestion/models.py::_from_defender) walks the array and pulls the first instance of each evidence kind it recognises. Microsoft can and does add new evidence kinds; that does not break normalisation, because the adapter ignores kinds it has not seen before.
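The "first instance of each recognised kind" rule can be sketched as follows. The field-to-path mapping is an assumption derived from the alertV2 subset above, not the actual _from_defender code; evidence objects it does not recognise contribute nothing, which is what keeps new Microsoft evidence kinds from breaking normalisation.

```python
from __future__ import annotations

from typing import Any

# Assumed mapping from NormalizedAlert field to its path inside one evidence object.
EVIDENCE_PATHS: dict[str, tuple[str, ...]] = {
    "hostname": ("deviceDnsName",),
    "username": ("userAccount", "userPrincipalName"),
    "process_name": ("imageFile", "fileName"),
    "process_cmdline": ("processCommandLine",),
    "sha256": ("fileDetails", "sha256"),
}


def _dig(node: Any, path: tuple[str, ...]) -> Any:
    """Follow nested keys; return None as soon as a hop is not an object."""
    for key in path:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def first_evidence_values(evidence: list[Any] | None) -> dict[str, Any]:
    """Take the first non-empty value for each recognised field; skip everything else."""
    found: dict[str, Any] = {field: None for field in EVIDENCE_PATHS}
    for item in evidence or []:
        for field, path in EVIDENCE_PATHS.items():
            if found[field] is None:
                value = _dig(item, path)
                if value:
                    found[field] = value
    return found
```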

POST /webhook/generic/{tenant_id}

The catch-all webhook for any EDR that can POST JSON but is not on the natively-supported list. The tenant identifier comes from the URL path because the customer's payload shape is not known in advance.

Headers:

Content-Type: application/json
X-Vyrox-Signature: sha256=<hex_digest>

The customer also supplies a field map at onboarding time. The map tells the adapter how to find each NormalizedAlert field on their payload, using dotted-path notation:

{
  "raw_id": "event.id",
  "hostname": "device.name",
  "username": "actor.upn",
  "process_name": "process.exe",
  "process_cmdline": "process.cli",
  "sha256": "file.hash",
  "vendor_severity": "metadata.severity",
  "tactic": "mitre.tactic",
  "technique": "mitre.technique",
  "timestamp": "event.ts"
}

Required keys in the field map: raw_id, hostname, vendor_severity. A missing required key returns HTTP 422.
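Dotted-path resolution against the field map can be sketched like this. It is a hypothetical helper, not the generic adapter's code: a hop that is missing or lands on a non-object yields None, and a field map without one of the three required keys raises ValueError, which a route would translate into HTTP 422.

```python
from __future__ import annotations

from typing import Any

REQUIRED_KEYS = ("raw_id", "hostname", "vendor_severity")


def get_dotted(payload: Any, path: str) -> Any:
    """Resolve "a.b.c" through nested objects; None if any hop is missing."""
    node = payload
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def apply_field_map(payload: dict[str, Any], field_map: dict[str, str]) -> dict[str, Any]:
    """Map a customer payload onto NormalizedAlert field names."""
    missing = [key for key in REQUIRED_KEYS if key not in field_map]
    if missing:
        # A route would translate this into HTTP 422.
        raise ValueError(f"field map is missing required keys: {missing}")
    return {field: get_dotted(payload, path) for field, path in field_map.items()}
```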

GET /health

Returns {"status": "ok"} when the service is up and Redis is reachable. Returns 503 with Retry-After: 5 otherwise.

Containment proxy endpoints

The Rust proxy runs on port 3000 by default. Besides /health it exposes two routes, and it refuses everything else.

POST /execute

Executes a human-approved containment action.

Headers:

Content-Type: application/json
X-Vyrox-Signature: sha256=<hex_digest>

Body:

{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "tenant_id": "acme-corp",
  "alert_id": "alt_abc123",
  "action_type": "HOST_ISOLATION",
  "host": "workstation-01",
  "approved_by": "jane.smith#1234",
  "approved_at": 1704067200
}

| Field | Type | Notes |
| --- | --- | --- |
| request_id | string | UUID v4. Idempotency key. Same ID returns the cached response. |
| tenant_id | string | Multi-tenant scope. Carried into every audit entry. |
| alert_id | string | The alert that triggered the action. |
| action_type | enum | HOST_ISOLATION, PROCESS_KILL, or NETWORK_QUARANTINE. |
| host | string | Vendor-specific host identifier. CrowdStrike uses device IDs. |
| approved_by | string | Discord username that clicked Approve. |
| approved_at | int | Unix epoch seconds. Must fall in the replay window. |
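On the caller's side, building and signing an ActionRequest combines the deterministic JSON rule with the HMAC header. A sketch with a hypothetical function name: the fields and enum values come from the table above, while treating the secret as a UTF-8 string is an assumption. Because approved_at sits inside the signed body, a retry that reuses the same bytes and request_id must still land inside the thirty-second replay window.

```python
from __future__ import annotations

import hashlib
import hmac
import json
import time
import uuid

ACTION_TYPES = {"HOST_ISOLATION", "PROCESS_KILL", "NETWORK_QUARANTINE"}


def build_execute_request(
    tenant_id: str,
    alert_id: str,
    action_type: str,
    host: str,
    approved_by: str,
    secret: str,
    now: float | None = None,
) -> tuple[bytes, str]:
    """Return (body, X-Vyrox-Signature value) for POST /execute."""
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unsupported action_type: {action_type}")
    request = {
        "request_id": str(uuid.uuid4()),  # idempotency key; keep it for retries
        "tenant_id": tenant_id,
        "alert_id": alert_id,
        "action_type": action_type,
        "host": host,
        "approved_by": approved_by,
        "approved_at": int(time.time() if now is None else now),  # checked against the replay window
    }
    # Deterministic encoding so Python and Rust agree on the signed bytes.
    body = json.dumps(request, separators=(",", ":"), sort_keys=True).encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return body, f"sha256={digest}"
```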

Responses:

{ "status": "executed", "dry_run": false }
{ "status": "dry_run",  "dry_run": true  }
{ "status": "replayed", "dry_run": false }

| Status | Meaning |
| --- | --- |
| executed | The EDR vendor returned success. |
| dry_run | DRY_RUN=true was in effect; the EDR API was not called. |
| replayed | The same request_id was already executed; the cached response is returned without calling the EDR again. |

Error codes:

| Code | Cause |
| --- | --- |
| 400 | request_id empty, or body fails to parse after HMAC succeeds. |
| 401 | HMAC verification failed, or X-Vyrox-Signature header missing. |
| 409 | Same request_id still in flight from a prior call. |
| 410 | approved_at outside the thirty-second replay window. |
| 500 | Internal failure, including audit write failure. The nonce claim is released so a retry can succeed. |
| 502 | EDR vendor API returned an error. The nonce claim is released. |
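The claim, replay, and release semantics behind the replayed status and the 409, 500, and 502 codes can be modelled with a small in-memory store. This is a Python model of the behaviour described here, not the proxy's Rust implementation; the ten-minute TTL comes from the rate-limiting section below, and expired entries are simply overwritten rather than purged.

```python
from __future__ import annotations

import threading
import time

NONCE_TTL_SECONDS = 600.0  # ten minutes


class NonceStore:
    """In-memory model of request_id dedup (single process only)."""

    def __init__(self, ttl: float = NONCE_TTL_SECONDS) -> None:
        self._ttl = ttl
        self._lock = threading.Lock()
        # request_id -> (claimed_at, cached response, or None while in flight)
        self._entries: dict[str, tuple[float, dict | None]] = {}

    def claim(self, request_id: str, now: float | None = None) -> tuple[str, dict | None]:
        now = time.monotonic() if now is None else now
        with self._lock:
            entry = self._entries.get(request_id)
            if entry is not None and now - entry[0] > self._ttl:
                entry = None  # expired: treat as a new request
            if entry is None:
                self._entries[request_id] = (now, None)
                return "claimed", None  # caller goes on to audit and execute
            if entry[1] is None:
                return "in_flight", None  # caller answers 409
            return "replayed", entry[1]  # caller answers from the cache, no EDR call

    def complete(self, request_id: str, response: dict) -> None:
        with self._lock:
            claimed_at, _ = self._entries[request_id]
            self._entries[request_id] = (claimed_at, response)

    def release(self, request_id: str) -> None:
        # 500 and 502 paths: drop the claim so a retry with the same ID can succeed.
        with self._lock:
            self._entries.pop(request_id, None)
```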

GET /audit/export?tenant_id={id}

Returns every audit entry for the requested tenant. The entries are returned as JSON in the response body; for streaming exports on large logs, see the roadmap.

Headers:

X-Vyrox-Signature: sha256=<hex_digest>
X-Vyrox-Timestamp: 1704067200

The signature is HMAC-SHA256 of the canonical message "<tenant_id>:<timestamp>" using the shared HMAC secret. The timestamp must fall in the thirty-second replay window. If either header is missing, the response is 401.
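A sketch of producing those two headers, assuming the tenant_id is signed exactly as it appears (before URL encoding) and that the secret is a UTF-8 string. Both function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from urllib.parse import urlencode


def audit_export_headers(tenant_id: str, secret: str, now: float | None = None) -> dict[str, str]:
    """Sign "<tenant_id>:<timestamp>" and return the two required headers."""
    timestamp = str(int(time.time() if now is None else now))
    message = f"{tenant_id}:{timestamp}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return {"X-Vyrox-Signature": f"sha256={digest}", "X-Vyrox-Timestamp": timestamp}


def export_url(base_url: str, tenant_id: str) -> str:
    """Build the GET URL; the query string is encoded, the signed message is not."""
    return f"{base_url}/audit/export?{urlencode({'tenant_id': tenant_id})}"
```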

Response:

[
  {
    "timestamp": 1704067200,
    "tenant_id": "acme-corp",
    "action_type": "HOST_ISOLATION",
    "host": "workstation-01",
    "approved_by": "jane.smith#1234",
    "dry_run": false,
    "previous_hash": "0000...0000",
    "hash": "e3b0c4..."
  }
]

Every entry carries a previous_hash and a hash so an external verifier can reproduce the chain. The format spec and a reference verifier are in AUDIT_CHAIN.md.

GET /health

Returns {"status": "ok"} when the proxy is up. The health endpoint has no dependencies; it returns 200 even when EDR vendors are unreachable.

Rate limiting

The ingestion service has no rate limit at the HTTP layer. EDR vendors themselves rate-limit their webhook deliveries. If you need to slow the worker down, throttle at the queue layer.

The Rust proxy has no per-route rate limit either. The nonce store dedups by request_id for ten minutes, which is the effective limit for repeated requests with the same ID. A burst of unique requests hits the EDR vendor's own rate limit, which the proxy surfaces as a 502.

A per-tenant rate limit on the proxy is on the roadmap. The driver is operational: a misconfigured automation that fires a hundred Approve clicks in a second should not turn into a hundred EDR API calls.

Versioning

API contracts in this document are stable for the alpha. Breaking changes will be announced in CHANGELOG.md of the relevant repo at least thirty days before they ship. New endpoints can be added without notice. New optional fields on existing endpoints can be added without notice. Removing or renaming a field is breaking.

The audit log format is versioned separately. See AUDIT_CHAIN.md.

Testing your integration

Use the simulator. It lives in vyrox-simulator, runs entirely in bash with openssl and curl, and replays a signed mimikatz alert against a local ingestion service in under five seconds:

git clone https://github.com/vyrox-security/vyrox-simulator
cd vyrox-simulator
VYROX_URL=http://localhost:8001/webhook \
  VYROX_HMAC_SECRET=$(cat your-test-secret) \
  ./simulate.sh mimikatz

The --dry-run flag prints the signed payload to stdout without making the HTTP call, which is useful for debugging signature mismatches.

EDR adapter contributor guide

This document is for a contributor who wants to add a new EDR vendor to the Vyrox ingestion pipeline. The current set is CrowdStrike Falcon, SentinelOne, Microsoft Defender for Endpoint, and a customer-mapped generic JSON webhook. The fifth one might be yours.

What an adapter is

A Vyrox adapter is the code that turns one specific EDR vendor's webhook payload into a NormalizedAlert. The triage pipeline downstream of ingestion only sees NormalizedAlert. It does not care which vendor the alert came from. Adding a new vendor is mechanical: write one factory method, one route, one test file, update one README, done.

The contract between the adapter and the rest of the platform is the four rules in the next section. The rules are not stylistic; they are how the security model holds. Every existing adapter follows them. Every new adapter must.

The four rules

These exist in the private monorepo at vyrox/ingestion/adapters/README.md. They are reproduced here so contributors do not need access to the private side to know what to build.

Rule 1: Authentication before parsing

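The route MUST verify the request's authentication before any logic other than tenant lookup runs on the parsed body. Parsing untrusted bytes is a class of attack we do not need to be exposed to, so the single permitted pre-verification json.loads() is kept as narrow as possible.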

The accepted pattern, in pseudocode:

body = await request.body()                       # 1. raw bytes
preview = json.loads(body)                        # 2. untrusted parse, only to find tenant_id
tenant_id = resolve_tenant_id(vendor, preview)    # 3. raises if missing
secret = resolve_tenant_secret(tenant_id, vendor) # 4. per-tenant
verify(body, signature, secret)                   # 5. authenticate on raw bytes
payload = preview                                 # 6. now trusted
alert = NormalizedAlert._from_<vendor>(payload, tenant_id)

Step 2 is the only place where an unauthenticated parse is allowed, and its result is used for one thing only: finding the tenant_id field on the payload. If the per-tenant secret lookup fails or the signature comparison fails, the request returns 401 before any business logic touches the parsed dict.

Rule 2: tenant_id from authenticated context

The tenant_id that goes onto the NormalizedAlert MUST come from a source the signature actually authenticates. Two acceptable patterns:

  • The tenant identifier is part of the signed body. CrowdStrike (customer_id), SentinelOne (accountId), and Defender (tenantId) all work this way. The preview-parse trick is safe because the per-tenant secret is keyed on the identifier from the preview, and the signature compare uses that secret. A wrong tenant either produces no secret lookup hit or fails the signature check.
  • The tenant identifier is part of the URL path. The generic adapter works this way. The URL itself is not signed, but the per-tenant secret is keyed on the path tenant_id, so a mismatched path resolves to the wrong secret and the HMAC compare fails.

What is NOT acceptable: trusting an unauthenticated header like X-Tenant-Id, relying on a query string parameter, or falling back to a shared default tenant when the identifier is missing. The MissingTenantIdentifier exception in the private ingestion/main.py exists for exactly this case. Missing identifier returns HTTP 400, never a silent route to a shared bucket.

Rule 3: Audit entry before HTTP 202

Every accepted alert MUST land in the audit JSONL chain before the ingestion handler returns 202 to the EDR vendor. The order matters: if the process crashes between the enqueue and the audit write, we would rather lose the audit entry than the alert. The current implementation writes the audit hop inside queue.enqueue for that reason.

If your adapter calls a non-default code path that bypasses queue.enqueue, write the audit entry manually before the route returns. The pattern in shared/audit.py::AuditWriter.write takes a dict; the conventional event name is ingest.accepted with at minimum tenant_id, source (vendor name), and raw_id (the vendor's own alert ID).

Rule 4: Output is a valid NormalizedAlert

The only thing the rest of the pipeline sees is NormalizedAlert. Your adapter MUST produce one. Three constraints:

  • source is a unique vendor string. Lowercase, no spaces. Choose one that does not collide with the existing four (crowdstrike, sentinelone, defender, generic).
  • tenant_id is populated from the authenticated context (rule 2).
  • id is a fresh internal UUID. Do not reuse the vendor's identifier. Store the vendor's ID in raw_id instead. The two are not the same: raw_id is for vendor-side dedup; id is the Vyrox-internal identifier referenced by audit entries and Discord buttons.

Missing optional fields default to None or empty string, never to a placeholder like "unknown"; the triage engine treats None and "unknown" differently.

What NormalizedAlert looks like

from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class NormalizedAlert:
    tenant_id: str
    source: str                            # "crowdstrike", "sentinelone", ...
    raw_id: str                            # vendor's own alert ID, used for dedup
    timestamp: int                         # unix epoch seconds
    hostname: str                          # affected endpoint
    username: str | None                   # optional
    process_name: str | None
    process_cmdline: str | None
    sha256: str | None
    tactic: str | None                     # MITRE tactic name
    technique: str | None                  # MITRE technique ID
    vendor_severity: str                   # INFORMATIONAL | LOW | MEDIUM | HIGH | CRITICAL
    # Internal UUID, auto-generated so factories never pass it. A field with a
    # default must follow every field without one, so it is declared last.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

The dataclass is intentionally flat. Nested vendor structures (CrowdStrike's sensor, SentinelOne's agentRealtimeInfo, Defender's evidence array) are flattened during normalisation. Triage code reads top-level fields only.

vendor_severity is the vendor's own assessment, not Vyrox's. The triage pipeline produces its own verdict afterwards.

Adding a new vendor in six steps

The example below sketches an adapter for a hypothetical "Acme EDR" vendor that posts alerts to a webhook with a bearer token.

Step 1: Add a factory method on NormalizedAlert

In the private monorepo, in vyrox/ingestion/models.py, add a classmethod that takes the vendor payload and a tenant_id and returns a populated NormalizedAlert.

@classmethod
def _from_acme(cls, payload: dict[str, Any], tenant_id: str) -> "NormalizedAlert":
    """
    Parse an Acme EDR alert payload into a NormalizedAlert.

    Acme posts a JSON object with a top-level `alert_uuid` and nested
    `endpoint`, `actor`, and `mitre` blocks. The schema is the one
    documented at <Acme docs URL> retrieved on <date>.
    """
    return cls(
        tenant_id=tenant_id,
        source="acme",
        raw_id=str(payload.get("alert_uuid", "")),
        timestamp=int(payload.get("ts", time.time())),
        hostname=payload.get("endpoint", {}).get("name", ""),
        username=payload.get("actor", {}).get("user"),
        process_name=payload.get("actor", {}).get("process_name"),
        process_cmdline=payload.get("actor", {}).get("command_line"),
        sha256=payload.get("actor", {}).get("sha256"),
        tactic=payload.get("mitre", {}).get("tactic"),
        technique=payload.get("mitre", {}).get("technique"),
        vendor_severity=str(payload.get("severity", "LOW")).upper(),
    )

Two conventions worth following. Pin the Acme schema URL and the date you read it in the docstring; vendors change their format and a future maintainer needs to know which version you targeted. Default optional fields to None (or empty string for strings); do not substitute placeholders.

Step 2: Add a thin adapter module

In vyrox/ingestion/adapters/, create acme.py:

"""
Acme EDR webhook adapter.

The route in `ingestion/main.py` calls into `normalize`. This module
exists to keep the route file readable as the vendor count grows.
"""

from __future__ import annotations
from typing import Any

from ingestion.models import NormalizedAlert


def normalize(payload: dict[str, Any], tenant_id: str) -> NormalizedAlert:
    """Convert an Acme alert payload into a NormalizedAlert."""
    return NormalizedAlert._from_acme(payload, tenant_id)

The module is intentionally tiny. The reason is convention: every adapter ships as a normalize(payload, tenant_id) -> NormalizedAlert function so the route code does not have to memorise factory method names.

Step 3: Add a route in ingestion/main.py

Mirror the existing routes. Here is the shape for a bearer-token vendor that puts tenant_id in the body:

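# Assumed imports; the existing routes in ingestion/main.py need the same ones.
import hmac
import json

from fastapi import Depends, Header, HTTPException, Request, status

# QueueClient, get_queue_client, resolve_tenant_id, MissingTenantIdentifier,
# _resolve_tenant_webhook_secret, settings, and EnqueueFailed come from the
# existing ingestion modules.


@app.post("/webhook/acme", status_code=status.HTTP_202_ACCEPTED)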
async def webhook_acme(
    request: Request,
    authorization: str = Header(default=""),
    q: QueueClient = Depends(get_queue_client),
) -> dict[str, str]:
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="invalid signature")
    token = authorization[7:]

    body = await request.body()
    try:
        untrusted_preview = json.loads(body)
    except json.JSONDecodeError:
        raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, detail="bad payload")
    if not isinstance(untrusted_preview, dict):
        raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, detail="bad payload")

    try:
        tenant_id = resolve_tenant_id("acme", untrusted_preview)
    except MissingTenantIdentifier:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="missing tenant identifier")

    tenant_secret = _resolve_tenant_webhook_secret(
        tenant_id=tenant_id, vendor="acme", default_secret=settings.acme_webhook_secret
    )
    if not tenant_secret or not hmac.compare_digest(token, tenant_secret):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="invalid signature")

    payload = untrusted_preview

    try:
        from ingestion.adapters import acme as acme_adapter
        alert = acme_adapter.normalize(payload, tenant_id)
    except Exception:
        raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, detail="bad payload")

    if not q:
        raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail="redis unavailable", headers={"Retry-After": "5"})

    try:
        alert_id = await q.enqueue(alert)
        return {"status": "queued", "alert_id": alert_id}
    except (EnqueueFailed, ConnectionError):
        raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail="redis unavailable", headers={"Retry-After": "5"})

For an HMAC-signed vendor (like CrowdStrike or the generic adapter) swap the bearer-token check for verify(body.decode("utf-8"), x_vyrox_signature, tenant_secret). The shape stays the same.

Step 4: Wire the tenant identifier into resolve_tenant_id

Add a case to resolve_tenant_id:

elif source == "acme":
    identifier = payload.get("customer_id")  # or whatever Acme calls it

If the vendor identifier is missing, the function raises MissingTenantIdentifier, the route returns 400, and the EDR retries. No silent default.

Step 5: Add tests

Create vyrox/tests/test_adapters_acme.py. Cover at least:

  • Happy path: a valid signed payload returns 202 with an alert_id.
  • Missing tenant ID: returns 400.
  • Wrong signature: returns 401.
  • Malformed JSON: returns 422.
  • Redis unavailable: returns 503 with Retry-After: 5.
  • Field mapping: the resulting NormalizedAlert has the expected values for every field your factory populates.

Use the same fixture style as tests/test_ingestion_main.py. The existing tests are the right template; copy and adjust.

Step 6: Update the adapter README and the public docs

Two files to touch:

  • vyrox/ingestion/adapters/README.md (private): add a row to the adapter table.
  • vyrox-docs/API_REFERENCE.md (public): add the new endpoint with its full schema and the field-mapping table.

The pattern in the existing adapters is the documentation contract. A reviewer reading the new endpoint should be able to integrate against it without reading your code.

Anti-patterns we catch in review

The list below is what we have actually rejected in past reviews.

  • "Just for testing" default-tenant fallback. Returns a shared bucket when the identifier is missing. This was the SEV-1 we removed on 2026-05-21. There is no scenario where this is correct.
  • Re-serialising the body before HMAC verify. Python's default json.dumps and Rust's serde_json disagree on whitespace and key order. Always verify on the raw bytes from await request.body(), never on json.dumps(payload).
  • Skipping per-tenant secret lookup "for the pilot". The pilot is when per-tenant secrets matter most. Falling back to the global secret is a deliberate, audited choice for un-onboarded tenants only.
  • Logging the full raw payload. Payloads contain process command lines, user accounts, hostnames. Log structured fields, not the whole blob.
  • Treating the vendor's severity as Vyrox's verdict. The vendor's severity goes into vendor_severity. Triage produces a separate verdict. Conflating the two breaks the entire downstream contract.

Adapters that already exist

| Adapter | Vendor | Auth | Tenant ID source | Code |
| --- | --- | --- | --- | --- |
| crowdstrike | CrowdStrike Falcon detection events | HMAC-SHA256 | customer_id on body | private |
| sentinelone | SentinelOne streaming API | Bearer token | accountId on body | private |
| defender | Microsoft Graph Security API alertV2 | Bearer token (Microsoft clientState) | tenantId on body | private |
| generic | Any EDR posting JSON | HMAC-SHA256 | URL path | private |

The CrowdStrike and SentinelOne factories live directly on NormalizedAlert (_from_crowdstrike, _from_sentinelone) for historical reasons. The Defender and generic factories live in the adapter package. Newer adapters should follow the package pattern.

What the review focuses on

When a contributor opens an adapter PR, the reviewer checks:

  • Authentication-before-parse order, byte-exact.
  • Per-tenant secret lookup, with the global default only as a fallback for un-onboarded tenants.
  • Tenant ID source is authenticated.
  • Audit entry written before the 202 returns.
  • NormalizedAlert.source is unique and lowercase.
  • raw_id is set from the vendor's own identifier.
  • Tests cover the happy path, the four failure modes (400, 401, 422, 503), and the field mapping.
  • Schema URL and date are pinned in the factory docstring.
  • No raw payload logging.
  • Public docs updated with the new endpoint.

Adapters that pass review tend to ship in a single PR. Adapters that fail review usually fail rule 1 (parse before verify) or rule 2 (tenant from unauthenticated source). Read the existing adapters before writing yours.

Cross-references

Roadmap

This roadmap is organised by capability, not by quarter. It tells a reader what we are working on, what is committed, and what is on the list but not started. It does not contain revenue targets, customer counts, or aspirational SLA percentages. Those live in negotiated contracts and internal planning, not in OSS docs.

The roadmap reflects the state of the codebase. Items move from "planned" to "in flight" when a branch exists. They move from "in flight" to "shipped" when the code is in main with tests. They move from "shipped" to "stable" when they survive a quarter without being reverted.

Recently shipped

The two audits in May 2026 (private; the fixes are public) drove sixteen blocker items that all shipped between 2026-05-21 and 2026-05-23. The list:

P0 (audit 1, 2026-05-21):

  • Per-tenant webhook secrets for CrowdStrike and SentinelOne.
  • Removal of the default-tenant fallback. Missing identifier now returns HTTP 400, no silent shared bucket.
  • HTTP 503 on Redis enqueue failure with Retry-After: 5. The silent-drop bug is gone.
  • Distinct QueueUnavailable exception on dequeue so the worker can apply exponential backoff instead of spinning on a dead connection.
  • Token budget enforced before every LLM call. Exhausted tenants return MEDIUM/0.5 with budget_exhausted in the audit log.
  • Redis-backed retry queue for failed Discord notifications. Tenant-scoped dead-letter with two-week TTL. Backup notification path triggers at five dead-letter entries per hour.
  • Deterministic JSON in the proxy client. Python and Rust now sign byte-identical payloads with separators=(",", ":") and sort_keys=True.
  • Idempotency on Discord approval clicks. Alert status flips through executing so a concurrent click loses the race.
  • Four new regression tests: cross-tenant isolation, HMAC round-trip, no-autonomous-containment static guard, replay-window enforcement.

P0.5 (audit 2, 2026-05-23):

  • Discord Ed25519 verification on the bot /interactions endpoint. Anyone with the URL can no longer forge an Approve click.
  • HMAC verification on the bot /webhook endpoint. The phishing pretext via fake embeds is closed.
  • HMAC verification on the proxy /audit/export endpoint. Tenant data leak via query string is closed.
  • SHA-256 hash chain on the Rust proxy audit log. The Python and Rust chains now agree on format. See AUDIT_CHAIN.md.
  • Python audit chain survives process restart. Previously __init__ reset to genesis; now it reads the last hash from today's log.
  • Real /onboard Discord wizard. Generates a per-tenant webhook secret with secrets.token_hex(32), persists TenantCredential, fires a signed synthetic alert through ingestion, refuses to mark the tenant active until the round-trip succeeds.
  • .env.example rewritten with logical sections and every new variable documented.
  • Settings.effective_redis_url() resolves the canonical REDIS_URL first and falls back to the legacy Upstash REST variables. The worker refuses to start with no Redis URL.

In flight

Tracked in the private todo.md; called out here when the work touches a public contract.

  • Postgres migration before tenant twenty-five. SQLite write contention is the binding constraint at scale. The schema is already SQLModel-compatible, so the migration is a SQL dump plus a connection string change.
  • Streaming /audit/export. Current endpoint reads the full log into memory. Fine for pilot scale; needs to be a streaming JSONL response for SaaS scale.
  • Per-tenant secret encryption at rest. The tenant_credentials.webhook_secret_encrypted column stores raw bytes during the pilot. Encryption module ships before the first production payment.
  • Concurrent per-tenant triage. The worker polls tenants sequentially. A slow triage on tenant A blocks tenants B through J. Move to asyncio.gather with bounded concurrency.
  • Retry runner wired into worker startup. The retry-queue background task exists but the worker entrypoint does not invoke it. The queue accumulates without draining.
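Bounded concurrency with asyncio.gather is usually implemented with an asyncio.Semaphore. Below is a minimal sketch. It assumes a hypothetical triage_one(tenant_id) coroutine and uses an illustrative default of four tenants in flight at once.

```python
import asyncio
from typing import Awaitable, Callable


async def triage_all(
    tenant_ids: list[str],
    triage_one: Callable[[str], Awaitable[object]],
    max_concurrency: int = 4,
) -> dict[str, object]:
    # The semaphore caps how many tenants are triaged at once. A slow tenant
    # occupies one slot instead of blocking every tenant behind it.
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(tenant_id: str) -> object:
        async with sem:
            return await triage_one(tenant_id)

    # return_exceptions=True records a failing tenant's exception as its
    # result instead of raising out of gather, so every outcome is returned.
    results = await asyncio.gather(
        *(guarded(t) for t in tenant_ids), return_exceptions=True
    )
    return dict(zip(tenant_ids, results))
```

Tenant isolation is unaffected as long as triage_one only touches the tenant it is given. The semaphore bounds how many tenants run at once and nothing else.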

Planned, not started

Public-facing items only. The internal product roadmap covers more.

  • Programmatic API. A REST API for tenants to fetch verdicts, audit entries, and statistics outside of Discord. OAuth2 client credentials per tenant. Prerequisite for MSP integrations.
  • Customer-side audit verifier. A small Rust binary that walks a tenant's audit-YYYY-MM-DD.jsonl directory and verifies the chain. Distributed as a single static binary. The Python reference in AUDIT_CHAIN.md is the spec.
  • EU data region. A per-tenant data_region flag, a separate ingestion endpoint shard per region, and no cross-region data flow. Required for any EU customer with a GDPR review.
  • Generic-vendor adapter at the route level. Today the generic adapter's field map lives in tenant_credentials.edr_api_key_encrypted as a JSON string. That column should be split into a dedicated field_map_json column with a real schema.
  • Public OpenAPI spec covering the four ingestion routes and the two proxy routes documented in API_REFERENCE.md. Auto-generated from FastAPI on the public side; hand-written for the Rust proxy.
  • Web operator interface. Read-only first. Tenant status, recent verdicts, audit search, monthly digests. Triggered when a customer refuses Discord-only or when total customer count crosses double digits, whichever first.
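The verifier's directory walk is short. The planned tool is a Rust binary, and the Python reference in AUDIT_CHAIN.md is the spec. The Python sketch below only illustrates the shape of the check. It assumes a hypothetical entry layout: prev_hash and hash fields, a SHA-256 over the canonical JSON of every field except hash, a genesis value of 64 zeros, and one chain per daily file. It collects every problem it finds instead of stopping at the first.

```python
import hashlib
import json
import re
from pathlib import Path

GENESIS = "0" * 64  # hypothetical; AUDIT_CHAIN.md defines the real value
FILE_RE = re.compile(r"^audit-\d{4}-\d{2}-\d{2}\.jsonl$")


def entry_hash(entry: dict) -> str:
    body = {k: v for k, v in entry.items() if k != "hash"}
    canonical = json.dumps(body, separators=(",", ":"), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_dir(log_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every file verified."""
    problems = []
    # ISO dates sort lexicographically, so problems come out in date order.
    files = sorted(p for p in log_dir.iterdir() if FILE_RE.match(p.name))
    for path in files:
        prev = GENESIS  # assumption: each daily file starts its own chain
        with path.open("r", encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if not line.strip():
                    continue
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    problems.append(f"{path.name}:{lineno}: not valid JSON")
                    prev = None
                    continue
                if entry.get("prev_hash") != prev:
                    problems.append(f"{path.name}:{lineno}: prev_hash mismatch")
                if entry_hash(entry) != entry.get("hash"):
                    problems.append(f"{path.name}:{lineno}: hash mismatch")
                # Continue from the stored hash so one edit is reported once.
                prev = entry.get("hash")
    return problems
```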

Adapter coverage

Vendor                            Status    Notes
CrowdStrike Falcon                shipped   Detection events via HMAC-signed webhook.
SentinelOne                       shipped   Streaming API via bearer token.
Microsoft Defender for Endpoint   shipped   Graph Security API alertV2 via bearer (clientState).
Generic JSON webhook              shipped   Customer-mapped field map per tenant.
Sophos                            planned   Native adapter, on demand.
Quick Heal / Seqrite              planned   Native adapter, on demand.
Trellix                           planned   Native adapter, on demand.
Syslog (CEF / LEEF)               planned   Separate service that converts to the ingestion contract.

The "on demand" adapters land when a real customer commits to a pilot conditional on the vendor. We do not build speculatively. The contract a new adapter must follow is in ADAPTERS.md.
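A customer-mapped field map can be as simple as a dictionary from normalized field names to dotted paths in the vendor payload. The sketch below assumes that shape. The NormalizedAlert fields shown are a hypothetical subset, and the real adapter contract is the one in ADAPTERS.md.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class NormalizedAlert:
    # Hypothetical subset of fields, for illustration only.
    alert_id: str
    hostname: str
    severity: str
    title: str


def get_path(payload: dict, dotted: str) -> Any:
    # Walk "a.b.c" through nested dicts; raise KeyError if any segment is missing.
    node: Any = payload
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(dotted)
        node = node[key]
    return node


def normalize_generic(payload: dict, field_map: dict[str, str]) -> NormalizedAlert:
    # field_map: NormalizedAlert field name -> dotted path in the vendor payload.
    values = {field: str(get_path(payload, path)) for field, path in field_map.items()}
    # A map with missing or unknown field names raises TypeError here,
    # so a bad map fails loudly instead of producing a partial alert.
    return NormalizedAlert(**values)
```

For example, the map {"hostname": "device.name", ...} pulls payload["device"]["name"] into NormalizedAlert.hostname.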

Compliance and certification

  • SOC 2 Type I evidence collection: in flight. Audit log format (hash-chained JSONL) is the substrate. Vendor selection for the audit firm is internal.
  • SOC 2 Type II: planned after Type I closes.
  • ISO 27001 prep: planned after SOC 2 Type II.
  • Public bug bounty: not active during alpha. See SECURITY.md for the disclosure-only model in effect today.

Versioning and release cadence

The four public repos follow semver. Today everything is 0.1.x. The audit log format will get an explicit schema_version field before we bump to 0.2.x. The HMAC signing format and the public webhook URL shape will not change between 0.x releases without a deprecation notice in the relevant repo's CHANGELOG.md at least thirty days ahead.

Patch releases happen as needed. Minor releases happen when a meaningful feature lands. Major releases are reserved for breaking changes to a contract published in this repo.

What is intentionally not on the roadmap

A short list, kept honest, of capabilities we do not plan to build.

  • A SIEM. Vyrox is not a log lake. We ingest alerts the EDR already decided are worth surfacing. Customers who want a SIEM have a SIEM.
  • A managed SOC service with humans. We are a software platform. We point customers at MSSPs for the human SOC layer.
  • A web dashboard during alpha. The first ten pilots use Discord exclusively. The dashboard ships when triggered (see "Planned, not started" above), not on a calendar.
  • A free public ingestion endpoint. The ingestion service is operated per tenant. Anyone running their own can use the open path documented in QUICKSTART.md.

Cross-references

Security policy

This document tells you how to report a vulnerability to Vyrox, what we will do with the report, what is in scope and what is not, and which properties of the system we consider security invariants. If you read nothing else, the contact is sec.vyrox@proton.me and the PGP key is at vyrox.dev/.well-known/pgp-key.txt.

Reporting

Send the report to sec.vyrox@proton.me with the subject line SECURITY: <one-line description>. PGP-encrypt the body if the finding is sensitive; the key is published at vyrox.dev/.well-known/pgp-key.txt.

Please include:

  • A description of the issue, in plain language, with enough detail that we can reproduce it.
  • The repository or service affected. If the bug is in vyrox-proxy, include the commit hash you tested against.
  • A proof of concept, ideally as a single shell command or a short script. Synthetic targets only. Do not exploit a production tenant.
  • Your preferred handle for credit, or "anonymous" if you would rather not be named.

Please do not file vulnerabilities as public GitHub issues.

Response

Acknowledgement within forty-eight hours. Initial triage decision within seven calendar days. Patch timeline shared within fourteen calendar days, including a target fix date and the version we expect to roll into.

If we accept the report, we coordinate disclosure with you. We default to a thirty-day embargo while we ship and verify the fix. The embargo extends if the issue affects a vendor we have not yet patched against, and we explain why in writing.

If we decline the report, we explain why in writing and you are free to disclose at your discretion. Common reasons we decline: the finding is in a third-party dependency we do not maintain, the finding requires an attacker who already controls the host, or the finding is a known trade-off documented in this repo.

Scope

In scope across every repository in the Vyrox organisation:

  • Authentication bypass on any service, including the Rust proxy /execute and /audit/export endpoints, the Python ingestion webhooks, and the Discord bot /interactions and /webhook endpoints.
  • Cross-tenant data leakage. Anything that lets a request from tenant A read, write, or signal tenant B.
  • Audit log tampering. Anything that breaks the hash chain without detection. See AUDIT_CHAIN.md for the format.
  • Containment action execution without a Discord human approval click.
  • HMAC verification weaknesses: timing channels, prefix confusion, or malleability in the canonical JSON used for Python-to-Rust signing.
  • Replay attacks within the thirty-second replay window on the proxy.
  • LLM prompt injection that produces a result the Pydantic validator accepts but that should have been rejected.
  • Webhook signature forgery against any of the four ingestion routes (crowdstrike, sentinelone, defender, generic).
  • Secret extraction from any in-memory or on-disk location the documentation says is unreachable.

Out of scope:

  • Findings that require physical access to a customer host.
  • Denial-of-service caused by exhausting public dependencies (rate limits on free LLM tiers, Discord rate limits, Redis quotas). These are operational concerns we degrade gracefully against, not vulnerabilities.
  • Issues in EDR vendor APIs that Vyrox calls into. Report those to the vendor.
  • Bugs in development-only tooling (the simulator, local docker setups) when used outside their documented purpose.
  • Misconfiguration findings on a customer-operated installation that ignore the documented configuration in this repo.

Security invariants

The six critical rules in ARCHITECTURE.md are the invariants we hold the code to. Any report that demonstrates a break in one of these is in scope and the patch will ship with the shortest reasonable embargo.

  1. Tenant isolation on every database query and every Redis key.
  2. Audit entry written before any state-changing response.
  3. HMAC verification on the raw bytes, before any parse, in constant time.
  4. No path from LLM output or worker logic to a containment call. Only a Discord button click reaches the proxy.
  5. DRY_RUN=true by default in the proxy. Production opts in.
  6. LLM output passes Pydantic validation before any field is read.

The threat model in THREAT_MODEL.md lists the specific attacker capabilities we defend against and the mitigations that defend against them.

What we do with the report

After triage, you can expect:

  • A private GitHub Security Advisory in the affected repository, with you tagged if you accepted credit.
  • A CVE if the issue qualifies under MITRE's rules and we are the primary CVE numbering authority for the affected component.
  • A code fix and a regression test that locks the fix in. We do not patch a finding without adding a test that would have caught it.
  • A note in the changelog of the affected repository on the day the embargo ends.

We do not pay bounties during alpha. We do publish credit on the vyrox.dev/security/credits page once the issue is public.

Coordinated disclosure timeline

day  0   report received
day  2   acknowledgement sent
day  7   triage decision (accept, decline, or follow-up questions)
day 14   patch timeline shared
day 30   embargo end (default)

The reporter can negotiate a longer embargo. We will not extend it unilaterally without explaining why.

What you can do today without filing a report

We welcome adversarial testing against the open-source components. The proxy and the simulator are designed to be run against each other in a local stack. If you find a behaviour that worries you but you are not sure whether it is a vulnerability, open a discussion in the vyrox-proxy repository and tag a maintainer. We will move the conversation to private if it turns out to be sensitive.

Maintainer contact

sec.vyrox@proton.me is monitored by the founder and one engineer. Replies come from the same address. We do not respond from personal accounts, and we do not ask reporters to contact us through DMs on social platforms.