Vyrox Security
AI security copilot for teams without a dedicated SOC. Ingests EDR alerts, triages them through a deterministic heuristics engine and an LLM fallback, and routes the verdicts that matter to a human approver in Discord. Every containment action runs through a small Rust proxy that the customer can read and audit.
What this repository is
vyrox-docs is the public engineering documentation for the Vyrox Security
platform. It carries the architecture, the API contracts, the threat model,
the audit-log specification, and the contributor guides. Sales copy,
pricing, customer rosters, and SLA contract language live elsewhere.
If you found this repo looking for the source code that touches your
endpoints, you want vyrox-proxy.
That is the Rust binary that receives signed containment instructions from
the rest of the platform and calls the EDR vendor's API. It is MIT licensed
and small enough to read in an afternoon.
What Vyrox actually does
Pipeline in five steps:
- Your EDR posts alerts to a Vyrox webhook over HTTPS. Each payload is authenticated per tenant with HMAC-SHA256 or a vendor-specific bearer token. CrowdStrike, SentinelOne, Microsoft Defender, and a customer field-mapped generic adapter are all supported today.
- Ingestion verifies the signature, normalises the vendor payload into a single NormalizedAlert schema, and pushes it onto a per-tenant Redis queue.
- The worker pulls the alert and runs it through the heuristics engine (deterministic regex-and-weight pattern matching with Noisy OR aggregation). The result is one of CRITICAL, HIGH, MEDIUM, LOW, BENIGN plus a confidence score.
- Anything in the ambiguous confidence band goes to an LLM with a strict JSON schema response. The LLM never executes anything. It only writes verdict fields. A Pydantic validator catches malformed responses and falls back to a conservative MEDIUM verdict at 0.5 confidence.
- CRITICAL and HIGH verdicts land in the tenant's Discord channel as an
embed with Approve, Deny, and Investigate buttons. Approve generates an
ActionRequest, signs it, and sends it to the Rust proxy. The proxy verifies the signature, checks a thirty-second replay window, dedupes on request ID, writes an audit entry, then either dry-runs or calls the EDR vendor's API.
Six rules hold across the whole pipeline. They are documented in
ARCHITECTURE.md and enforced by tests.
The shortest version:
- Every database query carries tenant_id.
- Every state change writes an audit entry before the response goes back.
- HMAC verification happens before any payload is parsed.
- The LLM cannot trigger containment. Only a human button click can.
- Local development sets DRY_RUN=true by default so the proxy refuses to call real EDR APIs.
- LLM JSON output is never passed to exec, eval, subprocess, SQL, or file operations. Only to Pydantic-validated verdict fields.
What is public, what is not
Open-core. The execution surface that touches customer infrastructure is open. The detection intelligence and the operational configuration are not.
| Component | Repo | Visibility | Why |
|---|---|---|---|
| Rust containment proxy | vyrox-proxy | Public, MIT | Customers should be able to read the code that isolates their hosts. |
| Engineering docs | vyrox-docs (this repo) | Public | Threat model, API contracts, contributor guides. |
| Alert simulator | vyrox-simulator | Public, MIT | Lets anyone replay a signed alert against a local stack. |
| Core monorepo | vyrox | Private | Ingestion, worker, Discord bot. The pipeline shape is documented here; the implementation is not. |
| Heuristics engine | vyrox-heuristics | Private | Pattern weights, MITRE technique mapping, false-positive baselines. The detection moat. |
| Adversarial playbook | vyrox-adversarial-playbook | Private | Red-team TTPs we test against. |
| Infrastructure | vyrox-deploy | Private | Provider-specific configs and secrets. |
| Partner CRM | vyrox-design-partners | Private | GTM, contracts, prospect roster. |
If you want to contribute, you can do it against vyrox-proxy,
vyrox-simulator, or this docs repo without ever touching the private
side. The contribution guide is in CONTRIBUTING.md.
Documents in this repo
Read in this order if you are new:
- QUICKSTART.md walks you from git clone to a signed alert hitting a local proxy. About ten minutes, no production credentials required.
- ARCHITECTURE.md is the system reference. Pipeline stages, multi-tenancy, audit chain, the six critical rules, the container boundary diagram, the decisions behind each component.
- THREAT_MODEL.md lists the assets, the threats, the mitigations, and the things explicitly out of scope. If you are evaluating Vyrox for a regulated workload, start here.
- API_REFERENCE.md documents every public endpoint: the four ingestion webhooks, the proxy's /execute and /audit/export, request and response shapes, error codes, signing rules.
- AUDIT_CHAIN.md is the wire spec for the SHA-256 hash-chained audit log. Independent verifiers can reproduce the chain from the JSONL stream alone.
- ADAPTERS.md is for contributors adding a new EDR vendor. Four rules to follow, one factory method to write, one test file to copy.
- SECURITY.md is the disclosure policy. Email address, PGP key, scope, SLA on triage, what we do not call a vulnerability.
- ROADMAP.md is the public roadmap by capability. No revenue targets, no customer counts.
- CONTRIBUTING.md and CODE_OF_CONDUCT.md cover how to send a patch and what behaviour is expected.
Status
Alpha. The pipeline is wired end to end and runs against synthetic alerts
in CI on every push. Ten pilot integrations are the next milestone. The
two recent audits in todo.md (a private file) drove the P0 fixes and
the P0.5 follow-ups already merged. Test counts at the moment of writing
this README: 89 Python tests, 17 Rust tests, lints clean across the
workspace.
What "alpha" means in practice:
- The on-disk audit format is stable. Field names will not change without a documented migration. AUDIT_CHAIN.md is the contract.
- The HMAC signing format is stable. Python sign returns sha256=<hex> and the Rust proxy strips the prefix before constant-time-comparing.
- The ingestion webhook URL shape is stable. The four routes documented in API_REFERENCE.md are the ones we will keep.
- Anything else can move. Internal data models, the LLM provider, the worker concurrency model. We will note breaking changes in the CHANGELOG once a release tagging discipline lands.
Security contact
sec.vyrox@proton.me, PGP key at
vyrox.dev/.well-known/pgp-key.txt.
Acknowledgement within forty-eight hours. Full policy in
SECURITY.md. Please do not file vulnerabilities as
public GitHub issues.
License
vyrox-proxy and vyrox-simulator are MIT licensed.
vyrox-docs, vyrox-landing, vyrox-heuristics, vyrox-deploy, vyrox-design-partners, and the vyrox monorepo are proprietary.
Vyrox Security, Inc. — hello@vyrox.dev
Quickstart
This walks an OSS contributor from git clone to a signed alert
hitting a local proxy. About ten minutes. No customer-side
credentials. No EDR account. Nothing leaves your machine.
If you are an operator integrating a real EDR, see the design partner playbook; your company contact has the link. The public docs cover the open path only.
What you need
- git
- cargo (Rust 1.75+ recommended; whatever the proxy's Cargo.toml pins is fine)
- bash, openssl, curl. Standard on macOS and most Linuxes.
- About a hundred megabytes of disk for the Rust build cache.
You do not need Python, Node, Docker, or a Discord account.
Step 1: Clone the open components
Three repositories. Each clones into its own directory.
git clone https://github.com/vyrox-security/vyrox-proxy.git
git clone https://github.com/vyrox-security/vyrox-simulator.git
git clone https://github.com/vyrox-security/vyrox-docs.git
The docs repo is this one. The other two are MIT licensed.
Step 2: Build the proxy
cd vyrox-proxy
cargo build
First build pulls the dependency tree (about ninety crates). Future
builds are quick. The final binary is at
target/debug/vyrox-proxy.
Step 3: Run the proxy with DRY_RUN
The proxy refuses to start without an HMAC secret. Generate one for local use only; do not reuse it anywhere else.
export VYROX_HMAC_SECRET=$(openssl rand -hex 32)
export AUDIT_LOG_PATH=./local-audit
export DRY_RUN=true
export BIND_ADDR=127.0.0.1:3000
mkdir -p "$AUDIT_LOG_PATH"
./target/debug/vyrox-proxy
The proxy listens on 127.0.0.1:3000. DRY_RUN=true is the default
so even if you forget to set it, the proxy will not call any EDR API.
Check it is alive in another shell:
curl -s http://127.0.0.1:3000/health
# {"status":"ok"}
Step 4: Fire a signed execution request
The proxy accepts POST /execute with an HMAC-SHA256 signed body.
Smallest valid request:
SECRET="$VYROX_HMAC_SECRET"
TS=$(date +%s)
BODY=$(cat <<EOF
{"request_id":"$(uuidgen | tr A-Z a-z)","tenant_id":"local-test","alert_id":"alt-1","action_type":"HOST_ISOLATION","host":"workstation-01","approved_by":"local-test","approved_at":$TS}
EOF
)
SIG="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | sed 's/^.*= //')"
curl -s -X POST http://127.0.0.1:3000/execute \
-H "Content-Type: application/json" \
-H "X-Vyrox-Signature: $SIG" \
--data-binary "$BODY"
# {"status":"dry_run","dry_run":true}
The proxy verifies your signature, writes an audit entry, then
short-circuits because DRY_RUN=true. Look at the audit file:
ls local-audit/
# audit-2026-05-23.jsonl
cat local-audit/audit-*.jsonl
You will see one JSONL entry with dry_run: true, a hash, and a
previous_hash of sixty four zeros (the genesis sentinel). The
format spec is in AUDIT_CHAIN.md.
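A quick sanity check on the file is to confirm that each entry's previous_hash matches the hash of the entry before it. The sketch below checks the links only. It assumes the two fields are hex strings named `hash` and `previous_hash`, as described above, and it does not recompute any hash, because the canonical form lives in AUDIT_CHAIN.md. The function name `check_links` is hypothetical and not part of any Vyrox tool.

```python
import json

GENESIS = "0" * 64  # the genesis sentinel: sixty-four zeros


def check_links(lines, start=GENESIS):
    """Return the index of the first entry whose previous_hash does not
    match the prior entry's hash, or None if every link holds.

    Only the linkage is checked. Recomputing each hash needs the
    canonical form from AUDIT_CHAIN.md, which this sketch does not assume.
    """
    expected = start
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # tolerate a trailing blank line
        entry = json.loads(line)
        if entry["previous_hash"] != expected:
            return index
        expected = entry["hash"]
    return None
```

For a fresh local-audit directory, passing the lines of the single JSONL file with the default `start` should return None.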
Step 5: Run the alert simulator
The simulator generates signed payloads for a Vyrox ingestion
endpoint. There is no public Vyrox ingestion service to point it at,
but you can replay against the simulator's own --dry-run mode to
see what the wire format looks like:
cd ../vyrox-simulator
./simulate.sh mimikatz --dry-run
# Prints the signed payload to stdout.
If you have a private vyrox stack running (worker plus ingestion plus the bot), point the simulator at it:
VYROX_URL=http://localhost:8001/webhook \
VYROX_HMAC_SECRET=$(grep '^CROWDSTRIKE_WEBHOOK_SECRET=' ../vyrox/.env | cut -d= -f2-) \
./simulate.sh mimikatz
For the open path, --dry-run is enough to see how an alert payload
looks before it hits ingestion.
Step 6: Read the docs
You now have a running proxy and a signed-payload generator. The next thing to do depends on what you came for.
- Adding an EDR adapter. Start at ADAPTERS.md. The full contract is there.
- Understanding the security model. Start at ARCHITECTURE.md, then THREAT_MODEL.md.
- Verifying the audit chain on your own. Start at AUDIT_CHAIN.md. The reference verifier in Python is thirty lines.
- Calling the API. Start at API_REFERENCE.md.
Troubleshooting
401 Unauthorized
The proxy rejected your signature. Two common causes:
- The shell ate your \n somewhere and the body bytes you signed are not what you sent. Use --data-binary (not -d) on the curl command and quote the heredoc.
- You signed with a different secret than the proxy is using. Re-run the export and the proxy boot in the same shell.
410 Gone
Your timestamp is outside the thirty second replay window. Refresh
TS=$(date +%s) and regenerate the body and signature.
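The window check itself is small. The sketch below is a Python illustration of a symmetric thirty-second window, not the proxy's Rust code. It assumes the timestamp being compared is the `approved_at` value from the request body and that the boundary is inclusive; both are assumptions, and the function name is hypothetical.

```python
import time

REPLAY_WINDOW_SECONDS = 30  # the thirty-second window from the text


def within_replay_window(request_ts, now=None, window=REPLAY_WINDOW_SECONDS):
    """True when request_ts is within +/- window seconds of now.

    request_ts and now are Unix timestamps in seconds. The inclusive
    boundary is an assumption of this sketch.
    """
    if now is None:
        now = time.time()
    return abs(now - request_ts) <= window
```

A body built a minute ago fails this check even if its signature is valid, which is why the fix is to regenerate both the timestamp and the signature.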
Proxy refuses to start
The proxy panics on boot if VYROX_HMAC_SECRET is unset. Set it
before launch. The proxy also panics if you set one of
TLS_CERT_PATH and TLS_KEY_PATH but not the other; either set both
(for TLS) or neither (for plain HTTP behind a reverse proxy).
Audit file is empty
You probably hit 401 before any audit write. The proxy writes audit entries only after the HMAC check passes. If you see a request in the logs but no audit entry, that is the reason.
What is not in the open path
The full Vyrox stack contains four more components: ingestion, worker,
Discord bot, and the heuristics engine (a library imported by the worker). Those live in private
repositories. The pipeline shape is documented in
ARCHITECTURE.md so a reader can understand the
whole system; the implementations are not public.
A contributor adding a new EDR adapter does not need the private
side. The adapter recipe in ADAPTERS.md covers what
you write, the contracts you must respect, and the tests you must
ship. A reviewer with private access merges your PR; you do not need
the private code on disk.
Next steps
- Read CONTRIBUTING.md for the patch workflow, test conventions, and reviewer expectations.
- Read ARCHITECTURE.md for the system overview and the six critical rules.
- Read AUDIT_CHAIN.md if you want to write a verifier or a compliance pipeline against the audit log.
Contributing
This document tells you what we will merge, what we will not, and what the review will check.
We accept contributions to three public repositories: vyrox-proxy,
vyrox-simulator, and this docs repo. The private monorepo is not
open to outside contributors today; once a public adapter or feature
needs private-side wiring, a Vyrox maintainer takes it from there.
Before you start
If your patch is more than a hundred lines or changes a contract, open an issue or a draft PR first. Five minutes of "is this the shape you want" saves a week of "we cannot merge this because it is the wrong abstraction".
If you found a vulnerability, do not open a PR. See
SECURITY.md for the disclosure path.
Workflow
Standard GitHub flow on every repo:
fork ─▶ feature branch ─▶ commits ─▶ PR against main ─▶ review ─▶ squash merge
Branch names: feat/<thing>, fix/<thing>, docs/<thing>,
chore/<thing>, test/<thing>. The prefix matches the conventional
commit type below.
Commit messages: Conventional Commits.
feat(adapters): add acme webhook adapter
fix(proxy): release nonce claim on audit-write failure
docs(threat-model): document A8 worker-to-bot HMAC
Multi-line bodies are welcome. Wrap at 72.
If your PR has more than one logical change, split it. Reviewers can hold one shape in their head at a time. Two unrelated changes in one PR usually mean one gets merged and the other gets nitpicked forever.
What we will merge
- Bug fixes with a regression test that would have caught the bug.
- New EDR adapters following the contract in ADAPTERS.md, with the five required failure-mode tests.
- Documentation corrections backed by source. Quote the file and the line you read.
- Test coverage on existing code, especially around the six critical rules in ARCHITECTURE.md.
- Performance improvements with benchmarks attached. We do not merge "should be faster" without numbers.
- Refactors that reduce surface area. We do not merge refactors that add surface area; those start as design discussions.
What we will not merge
- Marketing copy presented as architecture fact. "Best in class" belongs on a landing page, not in the docs.
- Security guidance that weakens controls. If your PR makes the HMAC check optional, removes the replay window, or short-circuits the audit write, the answer is no, even if the test suite passes.
- Auto-generated docs that drift from the code. The OpenAPI spec is not the source of truth; the route handlers are.
- Code style PRs that touch hundreds of files. Run ruff or cargo fmt in your own branch; do not ship a workspace-wide reformat.
- Adapter PRs that violate the four rules. See ADAPTERS.md for what each rule is, in concrete terms.
- Changes that introduce a hard dependency on a paid SaaS provider without an open-source alternative documented as the default.
Testing
Every PR runs the full test suite in CI. Local commands:
For vyrox-proxy:
cargo test
cargo clippy -- -D warnings
cargo fmt --check
For vyrox-simulator:
./run-tests.sh # if present, otherwise read scenarios/ and run a few
shellcheck simulate.sh scenarios/*.sh
For vyrox-docs:
markdownlint-cli2 "**/*.md" # if installed
We do not ship a private-side test harness in the public docs. If your PR touches a contract documented here, write the test against the public surface. The reviewer will run the matching private-side test before merging.
Code style
- Plain prose in docs. No em-dashes. No AI tells. Builder voice. Concrete file paths and function names where they help the reader.
- Rust: cargo fmt defaults, cargo clippy -- -D warnings clean.
- Python (private side): ruff defaults, mypy --strict clean, type hints on every public function.
- Shell: bash with set -euo pipefail. POSIX-compatible flags where practical (the simulator runs on macOS and Linux).
Tests follow the production code style. A test that would not pass review for its prose does not get merged just because it is a test.
Reviewer expectations
A reviewer on a public PR will:
- Read every line of the diff. We do not LGTM blocks of code we did not read.
- Verify the test suite covers the failure mode the change was supposed to fix. A bug fix without a regression test is sent back.
- Check the cross-references. If your PR changes a contract documented here, the docs change ships in the same PR.
- Push back on scope. If the PR is doing two things, the reviewer asks you to split it.
A reviewer on a private-side PR (Vyrox staff only) does the same plus
the rule-1-through-6 checklist from ARCHITECTURE.md.
Documentation discipline
Three rules.
- Document what is, not what should be. If the code does X, the docs say X. Aspirational docs lead a new contributor to look at the code, find it disagrees, and lose trust in everything else.
- Quote the file when you make a claim. "The proxy verifies HMAC in constant time (hmac::verify_signature in vyrox-proxy/src/hmac.rs:140)." A reader who wants to confirm has exactly one place to look.
- Update the docs in the same PR as the code. Doc drift is a one-way ratchet. We close it on every PR or it grows forever.
Cross-references
- QUICKSTART.md from git clone to a signed alert in about ten minutes.
- ADAPTERS.md for the EDR adapter contract.
- ARCHITECTURE.md for the system overview and the six critical rules.
- SECURITY.md for the disclosure process.
Code of conduct
See CODE_OF_CONDUCT.md. Short version: be
direct, be technical, be respectful. We do not have time for the
opposite.
Code of Conduct
Professional Expectations
This project documents security software. Discussions should remain technical, respectful, and evidence-based.
Expected Behavior
- Be precise and professional
- Focus on code, docs, and decisions, not people
- Provide reproducible references when making claims
Unacceptable Behavior
- Harassment, abuse, or personal attacks
- Intentionally misleading security advice
- Posting secrets or sensitive tenant information
- Spam and repeated low-effort noise
Enforcement
Maintainers may edit, lock, remove, or restrict participation that harms project quality or safety.
Security concerns: security@vyrox.security
Architecture
This document is the engineering reference for the Vyrox platform. It is written for the person who is about to read or modify the code, or who is evaluating Vyrox for a regulated workload and needs to know exactly what the system does. It does not describe what the product will become. It describes what runs in CI today.
If you are looking for setup steps, see QUICKSTART.md.
If you are looking for the threat model, see THREAT_MODEL.md.
If you want the on-disk audit format, see AUDIT_CHAIN.md.
Pipeline at a glance
EDR vendors Vyrox platform
CrowdStrike Falcon ─┐
SentinelOne ├─▶ POST /webhook/{vendor} ─▶ Ingestion (FastAPI)
Defender Graph │ HMAC or bearer auth │
Generic JSON ─┘ per-tenant secret ▼
NormalizedAlert ─▶ Redis LPUSH/RPOP
│ vyrox:alerts:{tid}
▼
Worker (asyncio)
1. Cache lookup (24h TTL by alert fingerprint)
2. Heuristics (Noisy OR, <5ms)
├─ confidence ≥ 0.75 ▶ accept
├─ confidence ≤ 0.25 ▶ BENIGN
└─ otherwise ▶ LLM
3. LLM fallback (primary + 2 fallback models)
+ Pydantic schema validation
+ per-tenant daily token budget
4. Persist (SQLite, tenant-scoped tables)
5. Notify (signed HTTP to bot)
Discord bot (FastAPI)
├─ /interactions (Ed25519 verified)
├─ /webhook (HMAC verified)
└─ approval flow ▶ Rust proxy
Rust proxy
├─ HMAC verify (constant time)
├─ replay window (±30s)
├─ nonce dedup (DashMap, 10min retention)
├─ audit append (hash-chained JSONL)
└─ EDR API call (or DRY_RUN short-circuit)
The four services in the diagram (ingestion, worker, Discord bot, and proxy) are independent processes; the heuristics engine is a library imported by the worker. The services communicate over HTTP
and Redis only. There is no shared in-process state across services. The
SQLite database is shared between the worker and the Discord bot in the
current pilot deployment. A Postgres migration is tracked in
todo.md (private) and is due before tenant count reaches twenty-five.
Components
| Component | Language | Process | What it owns |
|---|---|---|---|
| Ingestion | Python, FastAPI | uvicorn ingestion.main:app | Webhook auth, vendor payload normalisation, Redis enqueue |
| Worker | Python, asyncio | python -m worker.main | Triage pipeline, persistence, Discord notification |
| Discord bot | Python, FastAPI | uvicorn discord_bot.main:app | Interaction handling, approval flow, signing toward the proxy |
| Containment proxy | Rust, Axum | vyrox-proxy | HMAC verify, replay window, nonce dedup, audit, EDR API call |
| Heuristics engine | Python | imported by the worker | Pattern matching, Noisy OR aggregation |
The heuristics engine is private. The shape of its API (HeuristicsEngine.score(alert: dict) -> HeuristicResult) is documented here because callers depend on it. The pattern weights and the MITRE technique mapping are not.
Critical rules
These six rules are enforced by tests and reviewed in every PR. Violating one is a blocking issue, not a stylistic choice.
Rule 1: Tenant isolation
Every database query carries a tenant_id filter. Every Redis key is
namespaced vyrox:alerts:{tenant_id}. There is no shared bucket and no
fallback tenant.
The previous default-tenant fallback was removed on 2026-05-21 after the
first audit caught it. The replacement contract: if a payload arrives
without the vendor's tenant identifier (customer_id, accountId,
tenantId), the ingestion route returns HTTP 400 and the EDR retries.
The function is resolve_tenant_id in ingestion/main.py. It raises
MissingTenantIdentifier on a missing or empty value.
The schema invariant is checked at boot. shared/db.py:_assert_tenant_id_present
walks every table in _TENANT_SCOPED_TABLES (alerts, actions,
verdict_cache, token_usage) and refuses to start the service if any
of them is missing the tenant_id column. The check uses PRAGMA table_info, runs once at startup, and raises SchemaIntegrityError
loudly enough that the deploy fails.
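A boot-time check of this kind can be sketched with the standard library's sqlite3 module. The block below is an illustration only: the table names and the error name come from the text above, but the function body is an assumption, not the contents of shared/db.py.

```python
import sqlite3

TENANT_SCOPED_TABLES = ("alerts", "actions", "verdict_cache", "token_usage")


class SchemaIntegrityError(RuntimeError):
    """Raised when a tenant-scoped table lacks a tenant_id column."""


def assert_tenant_id_present(conn, tables=TENANT_SCOPED_TABLES):
    """Refuse to continue if any listed table has no tenant_id column."""
    for table in tables:
        # PRAGMA table_info yields one row per column; index 1 is the name.
        # Table names come from a fixed tuple, never from user input.
        rows = conn.execute(f"PRAGMA table_info({table})")
        columns = {row[1] for row in rows}
        if "tenant_id" not in columns:
            # A missing table yields no rows, so it fails here as well.
            raise SchemaIntegrityError(f"{table} has no tenant_id column")
```

Calling this once before the service accepts traffic turns a schema mistake into a failed deploy instead of a cross-tenant query at runtime.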
Rule 2: Audit before response
Every state-changing operation writes an audit entry before the response
goes back to the caller. The audit log is append-only JSONL. Each entry
carries previous_hash (the SHA-256 of the prior entry) and hash (the
SHA-256 of previous_hash || canonical_json(entry)). The first entry of
the very first log file links to a sentinel genesis hash of sixty four
zeros.
The chain survives process restarts. AuditWriter.__init__ in
shared/audit.py reads the last hash from today's log file before
accepting the first write. The Rust proxy uses the same approach in
audit::ChainState::from_file. Both implementations agree on the wire
format. The independent specification is in AUDIT_CHAIN.md.
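The chaining rule can be sketched in a few lines of Python. This is an illustration of hash = SHA-256(previous_hash || canonical_json(entry)) under two assumptions that AUDIT_CHAIN.md may define differently: previous_hash is concatenated as its ASCII hex string, and the canonical form is sorted keys with compact separators. The function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # sentinel for the first entry of the first file


def canonical_json(entry):
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    text = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def chain_entry(previous_hash, entry):
    """Return a copy of entry extended with previous_hash and hash.

    The hash covers the entry without its two chain fields, which is
    another assumption of this sketch.
    """
    material = previous_hash.encode("ascii") + canonical_json(entry)
    digest = hashlib.sha256(material).hexdigest()
    return {**entry, "previous_hash": previous_hash, "hash": digest}


first = chain_entry(GENESIS_HASH, {"event": "approve.requested"})
second = chain_entry(first["hash"], {"event": "approve.executed"})
```

Because each hash depends on the one before it, editing or removing an earlier entry changes every hash that follows, which is what lets a verifier detect tampering from the JSONL stream alone.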
Audit writes are durable. _sync_write in shared/audit.py flushes and
os.fsync after every entry. The Rust side does flush followed by
sync_data. A power cut between the write and the OS writeback does
not lose entries.
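The flush-then-fsync pattern looks roughly like this in Python. It is a minimal sketch of the technique, not the body of _sync_write; the function name and the one-line-per-call shape are assumptions.

```python
import os


def durable_append(path, line):
    """Append one line and force it to stable storage before returning."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line.rstrip("\n") + "\n")
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to write through to disk
```

The flush alone only moves bytes into the kernel's page cache. The fsync call is the step that covers the gap between the write and the OS writeback.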
Rule 3: HMAC before processing
Every webhook payload is verified before any parser touches its bytes.
The verification uses hmac.compare_digest on the Python side and the
subtle::ConstantTimeEq trait on the Rust side. Both run in time
proportional to the MAC length, not to where the first byte mismatch
appears.
The wire format on the Python side: sign(payload: str, secret: str)
returns f"sha256={hex_digest}". The Rust verifier strips the
"sha256=" prefix before comparing. The round-trip is locked by
tests/test_p0_regressions.py::test_hmac_python_sign_uses_sha256_prefix.
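A minimal sketch of that wire format, using only the standard library. The `sign` signature and the prefix come from the text above. The `verify` helper is hypothetical, and treating the secret as the UTF-8 bytes of its string form is an assumption of this sketch.

```python
import hashlib
import hmac

PREFIX = "sha256="


def sign(payload: str, secret: str) -> str:
    """Return the HMAC-SHA256 of payload as 'sha256=<hex>'."""
    digest = hmac.new(
        secret.encode("utf-8"), payload.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return f"{PREFIX}{digest}"


def verify(payload: str, secret: str, signature: str) -> bool:
    """Strip the prefix, then compare in constant time."""
    if not signature.startswith(PREFIX):
        return False
    expected = sign(payload, secret)[len(PREFIX):]
    received = signature[len(PREFIX):]
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The verifier signs the exact bytes it received and never a re-serialised copy, so any change to whitespace or key order on the way in shows up as a mismatch.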
For requests carrying JSON bodies that travel between Vyrox services
(the worker calling the bot, the bot calling the proxy), the body is
serialised with separators=(",", ":") and sort_keys=True. Without
that pinning, Python's default json.dumps and Rust's serde_json
disagree on whitespace and key order, which produces a different MAC
on the verifier side and a silent 401.
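The effect of the pinning is easy to see with json.dumps alone. The field values below are made up for illustration, and `canonical_body` is a hypothetical helper name.

```python
import json


def canonical_body(obj) -> str:
    """Serialise with sorted keys and no insignificant whitespace."""
    return json.dumps(obj, separators=(",", ":"), sort_keys=True)


request = {"tenant_id": "local-test", "request_id": "r-1", "approved_at": 1700000000}

# Default: insertion order, ", " and ": " separators.
print(json.dumps(request))
# {"tenant_id": "local-test", "request_id": "r-1", "approved_at": 1700000000}

# Pinned: sorted keys, compact separators.
print(canonical_body(request))
# {"approved_at":1700000000,"request_id":"r-1","tenant_id":"local-test"}
```

Both strings decode to the same object, but they are different byte sequences, so they produce different MACs. Signing and sending the same pinned string is what keeps the two sides in agreement.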
Rule 4: No autonomous containment
The LLM cannot trigger a containment action. The heuristics engine cannot trigger a containment action. The worker cannot trigger a containment action.
The only code path that calls the Rust proxy is the Discord bot's
approval handler in discord_bot/handlers/approvals.py, which runs in
response to a Discord button click. The button click itself is
authenticated end to end: Discord signs the interaction with Ed25519,
the bot verifies the signature against the application's public key in
discord_bot/security.py, the handler then signs an ActionRequest
with the shared HMAC secret, and the proxy verifies that signature
before doing anything else.
The static invariant is enforced by a test:
tests/test_p0_regressions.py::test_worker_triage_never_invokes_proxy
greps the worker modules at import time and at source level for any
reference to discord_bot.proxy_client.execute_action. If the worker
ever imports that symbol, the test fails. The check covers both eager
imports and lazy imports inside functions.
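One way to write a source-level check like this is to walk the syntax tree instead of matching text. The sketch below is not the test in test_p0_regressions.py; it is an assumption-laden illustration that flags any import of the discord_bot.proxy_client module, including one nested inside a function. The helper name is hypothetical.

```python
import ast

FORBIDDEN_MODULE = "discord_bot.proxy_client"


def imports_forbidden_module(source: str, forbidden: str = FORBIDDEN_MODULE) -> bool:
    """True if the source imports the forbidden module anywhere,
    including lazily inside a function body."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import discord_bot.proxy_client" (optionally with "as")
            for alias in node.names:
                if alias.name == forbidden or alias.name.startswith(forbidden + "."):
                    return True
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            # "from discord_bot.proxy_client import execute_action"
            if module == forbidden or module.startswith(forbidden + "."):
                return True
            # "from discord_bot import proxy_client"
            if any(f"{module}.{alias.name}" == forbidden for alias in node.names):
                return True
    return False
```

A test built on this would read each worker module's source and assert the function returns False. It does not catch dynamic imports through importlib, which is one reason to pair it with an import-time check.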
Rule 5: DRY_RUN by default
The Rust proxy's dry_run flag is true by default. Production has to
opt in to real execution by setting DRY_RUN=false in the environment.
The check happens before the EDR client is even constructed, so
mis-configuration cannot accidentally call the vendor's API.
// vyrox-proxy/src/main.rs
let response = if state.dry_run {
    info!(/* ... */, "DRY_RUN: skipping EDR call");
    ExecuteResponse {
        status: "dry_run".to_string(),
        dry_run: true,
    }
} else {
    state.edr.dispatch(payload.action_type, &payload.host).await
};
The audit entry written on a DRY_RUN action looks identical to a real
action except for the dry_run: true field. That is intentional. An
operator looking at the audit log can tell the difference, and a
compliance review on the JSONL stream sees the same chain integrity
either way.
Rule 6: LLM output never directly executed
The LLM returns a JSON object with five fixed fields: verdict,
confidence, reasoning, mitre_techniques, suggested_action. The
triage_with_llm function in worker/llm.py runs the parsed object
through _parse_triage_json which checks every field against a fixed
allow-list (verdict in {CRITICAL, HIGH, MEDIUM, LOW, BENIGN},
confidence clamped to [0, 1], suggested_action in the action allow-list).
A response that fails validation produces a conservative MEDIUM verdict
at 0.5 confidence, not a partial commit.
The validated object never touches exec, eval, subprocess, the
filesystem, or SQL. It only sets fields on a TriageResult. The
Pydantic model itself is frozen so even a downstream caller cannot
mutate fields after the fact.
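The validate-or-fall-back shape can be sketched with the standard library. This block uses a frozen dataclass as a stand-in for the frozen Pydantic model, and the action allow-list is hypothetical: only HOST_ISOLATION appears in these docs, and NONE is invented for the fallback. The function is an illustration, not the private _parse_triage_json.

```python
import json
from dataclasses import dataclass

VERDICTS = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "BENIGN"}
# Hypothetical allow-list; the real one is not public.
ACTIONS = {"HOST_ISOLATION", "NONE"}


@dataclass(frozen=True)
class TriageResult:
    verdict: str
    confidence: float
    reasoning: str
    mitre_techniques: tuple
    suggested_action: str


# Conservative default: MEDIUM at 0.5 confidence.
FALLBACK = TriageResult("MEDIUM", 0.5, "LLM response failed validation", (), "NONE")


def parse_triage_json(raw: str) -> TriageResult:
    """Validate every field; any failure yields the fallback, never a partial result."""
    try:
        data = json.loads(raw)
        verdict = data["verdict"]
        action = data["suggested_action"]
        techniques = data["mitre_techniques"]
        if verdict not in VERDICTS or action not in ACTIONS:
            return FALLBACK
        if not isinstance(techniques, list):
            return FALLBACK
        confidence = min(1.0, max(0.0, float(data["confidence"])))  # clamp to [0, 1]
        return TriageResult(
            verdict,
            confidence,
            str(data["reasoning"]),
            tuple(str(t) for t in techniques),
            action,
        )
    except (ValueError, KeyError, TypeError):
        return FALLBACK
```

The result only ever carries values drawn from fixed sets, clamped numbers, or plain strings, so nothing the model writes can select a code path beyond the allow-listed ones.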
Multi-tenancy
Tenant isolation is a property of the data layer, not a runtime check in business logic.
| Surface | How tenants are separated |
|---|---|
| Redis queue | Key namespace: vyrox:alerts:{tenant_id} |
| SQLite tables | Every row carries tenant_id; queries filter on it |
| Discord channels | DiscordGuild.tenant_id maps Discord server to tenant |
| Webhook secrets | Looked up per tenant in tenant_credentials.webhook_secret_encrypted |
| Audit log | Each entry carries tenant_id; export endpoints filter server-side |
| Token budget | Daily ledger keyed on tenant_id and date |
| Verdict cache | Cache key (tenant_id, fingerprint) |
The webhook routes resolve the tenant from the vendor payload's own
identifier field (customer_id, accountId, tenantId), then look up
that tenant's secret in tenant_credentials before verifying the
signature. A payload that authenticates with the wrong tenant's secret
fails the HMAC check and returns 401. A payload with no identifier
returns 400. There is no path where an unmatched payload lands on a
shared queue.
Cross-tenant access from inside the Discord bot is blocked by
discord_bot/main.py:312. The custom_id of every approval button
embeds the alert's tenant ID. Before calling the approval handler, the
bot checks that the alert tenant matches the Discord guild's tenant.
A mismatch returns "This action is not valid for this server" without
contacting the proxy.
Two-stage triage
Triage runs in worker/triage.py::triage. Five stages, three early
returns.
┌────────────────────────┐
NormalizedAlert ──▶ verdict cache ──▶ cache hit ──▶ │ return cached verdict │
│ └────────────────────────┘
│ cache miss
▼
heuristics engine
│
┌─────────────────────────┼─────────────────────────┐
▼ ▼ ▼
confidence ≥ 0.75 confidence ≤ 0.25 0.25 < confidence < 0.75
accept heuristic return BENIGN LLM fallback
verdict │
▼
token budget check
│
┌───────────────────────────┼───────────────────────────┐
▼ ▼ ▼
budget exhausted primary model primary 429/5xx
MEDIUM / 0.5 parse + return ▼
fallback model 1
parse + return
│
▼
fallback model 2
parse + return
│
▼
all rate limited
MEDIUM / 0.5
The two-stage design solves three problems at once. Determinism and
explainability for the eighty percent of alerts that are obvious. Low
cost because the LLM is reserved for the ambiguous middle band. A
conservative default verdict for any failure mode, so the queue never
jams on a provider outage. The LLM provider is not named in this doc
because the choice is operational. The model chain is configured in
environment variables (LLM_PRIMARY_MODEL, LLM_FALLBACK_MODEL_1,
LLM_FALLBACK_MODEL_2).
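The first-stage decision can be sketched as two small functions: the standard Noisy OR combination, 1 minus the product of (1 minus w) over the matched pattern weights, and the band routing from the diagram. The weights in the usage lines are invented for illustration; the real pattern weights are private, and so is the mapping from matched patterns to a severity level.

```python
import math

ACCEPT_AT = 0.75  # confidence >= 0.75: accept the heuristic verdict
BENIGN_AT = 0.25  # confidence <= 0.25: return BENIGN


def noisy_or(weights):
    """Combine independent pattern weights in [0, 1]: 1 - prod(1 - w)."""
    return 1.0 - math.prod(1.0 - w for w in weights)


def route(confidence):
    """Pick the next stage from the confidence band."""
    if confidence >= ACCEPT_AT:
        return "accept"
    if confidence <= BENIGN_AT:
        return "benign"
    return "llm"


print(route(noisy_or([0.5, 0.5])))  # two moderate matches combine to 0.75
print(route(noisy_or([0.3, 0.4])))  # lands in the ambiguous middle band
print(route(noisy_or([0.2])))       # a single weak match stays low
```

Noisy OR means several weak signals can add up to a confident verdict without any single pattern dominating, while an alert with one weak match never reaches the LLM.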
Approval flow
Discord button click
│
▼
bot /interactions ◀──── Ed25519 verify against settings.discord_public_key
│
▼
custom_id parse ──▶ approve / deny / investigate
│
▼ (approve only)
AlertRecord lookup by alert_id + tenant_id
│
▼
Idempotency check ──▶ if status already executed/executing/approved → no-op
│
▼
Mark alert "executing"
Persist ActionRecord "approved"
Audit "approve.requested" ◀──── written before any outbound call
│
▼
proxy_client.execute_action()
body signed with vyrox_hmac_secret (deterministic JSON)
│
▼
Rust proxy /execute
├─ HMAC verify
├─ replay window check (±30s)
├─ nonce.claim_or_replay(request_id)
├─ audit::append_audit ◀──── written before EDR call
└─ edr.dispatch (or DRY_RUN short-circuit)
│
▼
ActionRecord.status = "executed" or "dry_run"
Alert.status = "executed"
Audit "approve.executed"
The flow's idempotency story has three layers. The bot checks the
AlertRecord.status before generating a request ID, so a double-click
returns "already approved". The proxy keeps a per-request-ID nonce
store with ten minute retention, so a network retry replays the cached
response instead of calling the EDR twice. The audit entry is written
once per state transition; replayed requests do not double-log.
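The proxy-side layer can be illustrated with a small in-memory store. This is a single-threaded Python sketch of the idea, not the Rust implementation, which the diagram says uses a DashMap. The class name, the injected clock, and the return shape are assumptions.

```python
import time

RETENTION_SECONDS = 600  # ten-minute retention from the text


class NonceStore:
    """Remember the response for each request ID for a fixed retention period."""

    def __init__(self, retention=RETENTION_SECONDS, clock=time.monotonic):
        self._retention = retention
        self._clock = clock
        self._seen = {}  # request_id -> (stored_at, response)

    def claim_or_replay(self, request_id, execute):
        """Run execute() once per request_id; replay the cached response after.

        Returns (response, replayed).
        """
        now = self._clock()
        # Drop entries older than the retention period.
        self._seen = {
            key: value
            for key, value in self._seen.items()
            if now - value[0] < self._retention
        }
        if request_id in self._seen:
            return self._seen[request_id][1], True
        response = execute()
        self._seen[request_id] = (now, response)
        return response, False
```

A retried request with the same ID gets the stored response back and `execute` is not called again, which is the property that keeps a network retry from isolating a host twice.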
Configuration
All configuration is read at startup from environment variables through
shared/config.py::Settings. The settings class uses
pydantic_settings so a missing required field raises a
ValidationError before the service serves traffic.
The full env contract is in .env.example
in the private monorepo. The fields that an OSS contributor needs to
know about:
| Variable | Component | Purpose |
|---|---|---|
| VYROX_HMAC_SECRET | all | Sixty four hex characters. Signs Python ↔ Python and Python ↔ Rust traffic. |
| REDIS_URL | ingestion, worker | redis:// or rediss:// URL. The legacy Upstash REST variables are still accepted for backward compatibility but new deployments should set this. |
| OPENCODE_ZEN_API_KEY | worker | LLM provider key. Empty falls back to the legacy OPENROUTER_API_KEY during the migration window. |
| DISCORD_BOT_TOKEN | bot | Discord application token. |
| DISCORD_PUBLIC_KEY | bot | Application public key for interaction Ed25519 verification. Empty skips verification (local dev only). |
| CROWDSTRIKE_WEBHOOK_SECRET | ingestion | Vendor-default HMAC secret. Per-tenant secrets stored in tenant_credentials override this. |
| SENTINELONE_WEBHOOK_SECRET | ingestion | Vendor-default bearer token. |
| DEFENDER_WEBHOOK_SECRET | ingestion | Defender Graph clientState value used as bearer. |
| AUDIT_LOG_PATH | all writers | Directory for daily JSONL files. The hash chain depends on this surviving restart. |
| VYROX_PROXY_URL | bot | Base URL of the Rust proxy. |
| DRY_RUN | proxy | true by default. Production opts in to real EDR calls. |
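Two rows of the table lend themselves to a quick pre-flight check: VYROX_HMAC_SECRET must be sixty four hex characters, and DRY_RUN defaults to true. The sketch below is a stand-alone illustration, not the Settings class or the proxy's parser. The function names are hypothetical, and the accepted "false" spellings and the choice to stay in dry-run mode on an unrecognised value are assumptions.

```python
import logging
import re
from typing import Mapping

log = logging.getLogger("config_sketch")

_TRUE = {"true", "1", "yes", "on"}
_FALSE = {"false", "0", "no", "off"}  # assumed spellings for opting out
_HEX64 = re.compile(r"[0-9a-fA-F]{64}")


def parse_dry_run(env: Mapping[str, str]) -> bool:
    """Return the effective DRY_RUN value; unset or unrecognised values stay True."""
    raw = env.get("DRY_RUN")
    if raw is None:
        return True  # safe default: no real EDR calls
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    log.warning("unrecognised DRY_RUN value %r; keeping dry-run enabled", raw)
    return True


def check_hmac_secret(env: Mapping[str, str]) -> str:
    """Raise ValueError unless VYROX_HMAC_SECRET is exactly 64 hex characters."""
    secret = env.get("VYROX_HMAC_SECRET", "")
    if not _HEX64.fullmatch(secret):
        raise ValueError("VYROX_HMAC_SECRET must be 64 hex characters")
    return secret
```

Passing `os.environ` to both functions at startup fails fast on a malformed secret and never enables real EDR calls by accident.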
What is in the private side
The public docs are written to be read without the private code; that is intentional. The boundary makes it clear what is open to contribution.
The private monorepo holds the implementation of the pipeline above.
File names mirror the layout described here (ingestion/, worker/,
discord_bot/, shared/, playbook/, migrations/, tests/). The
Python tests covering the public contracts have public-safe names
(test_p0_regressions.py, test_p05_blockers.py). Anyone with access
can map a private fix to a public contract in seconds.
The detection patterns, the LLM prompts, and the operational configs stay private. Those are the layer that creates the business; the proxy and the contracts are the layer that creates the trust. The split is deliberate.
Operating commitments
We do not publish hard SLA percentages in this repo. The reason is simple: numbers we cannot defend across all pilots today belong in negotiated contracts, not in OSS docs.
What we can commit publicly:
- The audit log is customer-owned. We do not lose it, we do not modify it, and we provide export at any time. The format is the contract, not our retention policy.
- Containment proceeds only after a human in Discord clicks Approve. There is no autonomous containment path.
- Webhook authentication failures and proxy signature failures both return generic 401 responses. We never tell a caller which part of the credential was wrong.
Per-customer SLAs that involve uptime targets and triage latency live in signed contracts.
Decisions worth knowing
A short list, written for the reader who is asking "why this and not that".
Rust for the proxy. The proxy is the only Vyrox process that can
cause customer-side side effects. The set of properties we wanted in
one binary: memory safety without a garbage collector, a small static
binary, a constant-time HMAC implementation in the ecosystem, no
runtime dependency on a vendor library. The Rust choice gave us all of
them. The proxy is intentionally small: about a thousand lines of code
including tests, split across main, hmac, audit, nonce, edr, and
actions.
SQLite for the pilot. SQLite with WAL mode and a single writer process handles the pilot scale (ten tenants, low hundreds of alerts per day per tenant). Write contention bites somewhere around twenty five tenants, which is the trigger for the Postgres migration. The schema is already SQLModel-compatible, so the migration is a SQL dump plus a connection string change, not a rewrite.
Discord as the operator UI. The first ten pilots use Discord exclusively. The bot handles onboarding, alert review, approval, and slash commands for stats and audit export. The cost is one extra infrastructure provider; the benefit is that a customer's first five-minute experience is "I added your bot to my server and a synthetic alert appeared." A web dashboard ships when a prospect refuses Discord or when customer count reaches eleven, whichever comes first.
Two-stage triage. A pure LLM design is slow, expensive at scale, and not auditable without careful prompt engineering. A pure rules design misses anything novel. The split lets us run the heuristics for free, run the LLM only on the ambiguous middle band, fall back to a conservative MEDIUM on any failure, and keep the LLM output strictly inside a Pydantic schema before it touches anything else.
Human in the loop for execution. Auto-isolating hosts on false positives is the kind of incident that loses you the customer. Until we have a year of per-tenant false-positive data, every CRITICAL and HIGH containment is gated on a human Approve click. LOW auto-approval is opt-in per tenant and logged identically to manual approvals.
Cross-references
- THREAT_MODEL.md: assets, threats, mitigations, out of scope.
- API_REFERENCE.md: every public endpoint with schemas and error codes.
- AUDIT_CHAIN.md: on-disk format spec for the hash-chained audit log.
- ADAPTERS.md: contributor guide for adding a new EDR vendor.
- QUICKSTART.md: from git clone to a signed alert in about ten minutes.
- Rust proxy source: https://github.com/vyrox-security/vyrox-proxy.
- Simulator: https://github.com/vyrox-security/vyrox-simulator.
Threat model
This document is the asset-by-asset, attacker-by-attacker view of the
Vyrox platform. It is the document a regulated workload's security
review will ask for, and it is the document that drives every test in
tests/test_p0_regressions.py and tests/test_p05_blockers.py.
The format is STRIDE-aligned but pragmatic. We list each asset, the attackers we consider in scope for that asset, the threats they could plausibly carry out, the mitigations that defend against those threats, and the residual risks we have accepted.
Trust boundaries
    public internet                     private network
EDR vendors ──── webhooks ────────▶ ingestion ────▶ Redis
                                        │
                                        └────────▶ SQLite
                                                      ▲
Discord ──── interactions ──────────▶ bot ────────────┘
                                       │
                                       └──── HMAC-signed ──▶ Rust proxy
                                                                 │
                                                                 ▼
                                                             EDR vendor
                                                                APIs
Boundaries that matter:
- EDR vendor → ingestion webhook. The vendor is honest, the network between them and us is not. Mitigations: HMAC-SHA256 or bearer token per route, per-tenant secrets stored in tenant_credentials.
- Discord → bot /interactions. Discord is honest, anyone with the bot URL is not. Mitigations: Ed25519 signature verification with the application public key.
- Worker → bot /webhook. The worker is honest, anyone on the same network as the bot is not. Mitigations: HMAC-SHA256 over deterministic-JSON body using the shared VYROX_HMAC_SECRET.
- Bot → Rust proxy. Same model. Mitigations: HMAC-SHA256 plus a thirty second replay window plus per-request-ID nonce dedup.
- Customer → bot slash commands. Customer-side users are not all equal. Mitigations: Discord-side RBAC via role IDs; the bot rejects approval clicks from users without the configured admin role.
Assets
A1: Customer audit log
The append-only JSONL audit log per tenant. Contains a record of every alert triaged, every Discord approval click, every proxy execution, every action result. The log is the authoritative incident-response artifact and the SOC 2 evidence sample.
| Threat | Mitigation | Residual |
|---|---|---|
| Modify a past entry to hide an executed containment | SHA-256 hash chain over the full payload. Any single-byte change breaks the chain at the modified entry and every entry after it. Operators verify with the standalone script in AUDIT_CHAIN.md. | An attacker who controls the host can truncate the log to a prior good entry. We detect truncation only on restart by comparing last_hash between processes. Tracked as "tamper detection on truncation" in the roadmap. |
| Read another tenant's audit log | Every audit entry carries tenant_id. The Rust proxy /audit/export endpoint filters server-side on tenant_id and requires an HMAC-signed timestamp header on the request. | An operator with shell access on the proxy host can read everyone's log. Out of scope; treat shell access as a P0 incident. |
| Lose entries on power loss between write and OS flush | _sync_write in shared/audit.py calls flush + os.fsync. The Rust side calls flush + sync_data. Both flush to physical storage before returning. | A disk failure between fsync and the next read can still lose the entry. Mitigate at the filesystem layer (RAID, snapshots). |
A2: HMAC shared secret (VYROX_HMAC_SECRET)
A thirty two byte secret encoded as sixty four hex characters. Signs every Python-to-Python and Python-to-Rust call.
| Threat | Mitigation | Residual |
|---|---|---|
| Recover the secret from a timing channel during HMAC compare | hmac.compare_digest in Python, subtle::ConstantTimeEq in Rust. Both run in time proportional to MAC length, not to where the first byte mismatches. Locked by tests in tests/test_crypto.py and vyrox-proxy/src/hmac.rs::tests. | An attacker who can co-locate on the same CPU might measure cache timing in theory. Not feasible against an HMAC compare in practice. |
| Leak the secret in logs or error responses | Settings module never logs the secret. HMAC failures return a generic 401 with detail "invalid signature". The Rust proxy uses tracing with the secret field marked private. | Misconfigured external log aggregator could capture an env dump. Mitigate at the deployment layer. |
| Use the pre-rotation secret to forge a request after a key rotation | A rotation invalidates all previously signed requests immediately. The bot regenerates the request ID on every retry, so any cached old-secret payload becomes useless after the rotation. | Operator must coordinate rotation between the worker, bot, and proxy. Documented in the runbook. |
A3: Per-tenant webhook secrets
Each onboarded tenant has its own webhook secret in
tenant_credentials.webhook_secret_encrypted. The column is named
"encrypted" but stores raw bytes during the pilot. Encryption at rest
ships when the encryption module lands.
| Threat | Mitigation | Residual |
|---|---|---|
| Tenant A spoofs a payload as tenant B | The route resolves the tenant from the payload first, then looks up that tenant's secret. The signature must match the per-tenant secret, not the global one. A wrong-tenant signature fails the HMAC compare. | A misconfigured route that uses the global fallback secret for a tenant who should have their own is detectable in the audit log (the lookup logs at WARN). Tracked as I-8 in the roadmap. |
| Read another tenant's secret from the DB | All DB queries filter by tenant_id. There is no read path that returns all tenant_credentials rows. The schema preflight at startup refuses to start the service if the table is missing the tenant_id column. | A direct SQL session has access to everything. Restrict shell access. |
A4: Discord application token
DISCORD_BOT_TOKEN lets the bot post messages, fetch member rosters,
and react to interactions. A compromised token lets an attacker delete
the bot, post arbitrary messages, or impersonate the bot inside
customer servers.
| Threat | Mitigation | Residual |
|---|---|---|
| Token leaked via env dump | Token is not logged. Settings module masks the value in __repr__. Production deployments use a secret manager rather than .env files on disk. | Misconfigured CI could leak the token in a build log. Use the secret-injection feature of the CI provider, not echo. |
| Attacker uses the token to forge an interaction reply | Outbound calls to Discord use the token. Inbound interactions are verified against the application public key (Ed25519). A leaked token does not let an attacker forge interactions back to us, only push messages out as the bot. | Cannot prevent an attacker from impersonating the bot in customer servers until we detect and rotate the token. Rotation runbook ships before customer #5. |
A5: Containment proxy ability to call EDR vendors
The Rust proxy can call CrowdStrike, SentinelOne, or Defender APIs to isolate hosts, kill processes, and quarantine network access. The blast radius of a compromised proxy is the whole tenant fleet.
| Threat | Mitigation | Residual |
|---|---|---|
| Forge an ActionRequest without the shared secret | HMAC verification before any parse. Constant-time compare. Replay window of thirty seconds. Nonce dedup on request_id. All four together leave the attacker no path. | Compromising the host running the proxy bypasses everything. Treat as a P0 host-level incident. |
| Trick the proxy into calling a wrong host | The proxy treats the host field as opaque and passes it through to the EDR API. The EDR vendor checks that the host belongs to the calling tenant. A wrong host either fails the EDR API or affects the same tenant. | An attacker with the right HMAC and the right tenant can isolate a host belonging to that tenant. They have already passed authentication; this is not a privilege escalation. |
| Re-execute a containment via the replay window | Nonce store records every request_id for ten minutes. A replayed request with the same ID returns the cached response and never calls the EDR. A replayed request with a different ID fails the thirty second timestamp check. | An attacker who controls the wire could ship a fresh request_id within thirty seconds, but they would also need to ship a valid signature. They cannot do that without the secret. |
| DRY_RUN=false in development by accident | DRY_RUN=true is the default. Production opts in. The bool parser accepts the common spellings (true, 1, yes, on) and warns on anything else. | An operator who explicitly turns off DRY_RUN can still cause real EDR calls. Documented. The expected production setting is DRY_RUN=false plus a vendor API token; the absence of the token also short-circuits the call. |
A6: LLM provider trust boundary
We send process command lines, hostnames, and user account names to a third-party LLM router. The router is a trust boundary we do not control. Some customers will require an opt-out.
| Threat | Mitigation | Residual |
|---|---|---|
| LLM provider logs payloads and is compromised | Tenants can opt out of LLM triage at the contract level. With LLM disabled, the worker returns MEDIUM/0.5 from worker.llm._conservative_fallback on the ambiguous middle band. Heuristics still handle the high and low confidence ends. | Cannot prevent the provider from seeing data when LLM is enabled. Documented in the pilot agreement. |
| LLM prompt-injection attack from inside a malicious file path | The prompt is a fixed template; vendor data only appears in the value slots. The response goes through Pydantic validation before any field is used. A response that does not match the schema falls back to MEDIUM/0.5 and writes a llm_call_parse_error audit event. | A model that returns a perfectly-formed but wrong verdict still passes validation. Mitigate with heuristics overrides for known false-positive patterns. |
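The "validate, else fall back to MEDIUM/0.5" rule in the table can be sketched without the real schema. The example below uses only the standard library as a stand-in for the Pydantic validator. The function names and the exact field names (verdict, confidence) are assumptions chosen to match the audit entry examples, and the real worker.llm._conservative_fallback may differ.

```python
import json
from typing import Any, Dict

VERDICTS = {"CRITICAL", "HIGH", "MEDIUM", "LOW", "BENIGN"}


def conservative_fallback() -> Dict[str, Any]:
    """The conservative verdict used whenever the LLM output cannot be trusted."""
    return {"verdict": "MEDIUM", "confidence": 0.5}


def parse_llm_verdict(raw: str) -> Dict[str, Any]:
    """Accept only a well-formed verdict object; anything else becomes the fallback."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):
        return conservative_fallback()
    if not isinstance(data, dict):
        return conservative_fallback()
    verdict = data.get("verdict")
    confidence = data.get("confidence")
    if not isinstance(verdict, str) or verdict not in VERDICTS:
        return conservative_fallback()
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return conservative_fallback()
    if not 0.0 <= confidence <= 1.0:
        return conservative_fallback()
    return {"verdict": verdict, "confidence": float(confidence)}
```

As the residual column notes, a well-formed but wrong verdict still passes this kind of check; schema validation only bounds the shape of the output.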
A7: Discord interaction endpoint
The bot's /interactions route is publicly reachable on the internet
because that is the contract with Discord. Anyone who finds the URL
can attempt to forge interactions; only Ed25519 signature verification
stops them.
| Threat | Mitigation | Residual |
|---|---|---|
| Forge an approve_<alert_id>_<tenant_id> button click | Every interaction POST carries X-Signature-Ed25519 and X-Signature-Timestamp. The bot verifies the Ed25519 signature against settings.discord_public_key before any handler runs. A bad signature returns 401. Locked by tests/test_p05_blockers.py::test_discord_signature_*. | The verifier is bypassed when DISCORD_PUBLIC_KEY is empty. Local dev only; production refuses to set up Discord without the key. |
| Replay a captured legitimate interaction | Discord publishes guidance against this and uses a short timestamp window. The Vyrox approval handler also checks AlertRecord.status and ignores clicks on alerts that are already executed, executing, or approved. A replayed click on an alert that already fired is a no-op. | A replay of a click on a brand-new alert within a few seconds is theoretically possible if Discord's window is open. The cost is one duplicate audit entry of approve.duplicate_click_ignored, not a double execution. |
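The status check in the replay row amounts to a small guard in front of the approval handler. A minimal sketch follows, assuming a hypothetical handler name and a hypothetical "approve.accepted" event for the first click. Only the three ignored statuses and the approve.duplicate_click_ignored event name come from the table.

```python
from typing import Tuple

# Statuses for which a further Approve click must be a no-op (from the table above).
ALREADY_HANDLED = {"approved", "executing", "executed"}


def handle_approve_click(current_status: str) -> Tuple[str, str]:
    """Return (new_status, audit_event) for an Approve click on an alert."""
    if current_status in ALREADY_HANDLED:
        # Replayed or double click: log it, change nothing, call nothing.
        return current_status, "approve.duplicate_click_ignored"
    # First click: hypothetical event name, used here for illustration only.
    return "approved", "approve.accepted"
```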
A8: Bot /webhook from worker
The worker posts alert notifications to the bot's /webhook route. That
route was unauthenticated until 2026-05-23 (Fix B in the audit). It is
now HMAC-protected.
| Threat | Mitigation | Residual |
|---|---|---|
| Anyone with the bot URL posts a fake alert embed into a tenant channel | Worker signs the body with VYROX_HMAC_SECRET. Bot verifies before parsing. Locked by tests/test_discord_bot.py::test_webhook_post_rejects_unsigned. | Compromise of VYROX_HMAC_SECRET lets an attacker do this; same blast radius as the proxy compromise above. |
Out of scope
We do not consider the following in this threat model. They are real risks; they are out of scope because they live above or below our control surface.
- A malicious EDR vendor sending fabricated alerts. We trust the EDR vendor as the source of truth on what events happened on the customer's hosts.
- A malicious tenant with the correct credentials self-isolating their own hosts. They authenticated; that is the contract.
- Physical attacks on the deployment host, the customer's endpoints, or our developer workstations.
- Side-channels at the silicon level. Spectre-class attacks, rowhammer, power analysis. Out of scope for a security tool that runs on top of Linux on commodity cloud hardware.
- Discord platform availability. We design for the platform being up; the backup notification path (email plus PagerDuty for CRITICAL alerts) handles platform outages.
What changed and when
The threat model is versioned implicitly through the commit history of this file. Material changes:
- 2026-05-21: First end-to-end audit. Drove the original eight P0 blockers in todo.md (private), all of which shipped.
- 2026-05-23: Second audit. Drove eight P0.5 blockers: Discord Ed25519, bot webhook HMAC, proxy audit-export auth, Rust audit chain, Python audit chain across boots, real /onboard flow, env example sync, Redis configuration. All shipped.
The audits themselves are private. The fixes are public and the tests
that lock them are documented in ARCHITECTURE.md.
Audit chain specification
This document is the wire-level specification for the Vyrox audit log format. It is targeted at customers who want to verify their own log files independently, compliance teams reviewing SOC 2 evidence samples, and contributors writing new code that reads or writes audit entries.
The Rust side (vyrox-proxy/src/audit.rs, public) and the Python side
(shared/audit.py in the private monorepo) write to the same JSONL
stream and share one hash chain. The entry shapes and the hash inputs
differ between the two, as described below, but a single verifier
program can read both streams.
File layout
One JSONL file per UTC day. File name: audit-YYYY-MM-DD.jsonl. Files
are append-only on disk; the kernel honours the O_APPEND flag so
concurrent writers cannot stomp each other.
A new file rolls over at the next UTC day. The hash chain continues
across files. The first entry of a new day's file uses the hash of
the last entry of the previous day's file as its previous_hash. The
very first entry of the very first file uses the genesis sentinel
hash (sixty four ASCII zeros).
audit-2026-05-22.jsonl
audit-2026-05-23.jsonl <- previous_hash of entry 0 == hash of last entry in 2026-05-22 file
audit-2026-05-24.jsonl <- chain continues
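The naming and rollover rules can be sketched in a few lines. This is an illustration under stated assumptions, not the code in shared/audit.py or audit.rs. The helper names audit_file_name and seed_hash are hypothetical, and the seed lookup simply walks the directory newest-first to find the last stored hash.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

GENESIS = "0" * 64  # genesis sentinel: sixty four ASCII zeros


def audit_file_name(epoch_seconds: float) -> str:
    """Name of the daily file that an entry written at this instant belongs to."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date()
    return f"audit-{day.isoformat()}.jsonl"


def seed_hash(audit_dir: Path) -> str:
    """previous_hash for the next entry: last hash on disk, or the genesis sentinel."""
    # ISO dates sort lexicographically in chronological order.
    for path in sorted(audit_dir.glob("audit-*.jsonl"), reverse=True):
        lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
        if lines:
            return json.loads(lines[-1])["hash"]
    return GENESIS
```

Because the seed comes from the newest non-empty file, the first entry of a new day links to the last entry of the previous day without special handling.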
Entry shape
Every entry is a single JSON object on its own line. Field order on
disk varies because we use serde_json::to_string (Rust) and
json.dumps(..., sort_keys=True) (Python); verifiers must not depend
on a specific order in the on-disk JSON. The hash computation, by
contrast, is order-dependent and uses canonical JSON. See
"Hash computation" below.
Rust proxy entries (containment actions)
{
  "timestamp": 1700000000,
  "tenant_id": "acme-corp",
  "action_type": "HOST_ISOLATION",
  "host": "workstation-01",
  "approved_by": "jane.smith#1234",
  "dry_run": false,
  "previous_hash": "0000000000000000000000000000000000000000000000000000000000000000",
  "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
| Field | Type | Notes |
|---|---|---|
| timestamp | integer | Unix epoch seconds, UTC. Capture time on the writer host. |
| tenant_id | string | Multi-tenant scope. Required. |
| action_type | string | One of HOST_ISOLATION, PROCESS_KILL, NETWORK_QUARANTINE. Stored as Debug format of the Rust enum. |
| host | string | Vendor-side host identifier. Opaque to the audit log. |
| approved_by | string | Discord username including discriminator. |
| dry_run | bool | true when DRY_RUN was active and no real EDR call was made. |
| previous_hash | string | 64 lowercase hex characters. Genesis sentinel for the first entry of the very first file. |
| hash | string | 64 lowercase hex characters. SHA-256 of previous_hash, a literal pipe separator, and the canonical payload. See "Hash computation" below. |
Python pipeline entries (everything else)
Python writes audit entries for ingestion events, triage decisions,
notification attempts, Discord interactions, and any other state
change. The wrapper shape is fixed; the inner entry dict is
free-form per event.
{
  "timestamp": "2026-05-23T14:32:00+00:00",
  "entry": {
    "event": "triage_persisted",
    "alert_id": "alt_abc123",
    "tenant_id": "acme-corp",
    "verdict": "CRITICAL",
    "confidence": 0.92
  },
  "previous_hash": "0000000000000000000000000000000000000000000000000000000000000000",
  "hash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
| Field | Type | Notes |
|---|---|---|
| timestamp | string | ISO 8601 UTC. Format produced by Python's datetime.now(timezone.utc).isoformat(). |
| entry | object | Free-form event payload. Conventions are documented per event below. |
| previous_hash | string | Same as Rust. |
| hash | string | Same as Rust. |
The Python and Rust streams interleave at the JSONL layer; they share
a single chain. A verifier reads one stream of lines, ignores whether
the inner shape is the Rust action format or the Python wrapped
format, and computes the next expected hash from the on-disk
previous_hash plus the rest of the entry.
Hash computation
The chain is a SHA-256 hash chain over canonical-JSON entries.
For Rust entries the canonical payload is the entry without the hash
field. The order is alphabetical by key. Whitespace is absent. The
canonical form for the example above is:
{"action_type":"HOST_ISOLATION","approved_by":"jane.smith#1234","dry_run":false,"host":"workstation-01","previous_hash":"0000...0000","tenant_id":"acme-corp","timestamp":1700000000}
The hash is:
hash = SHA-256( previous_hash_bytes || "|" || canonical_payload_bytes )
The separator | is one literal pipe character. It marks the boundary
between the linkage and the payload, so a single SHA-256 round covers
both without ambiguity about where one ends and the other begins.
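The Rust-entry rule can be reproduced in Python for independent checking. The sketch below assumes that previous_hash_bytes means the UTF-8 bytes of the 64-character hex string, and that json.dumps with compact separators matches the Rust canonical form for ASCII-only values. Non-ASCII handling may differ from serde_json and is not covered here. The function names are illustrative.

```python
import hashlib
import json

EXAMPLE = {
    "timestamp": 1700000000,
    "tenant_id": "acme-corp",
    "action_type": "HOST_ISOLATION",
    "host": "workstation-01",
    "approved_by": "jane.smith#1234",
    "dry_run": False,
    "previous_hash": "0" * 64,
    "hash": "ignored-when-hashing",  # placeholder; excluded from the payload
}


def canonical_payload(entry: dict) -> str:
    """Entry without `hash`, keys sorted alphabetically, no whitespace."""
    payload = {k: v for k, v in entry.items() if k != "hash"}
    return json.dumps(payload, separators=(",", ":"), sort_keys=True)


def action_entry_hash(entry: dict) -> str:
    """SHA-256(previous_hash || "|" || canonical_payload), hex encoded."""
    data = (
        entry["previous_hash"].encode("utf-8")
        + b"|"
        + canonical_payload(entry).encode("utf-8")
    )
    return hashlib.sha256(data).hexdigest()
```

Calling `canonical_payload(EXAMPLE)` yields the canonical string shown above with the full sixty four zeros in place of the abbreviated previous_hash.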
For Python entries the canonical payload is the wrapper object with
sort_keys=True. The reference implementation in shared/audit.py
uses json.dumps(entry, sort_keys=True) directly:
entry_str = json.dumps(entry, sort_keys=True)
new_hash = hashlib.sha256(f"{self._last_hash}{entry_str}".encode()).hexdigest()
Note that the Python and Rust hash inputs differ in two details that verifiers must respect:
- The Rust side uses | as a separator between previous_hash and the canonical payload. The Python side does not.
- The Rust canonical payload excludes hash and contains no whitespace. The Python canonical payload is the wrapper object excluding hash, serialised with the json.dumps defaults apart from sort_keys=True: keys are sorted at every nesting level, including inside the nested entry, and the default ", " and ": " separators are kept.
We are aware the two formats are not byte-identical at the hash-input
layer. The on-disk wire format (the JSONL itself) is interleaved-safe
because the verifier dispatches on the presence of the entry field.
A future v2 of the format will unify the hash input. Until then,
either parse rule recomputes the chain from the file alone; an
external verifier can use the same dispatch logic.
Genesis hash
0000000000000000000000000000000000000000000000000000000000000000
Sixty four ASCII zeros. Used as the previous_hash of the first entry
in a brand new audit directory. The Python side defines it as
AuditWriter._GENESIS_HASH. The Rust side defines it as
audit::GENESIS_HASH.
Verifying a chain (Python reference)
A complete verifier in about forty lines. Reads a directory of
audit-YYYY-MM-DD.jsonl files in date order, walks every entry, and
recomputes the hash. Returns the first entry where the recomputed
hash does not match the stored hash, or None if the whole chain is
intact.
#!/usr/bin/env python3
"""Audit chain verifier: reads a Vyrox audit log directory and checks the chain."""
import hashlib
import json
import sys
from pathlib import Path

GENESIS = "0" * 64


def recompute(prev_hash: str, entry: dict) -> str:
    # Dispatch on shape: Rust action entry vs Python wrapped entry.
    payload = {k: v for k, v in entry.items() if k != "hash"}
    if "action_type" in entry and "entry" not in entry:
        # Rust: previous_hash, a literal pipe, then compact sorted-key JSON.
        canonical = json.dumps(payload, separators=(",", ":"), sort_keys=True)
        h = hashlib.sha256()
        h.update(prev_hash.encode("utf-8"))
        h.update(b"|")
        h.update(canonical.encode("utf-8"))
        return h.hexdigest()
    # Python: previous_hash followed directly by sorted-key JSON, no separator.
    return hashlib.sha256(
        f"{prev_hash}{json.dumps(payload['entry'], sort_keys=True)}".encode("utf-8")
    ).hexdigest()


def verify(audit_dir: Path) -> tuple[int, str] | None:
    prev = GENESIS
    line_no = 0  # running count of non-blank lines across all files
    for f in sorted(audit_dir.glob("audit-*.jsonl")):
        for raw in f.read_text().splitlines():
            if not raw.strip():
                continue
            line_no += 1
            entry = json.loads(raw)
            if entry["previous_hash"] != prev:
                return line_no, f"previous_hash mismatch in {f.name}"
            expected = recompute(prev, entry)
            if expected != entry["hash"]:
                return line_no, f"hash mismatch in {f.name}: expected {expected}, got {entry['hash']}"
            prev = entry["hash"]
    return None


if __name__ == "__main__":
    bad = verify(Path(sys.argv[1]))
    if bad:
        print(f"FAIL line {bad[0]}: {bad[1]}")
        sys.exit(1)
    # line_no is local to verify(), so report success without a count.
    print("OK: chain intact")
Save as verify_audit.py, run with python verify_audit.py /path/to/audit-dir.
The verifier exits non-zero on the first mismatch and prints the file name and the cause. Customers running their own compliance pipeline should run this from CI nightly against the previous day's audit directory.
Chain continuity across restarts
The chain survives process restart. On boot:
- Python: AuditWriter.__init__ calls _sync_read_last_hash against today's log file. If the file exists, it reads the last line, parses it as JSON, and uses the hash value as the seed. If the file is missing, empty, or unparseable, the seed is the genesis sentinel.
- Rust: audit::ChainState::from_file does the same. It calls read_audit_logs (which silently skips malformed lines) and uses the hash of the last well-formed entry as the seed.
The continuity is enforced by tests in both implementations:
- Python: tests/test_p05_blockers.py::test_audit_chain_survives_process_restart
- Rust: vyrox-proxy/src/audit.rs::tests::chain_survives_restart
A break in continuity (an entry whose previous_hash does not match
the previous entry's hash) is detectable by the verifier above.
There is no path in the production code that writes an entry whose
previous_hash is not the last in-memory hash.
Tamper detection in practice
A single byte modification anywhere in an entry breaks the chain at that entry and at every entry after it. The verifier reports the first break by line number. The original entry stays on disk; only the chain pointer breaks.
Truncation (deleting trailing entries from a file) is not detectable
by the chain alone. The hash chain only proves that the entries you
have are linked. It does not prove that there are no missing entries
at the end. Mitigation: customers run the verifier nightly and store
the last-seen hash from the previous run; a missing tail entry
surfaces as a chain that ends earlier than the previous nightly run
recorded.
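The nightly checkpoint idea can be sketched as follows. This is one possible shape, not a shipped tool: the function names and the checkpoint file are hypothetical, and the checkpoint should live somewhere an attacker on the audit host cannot rewrite.

```python
import json
from pathlib import Path
from typing import List


def chain_hashes(audit_dir: Path) -> List[str]:
    """Every stored hash in the directory, in chain order."""
    hashes: List[str] = []
    for path in sorted(audit_dir.glob("audit-*.jsonl")):
        for raw in path.read_text().splitlines():
            if raw.strip():
                hashes.append(json.loads(raw)["hash"])
    return hashes


def tail_truncated(audit_dir: Path, checkpoint: Path) -> bool:
    """True when the hash recorded by the previous run is no longer in the log."""
    hashes = chain_hashes(audit_dir)
    previous = checkpoint.read_text().strip() if checkpoint.exists() else ""
    truncated = bool(previous) and previous not in hashes
    if not truncated and hashes:
        # Only advance the checkpoint when the previous tail is still present.
        checkpoint.write_text(hashes[-1])
    return truncated
```

Run it after the chain verifier: the verifier proves that the entries present are linked, and the checkpoint comparison shows that the tail seen last night is still there.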
Truncation across the very last in-memory hash (a writer that died
mid-write) is detectable on restart. The writer's __init__ reads
the file from disk; if the on-disk last_hash is older than the
last in-memory value before the crash, the restart resumes from the
on-disk value and any post-crash writes link from there. The lost
window is bounded by the writer's flush interval; both implementations
fsync after every entry.
Durability properties
- Append-only on disk. Both implementations open with the O_APPEND flag. Concurrent writers serialise at the kernel level.
- Fsync after every entry. Python uses os.fsync(fileno). Rust uses tokio::fs::File::sync_data. A power loss between write and OS flush does not lose the entry.
- No buffering above the OS layer. Neither implementation holds pending entries in user-space memory after the write returns.
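The three properties above combine into a short write path. The following is a minimal sketch using only the standard library. It is not _sync_write from shared/audit.py; the function name append_entry and the file mode are assumptions.

```python
import json
import os


def append_entry(path: str, entry: dict) -> None:
    """Append one JSONL line with O_APPEND and fsync before returning."""
    line = (json.dumps(entry, sort_keys=True) + "\n").encode("utf-8")
    # O_APPEND makes the kernel position every write at the current end of file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        view = memoryview(line)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # push the data to stable storage before returning
    finally:
        os.close(fd)
```

Using the raw file descriptor rather than a buffered file object keeps the sketch in line with the "no buffering above the OS layer" property.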
File rotation and retention
The platform does not rotate or delete audit files. Files accumulate
in the configured AUDIT_LOG_PATH directory forever. Customers are
free to copy files to long-term storage; the chain stays intact as
long as the copy preserves byte content.
If you want to compress old files for storage, use a streaming codec that preserves the original byte stream (gzip is fine). Decompressing the file back to the original bytes and running the verifier produces the same result as verifying the live file.
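A byte-preserving archive step might look like the sketch below. The helper name compress_audit_file is hypothetical, and the example reads the whole file into memory for brevity; a streaming copy would suit very large files better.

```python
import gzip
from pathlib import Path


def compress_audit_file(src: Path) -> Path:
    """Write src + '.gz' next to src and confirm it decompresses to the same bytes."""
    original = src.read_bytes()
    dst = src.with_name(src.name + ".gz")
    dst.write_bytes(gzip.compress(original))
    # Round-trip check: the verifier must see exactly the original byte stream.
    if gzip.decompress(dst.read_bytes()) != original:
        raise RuntimeError(f"round-trip mismatch for {src.name}")
    return dst
```

The original file is left in place; deleting it after a successful round trip is a retention decision for the customer.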
Field stability
The on-disk format is part of the public API. Adding new fields to the entry is non-breaking as long as verifiers ignore unknown fields. Renaming or removing fields is breaking.
Tracked future changes (none committed):
- Unify the Rust and Python canonical-payload computation so a single verifier function covers both shapes without dispatch.
- Add a schema_version field so verifiers can short-circuit on a known-incompatible chain.
Both will be announced in CHANGELOG.md at least thirty days before
they ship.
Cross-references
- ARCHITECTURE.md for why every state change writes an audit entry.
- THREAT_MODEL.md for the threat model on the audit log itself.
- API_REFERENCE.md for the proxy's audit-export endpoint.
API reference
Every public HTTP surface exposed by the Vyrox platform. There are eight endpoints across two services:
| Service | Method | Path | Auth |
|---|---|---|---|
| Ingestion | POST | /webhook/crowdstrike | HMAC-SHA256 over body |
| Ingestion | POST | /webhook/sentinelone | Bearer token |
| Ingestion | POST | /webhook/defender | Bearer token (Microsoft clientState) |
| Ingestion | POST | /webhook/generic/{tenant_id} | HMAC-SHA256 over body |
| Ingestion | GET | /health | none |
| Rust proxy | POST | /execute | HMAC-SHA256 over body |
| Rust proxy | GET | /audit/export?tenant_id={id} | HMAC over tenant_id:timestamp |
| Rust proxy | GET | /health | none |
The Discord bot exposes /interactions and /webhook. Those are not
documented here because they speak the Discord protocol (Ed25519) or
are internal-only (worker to bot, HMAC signed). If you need to call
into the bot, you are inside the Vyrox monorepo and there is no public
contract.
Authentication primitives
HMAC-SHA256 over a request body
Used by the CrowdStrike webhook, the Generic webhook, and the proxy
/execute. The signing function in shared/crypto.py::sign(payload, secret) returns f"sha256={hex_digest}". The verifier on the
receiving side strips the sha256= prefix and compares against its
own computed digest with hmac.compare_digest (Python) or
subtle::ConstantTimeEq (Rust).
Two rules for any caller. Sign the raw bytes you put on the wire. If your body is JSON, pin the encoding so the byte stream is deterministic:
import json
from shared.crypto import sign
body = json.dumps(payload, separators=(",", ":"), sort_keys=True)
signature = sign(body, secret) # "sha256=..."
The separators and sort_keys parameters matter. Without them,
Python and Rust will serialise the same dictionary into different byte
streams and the signature will mismatch even when the value is
identical.
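For readers without access to shared/crypto.py, the signing and verification steps can be approximated with the standard library. The sketch below is a stand-in, not the real module. The names canonical_body, sign_body, and verify_body are hypothetical, and passing the secret as raw bytes is an assumption about how the key material is used.

```python
import hashlib
import hmac
import json

PREFIX = "sha256="


def canonical_body(payload: dict) -> bytes:
    """Deterministic JSON bytes: sorted keys, compact separators."""
    return json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")


def sign_body(body: bytes, secret: bytes) -> str:
    """Return the header value 'sha256=<hex digest>' for the exact bytes sent."""
    return PREFIX + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_body(body: bytes, secret: bytes, header: str) -> bool:
    """Strip the prefix and compare digests in constant time."""
    if not header.startswith(PREFIX):
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = header[len(PREFIX):]
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Both sides must hash the same bytes: the sender signs the output of `canonical_body` and transmits those bytes unchanged, and the receiver verifies the raw body before parsing it.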
Bearer token
Used by the SentinelOne and Defender webhooks. The header is
Authorization: Bearer <secret> and the receiver constant-time
compares with hmac.compare_digest. The secret is per tenant; resolution
happens after an "untrusted preview parse" of the body, identical to
the HMAC routes (see _resolve_tenant_webhook_secret in
ingestion/main.py).
Replay window
The proxy's /execute and /audit/export endpoints both enforce a thirty second replay window. The timestamp is part of the signed message. Requests older than thirty seconds, or more than thirty seconds in the future, return HTTP 410.
The window is symmetric on purpose. A client whose clock is ahead of ours by minutes cannot pre-sign requests for later use.
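The symmetric check is a single comparison on the absolute clock difference. A minimal sketch follows, assuming the timestamp is Unix epoch seconds and that a difference of exactly thirty seconds is still accepted; the function name is illustrative.

```python
import time
from typing import Optional

WINDOW_SECONDS = 30


def within_replay_window(request_ts: float, now: Optional[float] = None) -> bool:
    """True when the signed timestamp is at most 30 seconds old or 30 seconds ahead."""
    current = time.time() if now is None else now
    # abs() makes the window symmetric: stale and future-dated requests both fail.
    return abs(current - request_ts) <= WINDOW_SECONDS
```

A caller would map a False result to the HTTP 410 response described above, after the signature over the timestamp has been verified.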
Common patterns
Per-tenant authentication
Every ingestion route resolves the tenant before verifying the signature. The flow is the same regardless of vendor:
- Read the raw body bytes.
- Parse the body as JSON. This parse is untrusted; the result is used only to extract the vendor's tenant identifier (customer_id, accountId, tenantId).
- Look up the per-tenant secret in tenant_credentials. Fall back to the environment-configured default secret for that vendor if the tenant has not been onboarded yet.
- Verify the signature or bearer token against that secret.
- Only after verification succeeds, promote the parsed body to the trusted payload and run the adapter.
A payload with no tenant identifier returns HTTP 400 with
{"detail": "missing tenant identifier"}. There is no default-tenant
fallback. This was the SEV-1 risk removed on 2026-05-21.
Error envelope
Every endpoint returns a JSON body on error. The shape is consistent:
{ "detail": "<short, generic message>" }
We do not leak which part of the credential was wrong, which field was missing, or what the expected signature would have been. Errors are the same for every failure of the same class. Use the audit log if you need to debug an authentication failure; the log records the request correlation ID, the resolved tenant, and the failure kind.
Status codes
| Code | Meaning |
|---|---|
| 200 OK | Used only by /health and the proxy /audit/export. |
| 202 Accepted | Webhook payload was authenticated and queued for triage. |
| 400 Bad Request | Missing tenant identifier on a webhook payload. |
| 401 Unauthorized | Authentication failed. Generic message; no specifics. |
| 410 Gone | Timestamp outside the thirty second replay window. |
| 422 Unprocessable Entity | Authenticated payload could not be normalised. |
| 503 Service Unavailable | Redis is unreachable. Retry-After: 5 header set. |
Ingestion endpoints
The ingestion service runs on port 8001 by default. All four webhook
routes return HTTP 202 with {"status": "queued", "alert_id": "<uuid>"} on success.
POST /webhook/crowdstrike
Receives CrowdStrike Falcon detection events.
Headers:
Content-Type: application/json
X-Vyrox-Signature: sha256=<hex_digest>
The signature is computed over the raw body using the tenant's HMAC
secret. The tenant identifier is the customer_id field on the
payload.
Body shape (minimal):
{
"detect_id": "evt:1234567890:abc123",
"customer_id": "acme-corp",
"timestamp": 1704067200,
"severity": "high",
"tactic": "TA0004",
"technique": "T1059",
"sensor": {
"hostname": "workstation-01",
"agent_id": "12345678-1234-1234-1234-123456789abc"
},
"process": {
"file_name": "cmd.exe",
"command_line": "powershell -enc JABjAGwA...",
"user_name": "CORP\\jsmith",
"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
}
Fields the normaliser reads (ingestion/models.py::_from_crowdstrike):
| Field | Required | Notes |
|---|---|---|
| detect_id | yes | Vendor side identifier, stored as raw_id for dedup. |
| customer_id | yes | Resolves the tenant. A missing or empty value returns HTTP 400. |
| timestamp | no | Defaults to time.time() at receive time. |
| severity | no | Uppercased and stored as vendor_severity. |
| tactic | no | MITRE tactic name. |
| technique | no | MITRE technique ID. |
| sensor.hostname | no | The affected endpoint. |
| process.file_name | no | The executable name. |
| process.command_line | no | Full command line. Triage values this heavily. |
| process.user_name | no | User context. Domain format like CORP\\jsmith is fine. |
| process.sha256 | no | File hash. |
POST /webhook/sentinelone
Receives SentinelOne threat events.
Headers:
Content-Type: application/json
Authorization: Bearer <tenant_secret>
The bearer token is constant-time compared against the per-tenant
secret. The tenant identifier is accountId on the body.
Body shape (minimal):
{
"id": "thrt_1234567890abc",
"accountId": "acme-corp",
"createdAt": 1704067200,
"severity": "high",
"mitreTactic": "TA0004",
"mitreTechnique": "T1059",
"agentRealtimeInfo": {
"computerName": "workstation-01",
"agentId": "1234567890abc"
},
"fileName": "powershell.exe",
"commandLine": "powershell -enc JABjAGwA...",
"fileContentHash": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
POST /webhook/defender
Receives Microsoft Defender for Endpoint alerts via the Microsoft Graph Security API webhook subscription.
Headers:
Content-Type: application/json
Authorization: Bearer <clientState>
The bearer value is the clientState you chose at subscription time.
The tenant identifier is tenantId on the body (the Azure AD tenant
the alert came from).
Body shape (alertV2 subset that the normaliser reads):
{
"id": "abc123",
"tenantId": "11111111-2222-3333-4444-555555555555",
"createdDateTime": "2026-05-23T14:32:00Z",
"severity": "high",
"category": "CredentialAccess",
"mitreTechniques": ["T1003"],
"evidence": [
{
"deviceDnsName": "workstation-01.acme.local",
"userAccount": {
"userPrincipalName": "jsmith@acme.com"
},
"imageFile": {
"fileName": "lsass-dumper.exe"
},
"processCommandLine": "lsass-dumper.exe -o creds.dmp",
"fileDetails": {
"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
}
]
}
The Defender evidence array is heterogeneous. The normaliser
(ingestion/models.py::_from_defender) walks the array and pulls the
first instance of each evidence kind it recognises. Microsoft can and
does add new evidence kinds; that does not break normalisation, the
adapter just ignores what it has not seen before.
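One way to picture the walk, as a sketch only. The field paths are taken from the sample body above; the RECOGNISED table, the helper names, and the choice to key on field presence are assumptions, and the private _from_defender may recognise evidence kinds differently.

```python
from typing import Any, Optional

# Flat field name -> path inside a single evidence item (paths mirror the sample body).
RECOGNISED = {
    "hostname": ("deviceDnsName",),
    "username": ("userAccount", "userPrincipalName"),
    "process_name": ("imageFile", "fileName"),
    "process_cmdline": ("processCommandLine",),
    "sha256": ("fileDetails", "sha256"),
}


def dig(item: Any, path: tuple[str, ...]) -> Optional[Any]:
    """Follow a key path through nested dicts; return None as soon as a hop is missing."""
    current = item
    for key in path:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def flatten_evidence(evidence: list[Any]) -> dict[str, Optional[str]]:
    """Keep the first non-empty value per recognised field; skip anything unrecognised."""
    found: dict[str, Optional[str]] = {name: None for name in RECOGNISED}
    for item in evidence:
        for name, path in RECOGNISED.items():
            if found[name] is None:
                value = dig(item, path)
                if isinstance(value, str) and value:
                    found[name] = value
    return found


evidence = [
    {"someFutureKind": {"x": 1}},  # unknown kind, ignored
    {"deviceDnsName": "workstation-01.acme.local"},
    {"imageFile": {"fileName": "lsass-dumper.exe"}, "processCommandLine": "lsass-dumper.exe -o creds.dmp"},
    {"deviceDnsName": "second-host.acme.local"},  # later instance, not used
]
flat = flatten_evidence(evidence)
```

Fields that never appear stay None, which matches the rule that missing optional values are not replaced with placeholders.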
POST /webhook/generic/{tenant_id}
The catch-all webhook for any EDR that can POST JSON but is not on the natively-supported list. The tenant identifier comes from the URL path because the customer's payload shape is not known in advance.
Headers:
Content-Type: application/json
X-Vyrox-Signature: sha256=<hex_digest>
The customer also supplies a field map at onboarding time. The map
tells the adapter how to find each NormalizedAlert field on their
payload, using dotted-path notation:
{
"raw_id": "event.id",
"hostname": "device.name",
"username": "actor.upn",
"process_name": "process.exe",
"process_cmdline": "process.cli",
"sha256": "file.hash",
"vendor_severity": "metadata.severity",
"tactic": "mitre.tactic",
"technique": "mitre.technique",
"timestamp": "event.ts"
}
Required keys in the field map: raw_id, hostname, vendor_severity.
A missing required key returns HTTP 422.
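Dotted-path lookup is simple enough to sketch. The helper names below are hypothetical and the sample payload is invented; only the required keys and the path notation come from the text above.

```python
from typing import Any, Optional

REQUIRED_KEYS = ("raw_id", "hostname", "vendor_severity")


def resolve_path(payload: dict[str, Any], dotted: str) -> Optional[Any]:
    """Walk "a.b.c" through nested dicts; return None when any segment is absent."""
    current: Any = payload
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def missing_required(field_map: dict[str, str]) -> list[str]:
    """List the required NormalizedAlert fields the map does not cover."""
    return [key for key in REQUIRED_KEYS if not field_map.get(key)]


def apply_field_map(payload: dict[str, Any], field_map: dict[str, str]) -> dict[str, Optional[Any]]:
    """Resolve every mapped field against the customer's payload."""
    return {field: resolve_path(payload, path) for field, path in field_map.items()}


field_map = {
    "raw_id": "event.id",
    "hostname": "device.name",
    "vendor_severity": "metadata.severity",
    "username": "actor.upn",
}
payload = {
    "event": {"id": "e-1"},
    "device": {"name": "host-1"},
    "metadata": {"severity": "high"},
}
mapped = apply_field_map(payload, field_map)
```

A route built on this would answer 422 whenever missing_required returns a non-empty list.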
GET /health
Returns {"status": "ok"} when the service is up and Redis is
reachable. Returns 503 with Retry-After: 5 otherwise.
Containment proxy endpoints
The Rust proxy runs on port 3000 by default. It accepts two non-health requests and refuses everything else.
POST /execute
Executes a human-approved containment action.
Headers:
Content-Type: application/json
X-Vyrox-Signature: sha256=<hex_digest>
Body:
{
"request_id": "550e8400-e29b-41d4-a716-446655440000",
"tenant_id": "acme-corp",
"alert_id": "alt_abc123",
"action_type": "HOST_ISOLATION",
"host": "workstation-01",
"approved_by": "jane.smith#1234",
"approved_at": 1704067200
}
| Field | Type | Notes |
|---|---|---|
| request_id | string | UUID-v4. Idempotency key. Same ID returns the cached response. |
| tenant_id | string | Multi-tenant scope. Carried into every audit entry. |
| alert_id | string | The alert that triggered the action. |
| action_type | enum | HOST_ISOLATION, PROCESS_KILL, or NETWORK_QUARANTINE. |
| host | string | Vendor-specific host identifier. CrowdStrike uses device IDs. |
| approved_by | string | Discord username that clicked Approve. |
| approved_at | int | Unix epoch seconds. Must fall in the replay window. |
Responses:
{ "status": "executed", "dry_run": false }
{ "status": "dry_run", "dry_run": true }
{ "status": "replayed", "dry_run": false }
| Status | Meaning |
|---|---|
| executed | The EDR vendor returned success. |
| dry_run | DRY_RUN=true was in effect; the EDR API was not called. |
| replayed | The same request_id was already executed; the cached response is returned without calling the EDR again. |
Error codes:
| Code | Cause |
|---|---|
| 400 | request_id empty or body fails to parse after HMAC succeeds. |
| 401 | HMAC verification failed, or X-Vyrox-Signature header missing. |
| 409 | Same request_id still in flight from a prior call. |
| 410 | approved_at outside the thirty second replay window. |
| 500 | Internal failure, including audit write failure. The nonce claim is released so a retry can succeed. |
| 502 | EDR vendor API returned an error. The nonce claim is released. |
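The interplay between replayed, 409, and the released nonce claim is easier to follow in code. This is a Python sketch of the semantics only: the real proxy is Rust, and NonceStore, ProxyError, and handle_execute are hypothetical names. It is single-threaded and keeps claims in memory.

```python
import time
from typing import Callable, Optional


class ProxyError(Exception):
    """Carries the HTTP status the proxy would return."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


class NonceStore:
    """request_id claims. A stored value of None means the request is still in flight."""

    def __init__(self, ttl_seconds: float = 600.0) -> None:
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, Optional[dict]]] = {}

    def claim(self, request_id: str) -> tuple[str, Optional[dict]]:
        now = time.monotonic()
        entry = self._entries.get(request_id)
        if entry is not None and now - entry[0] < self._ttl:
            cached = entry[1]
            return ("in_flight", None) if cached is None else ("replayed", cached)
        self._entries[request_id] = (now, None)
        return ("claimed", None)

    def complete(self, request_id: str, response: dict) -> None:
        self._entries[request_id] = (time.monotonic(), response)

    def release(self, request_id: str) -> None:
        self._entries.pop(request_id, None)


def handle_execute(store: NonceStore, request_id: str, call_edr: Callable[[], object]) -> dict:
    if not request_id:
        raise ProxyError(400)
    state, cached = store.claim(request_id)
    if state == "in_flight":
        raise ProxyError(409)
    if state == "replayed" and cached is not None:
        return {"status": "replayed", "dry_run": cached["dry_run"]}
    try:
        call_edr()
    except Exception:
        store.release(request_id)  # release the claim so a retry can succeed
        raise ProxyError(502) from None
    response = {"status": "executed", "dry_run": False}
    store.complete(request_id, response)
    return response
```

Releasing the claim on failure is the design choice that matters: without it, a transient vendor error would lock the request_id until the entry expired.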
GET /audit/export?tenant_id={id}
Returns every audit entry for the requested tenant. The entries are returned as JSON in the response body; for streaming exports on large logs, see the roadmap.
Headers:
X-Vyrox-Signature: sha256=<hex_digest>
X-Vyrox-Timestamp: 1704067200
The signature is HMAC-SHA256 of the canonical message
"<tenant_id>:<timestamp>" using the shared HMAC secret. The
timestamp must fall in the thirty second replay window. Without both
headers, the response is 401.
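A client-side sketch of building the two headers, assuming the canonical message is exactly the tenant ID, a colon, and the decimal timestamp as stated above. The function name and the secret are illustrative.

```python
import hashlib
import hmac


def sign_audit_export(tenant_id: str, timestamp: int, secret: str) -> dict[str, str]:
    """Build the headers for GET /audit/export from the canonical "<tenant_id>:<timestamp>" message."""
    message = f"{tenant_id}:{timestamp}".encode("utf-8")
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return {
        "X-Vyrox-Signature": f"sha256={digest}",
        "X-Vyrox-Timestamp": str(timestamp),
    }


headers = sign_audit_export("acme-corp", 1704067200, "example-secret")
```

In practice the timestamp would come from int(time.time()) at request time so it lands inside the replay window; a fixed value is used here only to keep the example deterministic.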
Response:
[
{
"timestamp": 1704067200,
"tenant_id": "acme-corp",
"action_type": "HOST_ISOLATION",
"host": "workstation-01",
"approved_by": "jane.smith#1234",
"dry_run": false,
"previous_hash": "0000...0000",
"hash": "e3b0c4..."
}
]
Every entry carries a previous_hash and a hash so an external
verifier can reproduce the chain. The format spec and a reference
verifier are in AUDIT_CHAIN.md.
GET /health
Returns {"status": "ok"} when the proxy is up. The health endpoint
has no dependencies; it returns 200 even when EDR vendors are
unreachable.
Rate limiting
The ingestion service has no rate limit at the HTTP layer. EDR vendors themselves rate-limit their webhook deliveries. If you need to slow the worker down, throttle at the queue layer.
The Rust proxy has no per-route rate limit either. The nonce store
dedups by request_id for ten minutes, which is the effective limit
for repeated requests with the same ID. A burst of unique requests
hits the EDR vendor's own rate limit, which the proxy surfaces as a
502.
A per-tenant rate limit on the proxy is on the roadmap. The driver is operational: a misconfigured automation that fires a hundred Approve clicks in a second should not turn into a hundred EDR API calls.
Versioning
API contracts in this document are stable for the alpha. Breaking
changes will be announced in CHANGELOG.md of the relevant repo at
least thirty days before they ship. New endpoints can be added
without notice. New optional fields on existing endpoints can be added
without notice. Removing or renaming a field is breaking.
The audit log format is versioned separately. See
AUDIT_CHAIN.md.
Testing your integration
Use the simulator. It lives in vyrox-simulator and runs entirely in bash with openssl and curl. It replays a signed mimikatz alert against a local ingestion service in under five seconds:
git clone https://github.com/vyrox-security/vyrox-simulator
cd vyrox-simulator
VYROX_URL=http://localhost:8001/webhook \
VYROX_HMAC_SECRET=$(cat your-test-secret) \
./simulate.sh mimikatz
--dry-run prints the signed payload to stdout without making the
HTTP call. Useful for debugging signature mismatches.
EDR adapter contributor guide
This document is for a contributor who wants to add a new EDR vendor to the Vyrox ingestion pipeline. The current set is CrowdStrike Falcon, SentinelOne, Microsoft Defender for Endpoint, and a customer-mapped generic JSON webhook. The fifth one might be yours.
What an adapter is
A Vyrox adapter is the code that turns one specific EDR vendor's
webhook payload into a NormalizedAlert. The triage pipeline
downstream of ingestion only sees NormalizedAlert. It does not
care which vendor the alert came from. Adding a new vendor is
mechanical: write one factory method, one route, one test file,
update one README, done.
The contract between the adapter and the rest of the platform is the four rules in the next section. The rules are not stylistic; they are how the security model holds. Every existing adapter follows them. Every new adapter must.
The four rules
These exist in the private monorepo at
vyrox/ingestion/adapters/README.md. They are reproduced here so
contributors do not need access to the private side to know what to
build.
Rule 1: Authentication before parsing
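The route MUST verify the request's authentication before it trusts or
acts on the parsed body. Parsing untrusted bytes is a class of attack
we do not need to be exposed to, so the only json.loads() allowed
before verification is a single preview parse used to locate the
tenant identifier.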
The accepted pattern, in pseudocode:
body = await request.body() # 1. raw bytes
preview = json.loads(body) # 2. untrusted parse, only to find tenant_id
tenant_id = resolve_tenant_id(vendor, preview) # 3. raises if missing
secret = resolve_tenant_secret(tenant_id, vendor) # 4. per-tenant
verify(body, signature, secret) # 5. authenticate on raw bytes
payload = preview # 6. now trusted
alert = NormalizedAlert._from_<vendor>(payload, tenant_id)
Step 2 is the only place where an unauthenticated parse is allowed, and its result is used for one thing only: finding the tenant_id field on the payload. If the per-tenant secret lookup fails or the signature comparison fails, the request returns 401 before any business logic touches the parsed dict.
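The pseudocode can be made runnable for a single HMAC-signed vendor. In this sketch TENANT_SECRETS stands in for tenant_credentials, authenticate is a hypothetical name, the customer_id field follows the CrowdStrike route, and the status codes follow the table in the API reference.

```python
import hashlib
import hmac
import json
from typing import Any, Optional

TENANT_SECRETS = {"acme-corp": "example-secret"}  # stand-in for tenant_credentials


def authenticate(body: bytes, signature: str) -> tuple[int, Optional[dict[str, Any]]]:
    """Return (status, payload). The payload is only returned once the raw bytes verify."""
    try:
        preview = json.loads(body)  # untrusted parse, used only to find the tenant
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return 422, None
    if not isinstance(preview, dict):
        return 422, None

    tenant_id = preview.get("customer_id")
    if not isinstance(tenant_id, str) or not tenant_id:
        return 400, None  # no default-tenant fallback

    secret = TENANT_SECRETS.get(tenant_id)
    if secret is None:
        return 401, None

    # Authenticate the raw bytes, never a re-serialised copy of the preview.
    expected = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return 401, None

    return 202, preview  # promoted to trusted only after verification
```

Nothing outside this function sees the preview dict unless the final comparison succeeds, which is the property the rule exists to protect.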
Rule 2: tenant_id from authenticated context
The tenant_id that goes onto the NormalizedAlert MUST come from
a source the signature actually authenticates. Two acceptable
patterns:
- The tenant identifier is part of the signed body. CrowdStrike (customer_id), SentinelOne (accountId), and Defender (tenantId) all work this way. The preview-parse trick is safe because the per-tenant secret is keyed on the identifier from the preview, and the signature compare uses that secret. A wrong tenant either produces no secret lookup hit or fails the signature check.
- The tenant identifier is part of the URL path. The generic adapter works this way. The URL itself is not signed, but the per-tenant secret is keyed on the path tenant_id, so a mismatched path resolves to the wrong secret and the HMAC compare fails.
What is NOT acceptable: trusting an unauthenticated header like
X-Tenant-Id, relying on a query string parameter, or falling back
to a shared default tenant when the identifier is missing. The
MissingTenantIdentifier exception in the private
ingestion/main.py exists for exactly this case. Missing identifier
returns HTTP 400, never a silent route to a shared bucket.
Rule 3: Audit entry before HTTP 202
Every accepted alert MUST land in the audit JSONL chain before the
ingestion handler returns 202 to the EDR vendor. The order matters.
If the process crashes between the enqueue and the audit write, we
prefer the audit to be missing rather than the alert. The current
implementation writes the audit hop inside queue.enqueue for that
reason.
If your adapter calls a non-default code path that bypasses
queue.enqueue, write the audit entry manually before the route
returns. The pattern in shared/audit.py::AuditWriter.write takes a
dict; the conventional event name is ingest.accepted with at
minimum tenant_id, source (vendor name), and raw_id (the
vendor's own alert ID).
Rule 4: Output is a valid NormalizedAlert
The only thing the rest of the pipeline sees is NormalizedAlert.
Your adapter MUST produce one. Three constraints:
- source is a unique vendor string. Lowercase, no spaces. Choose one that does not collide with the existing four (crowdstrike, sentinelone, defender, generic).
- tenant_id is populated from the authenticated context (rule 2).
- id is a fresh internal UUID. Do not reuse the vendor's identifier. Store the vendor's ID in raw_id instead. The two are not the same: raw_id is for vendor-side dedup; id is the Vyrox-internal identifier referenced by audit entries and Discord buttons.
Missing optional fields default to None or empty string. Never to a
placeholder like "unknown" — the triage engine treats None and
"unknown" differently.
What NormalizedAlert looks like
@dataclass
class NormalizedAlert:
    tenant_id: str
    id: str                      # internal UUID, auto-generated
    source: str                  # "crowdstrike", "sentinelone", ...
    raw_id: str                  # vendor's own alert ID, used for dedup
    timestamp: int               # unix epoch seconds
    hostname: str                # affected endpoint
    username: str | None         # optional
    process_name: str | None
    process_cmdline: str | None
    sha256: str | None
    tactic: str | None           # MITRE tactic name
    technique: str | None        # MITRE technique ID
    vendor_severity: str         # INFORMATIONAL | LOW | MEDIUM | HIGH | CRITICAL
The dataclass is intentionally flat. Nested vendor structures
(CrowdStrike's sensor, SentinelOne's agentRealtimeInfo, Defender's
evidence array) are flattened during normalisation. Triage code
reads top-level fields only.
vendor_severity is the vendor's own assessment, not Vyrox's. The
triage pipeline produces its own verdict afterwards.
Adding a new vendor in six steps
The example below sketches an adapter for a hypothetical "Acme EDR" vendor that posts alerts to a webhook with a bearer token.
Step 1: Add a factory method on NormalizedAlert
In the private monorepo, in vyrox/ingestion/models.py, add a
classmethod that takes the vendor payload and a tenant_id and returns
a populated NormalizedAlert.
@classmethod
def _from_acme(cls, payload: dict[str, Any], tenant_id: str) -> "NormalizedAlert":
    """
    Parse an Acme EDR alert payload into a NormalizedAlert.

    Acme posts a flat JSON with a top-level `alert_uuid`, a nested
    `endpoint` block, and a nested `actor` block. The schema is the
    one documented at <Acme docs URL> retrieved on <date>.
    """
    return cls(
        tenant_id=tenant_id,
        source="acme",
        raw_id=str(payload.get("alert_uuid", "")),
        timestamp=int(payload.get("ts", time.time())),
        hostname=payload.get("endpoint", {}).get("name", ""),
        username=payload.get("actor", {}).get("user"),
        process_name=payload.get("actor", {}).get("process_name"),
        process_cmdline=payload.get("actor", {}).get("command_line"),
        sha256=payload.get("actor", {}).get("sha256"),
        tactic=payload.get("mitre", {}).get("tactic"),
        technique=payload.get("mitre", {}).get("technique"),
        vendor_severity=str(payload.get("severity", "LOW")).upper(),
    )
Two conventions worth following. Pin the Acme schema URL and the
date you read it in the docstring; vendors change their format and a
future maintainer needs to know which version you targeted. Default
optional fields to None (or empty string for strings); do not
substitute placeholders.
Step 2: Add a thin adapter module
In vyrox/ingestion/adapters/, create acme.py:
"""
Acme EDR webhook adapter.
The route in `ingestion/main.py` calls into `normalize`. This module
exists to keep the route file readable as the vendor count grows.
"""
from __future__ import annotations
from typing import Any
from ingestion.models import NormalizedAlert
def normalize(payload: dict[str, Any], tenant_id: str) -> NormalizedAlert:
    """Convert an Acme alert payload into a NormalizedAlert."""
    return NormalizedAlert._from_acme(payload, tenant_id)
The module is intentionally tiny. The reason is convention: every
adapter ships as a normalize(payload, tenant_id) -> NormalizedAlert
function so the route code does not have to memorise factory method
names.
Step 3: Add a route in ingestion/main.py
Mirror the existing routes. Here is the shape for a bearer-token
vendor that puts tenant_id in the body:
@app.post("/webhook/acme", status_code=status.HTTP_202_ACCEPTED)
async def webhook_acme(
    request: Request,
    authorization: str = Header(default=""),
    q: QueueClient = Depends(get_queue_client),
) -> dict[str, str]:
    # Reject requests that carry no bearer token at all.
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="invalid signature")
    token = authorization[7:]

    body = await request.body()

    # Untrusted preview parse: used only to find the tenant identifier.
    # json.loads on bytes raises UnicodeDecodeError for bodies that are not valid UTF-8.
    try:
        untrusted_preview = json.loads(body)
    except (json.JSONDecodeError, UnicodeDecodeError):
        raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, detail="bad payload")
    if not isinstance(untrusted_preview, dict):
        raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, detail="bad payload")

    try:
        tenant_id = resolve_tenant_id("acme", untrusted_preview)
    except MissingTenantIdentifier:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="missing tenant identifier")

    tenant_secret = _resolve_tenant_webhook_secret(
        tenant_id=tenant_id, vendor="acme", default_secret=settings.acme_webhook_secret
    )
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
    if not tenant_secret or not hmac.compare_digest(token.encode("utf-8"), tenant_secret.encode("utf-8")):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="invalid signature")

    # Authentication succeeded: the preview is now the trusted payload.
    payload = untrusted_preview
    try:
        from ingestion.adapters import acme as acme_adapter
        alert = acme_adapter.normalize(payload, tenant_id)
    except Exception:
        raise HTTPException(status_code=status.HTTP_422_UNPROCESSABLE_ENTITY, detail="bad payload")

    if not q:
        raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail="redis unavailable", headers={"Retry-After": "5"})
    try:
        alert_id = await q.enqueue(alert)
        return {"status": "queued", "alert_id": alert_id}
    except (EnqueueFailed, ConnectionError):
        raise HTTPException(status_code=status.HTTP_503_SERVICE_UNAVAILABLE, detail="redis unavailable", headers={"Retry-After": "5"})
For an HMAC-signed vendor (like CrowdStrike or the generic adapter)
swap the bearer-token check for verify(body.decode("utf-8"), x_vyrox_signature, tenant_secret). The shape stays the same.
Step 4: Wire the tenant identifier into resolve_tenant_id
Add a case to resolve_tenant_id:
elif source == "acme":
    identifier = payload.get("customer_id")  # or whatever Acme calls it
If the vendor identifier is missing, the function raises
MissingTenantIdentifier, the route returns 400, and the EDR retries.
No silent default.
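For readers without access to the private monorepo, the behaviour can be sketched as follows. The body of resolve_tenant_id here is an assumption: it uses a lookup table where the text above implies an elif chain, and the acme entry is the hypothetical vendor from this guide.

```python
from typing import Any


class MissingTenantIdentifier(Exception):
    """Raised when the vendor payload carries no usable tenant identifier."""


# Vendor name -> body field that carries the tenant identifier.
TENANT_FIELDS = {
    "crowdstrike": "customer_id",
    "sentinelone": "accountId",
    "defender": "tenantId",
    "acme": "customer_id",  # hypothetical vendor used in this guide
}


def resolve_tenant_id(source: str, payload: dict[str, Any]) -> str:
    """Return the tenant identifier or raise; never fall back to a shared tenant."""
    field = TENANT_FIELDS.get(source)
    identifier = payload.get(field) if field else None
    if not isinstance(identifier, str) or not identifier.strip():
        raise MissingTenantIdentifier(source)
    return identifier
```

An empty or whitespace-only identifier is treated the same as a missing one, so the route returns 400 in both cases.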
Step 5: Add tests
Create vyrox/tests/test_adapters_acme.py. Cover at least:
- Happy path: a valid signed payload returns 202 with an alert_id.
- Missing tenant ID: returns 400.
- Wrong signature: returns 401.
- Malformed JSON: returns 422.
- Redis unavailable: returns 503 with Retry-After: 5.
- Field mapping: the resulting NormalizedAlert has the expected values for every field your factory populates.
Use the same fixture style as tests/test_ingestion_main.py. The
existing tests are the right template; copy and adjust.
Step 6: Update the adapter README and the public docs
Two files to touch:
- vyrox/ingestion/adapters/README.md (private): add a row to the adapter table.
- vyrox-docs/API_REFERENCE.md (public): add the new endpoint with its full schema and the field-mapping table.
The pattern in the existing adapters is the documentation contract. A reviewer reading the new endpoint should be able to integrate against it without reading your code.
Anti-patterns we catch in review
The list below is what we have actually rejected in past reviews.
- "Just for testing" default-tenant fallback. Returns a shared bucket when the identifier is missing. This was the SEV-1 we removed on 2026-05-21. There is no scenario where this is correct.
- Re-serialising the body before HMAC verify. Python's default json.dumps and Rust's serde_json disagree on whitespace and key order. Always verify on the raw bytes from await request.body(), never on json.dumps(payload).
- Skipping per-tenant secret lookup "for the pilot". The pilot is when per-tenant secrets matter most. Falling back to the global secret is a deliberate, audited choice for un-onboarded tenants only.
- Logging the full raw payload. Payloads contain process command lines, user accounts, hostnames. Log structured fields, not the whole blob.
- Treating the vendor's severity as Vyrox's verdict. The vendor's severity goes into vendor_severity. Triage produces a separate verdict. Conflating the two breaks the entire downstream contract.
Adapters that already exist
| Adapter | Vendor | Auth | Tenant ID source | Code |
|---|---|---|---|---|
| crowdstrike | CrowdStrike Falcon detection events | HMAC-SHA256 | customer_id on body | private |
| sentinelone | SentinelOne streaming API | Bearer token | accountId on body | private |
| defender | Microsoft Graph Security API alertV2 | Bearer token (Microsoft clientState) | tenantId on body | private |
| generic | Any EDR posting JSON | HMAC-SHA256 | URL path | private |
The CrowdStrike and SentinelOne factories live directly on
NormalizedAlert (_from_crowdstrike, _from_sentinelone) for
historical reasons. The Defender and generic factories live in the
adapter package. Newer adapters should follow the package pattern.
What the review focuses on
When a contributor opens an adapter PR, the reviewer checks:
- Authentication-before-parse order, byte-exact.
- Per-tenant secret lookup, with the global default only as a fallback for un-onboarded tenants.
- Tenant ID source is authenticated.
- Audit entry written before the 202 returns.
- NormalizedAlert.source is unique and lowercase.
- raw_id is set from the vendor's own identifier.
- Tests cover the happy path and the five other cases listed in step 5.
- Schema URL and date are pinned in the factory docstring.
- No raw payload logging.
- Public docs updated with the new endpoint.
Adapters that pass review tend to ship in a single PR. Adapters that fail review usually fail rule 1 (parse before verify) or rule 2 (tenant from unauthenticated source). Read the existing adapters before writing yours.
Cross-references
- API_REFERENCE.md for the public webhook contracts.
- ARCHITECTURE.md for the six critical rules every adapter must respect.
- THREAT_MODEL.md for the attacker model.
Roadmap
This roadmap is organised by capability, not by quarter. It tells a reader what we are working on, what is committed, and what is on the list but not started. It does not contain revenue targets, customer counts, or aspirational SLA percentages. Those live in negotiated contracts and internal planning, not in OSS docs.
The roadmap reflects the state of the codebase. Items move from
"planned" to "in flight" when a branch exists. They move from "in
flight" to "shipped" when the code is in main with tests. They
move from "shipped" to "stable" when they survive a quarter without
being reverted.
Recently shipped
The two audits in May 2026 (private; the fixes are public) drove sixteen blocker items that all shipped between 2026-05-21 and 2026-05-23. The list:
P0 (audit 1, 2026-05-21):
- Per-tenant webhook secrets for CrowdStrike and SentinelOne.
- Removal of the default-tenant fallback. Missing identifier now returns HTTP 400, no silent shared bucket.
- HTTP 503 on Redis enqueue failure with Retry-After: 5. The silent-drop bug is gone.
- Distinct QueueUnavailable exception on dequeue so the worker can apply exponential backoff instead of spinning on a dead connection.
- Token budget enforced before every LLM call. Exhausted tenants return MEDIUM/0.5 with budget_exhausted in the audit log.
- Redis-backed retry queue for failed Discord notifications. Tenant scoped dead-letter with two-week TTL. Backup notification path triggers at five dead-letter entries per hour.
- Deterministic JSON in the proxy client. Python and Rust now sign byte-identical payloads with separators=(",", ":") and sort_keys=True.
- Idempotency on Discord approval clicks. Alert status flips through executing so a concurrent click loses the race.
- Four new regression tests: cross-tenant isolation, HMAC round-trip, no-autonomous-containment static guard, replay-window enforcement.
P0.5 (audit 2, 2026-05-23):
- Discord Ed25519 verification on the bot /interactions endpoint. Anyone with the URL can no longer forge an Approve click.
- HMAC verification on the bot /webhook endpoint. The phishing pretext via fake embeds is closed.
- HMAC verification on the proxy /audit/export endpoint. Tenant data leak via query string is closed.
- SHA-256 hash chain on the Rust proxy audit log. The Python and Rust chains now agree on format. See AUDIT_CHAIN.md.
- Python audit chain survives process restart. Previously __init__ reset to genesis; now it reads the last hash from today's log.
- Real /onboard Discord wizard. Generates a per-tenant webhook secret with secrets.token_hex(32), persists TenantCredential, fires a signed synthetic alert through ingestion, refuses to mark the tenant active until the round-trip succeeds.
- .env.example rewritten with logical sections and every new variable documented.
- Settings.effective_redis_url() resolves the canonical REDIS_URL first and falls back to the legacy Upstash REST variables. The worker refuses to start with no Redis URL.
In flight
Tracked in the private todo.md; called out here when the work
touches a public contract.
- Postgres migration before tenant twenty five. SQLite write contention is the binding constraint at scale. The schema is already SQLModel-compatible so the migration is a SQL dump plus a connection string change.
- Streaming /audit/export. Current endpoint reads the full log into memory. Fine for pilot scale; needs to be a streaming JSONL response for SaaS scale.
- Per-tenant secret encryption at rest. The tenant_credentials.webhook_secret_encrypted column stores raw bytes during the pilot. Encryption module ships before the first production payment.
- Concurrent per-tenant triage. The worker polls tenants sequentially. A slow triage on tenant A blocks tenants B through J. Move to asyncio.gather with bounded concurrency.
- Retry runner wired into worker startup. The retry-queue background task exists but the worker entrypoint does not invoke it. The queue accumulates without draining.
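For the concurrent triage item, bounded concurrency with asyncio.gather usually means a semaphore around each task. The sketch below shows the pattern under that assumption; triage_all, the limit of three, and the fake triage coroutine are illustrative and do not describe the worker's actual code.

```python
import asyncio
from typing import Awaitable, Callable


async def triage_all(
    tenants: list[str],
    triage_one: Callable[[str], Awaitable[str]],
    limit: int = 4,
) -> list:
    """Run triage for every tenant, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(tenant: str) -> str:
        async with semaphore:
            return await triage_one(tenant)

    # return_exceptions=True keeps one failing tenant from cancelling the rest.
    return await asyncio.gather(*(guarded(t) for t in tenants), return_exceptions=True)


async def demo() -> tuple[list, int]:
    active = 0
    peak = 0

    async def fake_triage(tenant: str) -> str:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # stands in for a slow triage call
        active -= 1
        return f"{tenant}:done"

    results = await triage_all(list("ABCDEFGHIJ"), fake_triage, limit=3)
    return results, peak


results, peak = asyncio.run(demo())
```

With this shape a slow tenant occupies one slot instead of blocking the queue, and gather still returns results in tenant order.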
Planned, not started
Public-facing items only. The internal product roadmap covers more.
- Programmatic API. A REST API for tenants to fetch verdicts, audit entries, and statistics outside of Discord. OAuth2 client credentials per tenant. Prerequisite for MSP integrations.
- Customer-side audit verifier. A small Rust binary that walks a tenant's audit-YYYY-MM-DD.jsonl directory and verifies the chain. Distributed as a single static binary. The Python reference in AUDIT_CHAIN.md is the spec.
- EU data region. Per-tenant data_region flag. Ingestion endpoint shard. No cross-region data flow. Required for any EU customer with a GDPR review.
- Generic-vendor adapter at the route level. Today the generic adapter's field map lives in tenant_credentials.edr_api_key_encrypted as a JSON string. That column should be split into a dedicated field_map_json column with a real schema.
- Public OpenAPI spec. The four ingestion routes and the two proxy routes documented in API_REFERENCE.md. Auto-generated from FastAPI on the public side; hand-written for the Rust proxy.
- Web operator interface. Read-only first. Tenant status, recent verdicts, audit search, monthly digests. Triggered when a customer refuses Discord-only or when total customer count crosses double digits, whichever first.
Adapter coverage
| Vendor | Status | Notes |
|---|---|---|
| CrowdStrike Falcon | shipped | Detection events via HMAC-signed webhook. |
| SentinelOne | shipped | Streaming API via bearer token. |
| Microsoft Defender for Endpoint | shipped | Graph Security API alertV2 via bearer (clientState). |
| Generic JSON webhook | shipped | Customer-mapped field map per tenant. |
| Sophos | planned | Native adapter, on demand. |
| Quick Heal / Seqrite | planned | Native adapter, on demand. |
| Trellix | planned | Native adapter, on demand. |
| Syslog (CEF / LEEF) | planned | Separate service that converts to the ingestion contract. |
The "on demand" adapters land when a real customer commits to a
pilot conditional on the vendor. We do not build speculatively. The
contract a new adapter must follow is in ADAPTERS.md.
Compliance and certification
- SOC 2 Type I evidence collection: in flight. Audit log format (hash-chained JSONL) is the substrate. Vendor selection for the audit firm is internal.
- SOC 2 Type II: planned after Type I closes.
- ISO 27001 prep: planned after SOC 2 Type II.
- Public bug bounty: not active during alpha. See
SECURITY.mdfor the disclosure-only model in effect today.
Versioning and release cadence
The four public repos follow semver. Today everything is 0.1.x. The
audit log format will get an explicit schema_version field before we
bump to 0.2.x. The HMAC signing format and the public webhook URL
shape will not change between 0.x releases without a deprecation
notice in the relevant repo's CHANGELOG.md at least thirty days
ahead.
Patch releases happen as needed. Minor releases happen when a meaningful feature lands. Major releases are reserved for breaking changes to a contract published in this repo.
What is intentionally not on the roadmap
A short list, kept honest, of capabilities we do not plan to build.
- A SIEM. Vyrox is not a log lake. We ingest alerts the EDR already decided are worth surfacing. Customers who want a SIEM have a SIEM.
- A managed SOC service with humans. We are a software platform. We point customers at MSSPs for the human SOC layer.
- A web dashboard during alpha. The first ten pilots use Discord exclusively. The dashboard ships when triggered (see "Planned, not started" above), not on a calendar.
- A free public ingestion endpoint. The ingestion service is operated
per tenant. Anyone running their own can use the open path
documented in
QUICKSTART.md.
Cross-references
- README.md for the project overview.
- ARCHITECTURE.md for the pipeline and the six critical rules.
- API_REFERENCE.md for the contracts the roadmap references.
- SECURITY.md for the disclosure model during alpha.
Security policy
This document tells you how to report a vulnerability to Vyrox, what we
will do with the report, what is in scope and what is not, and which
properties of the system we consider security invariants. If you read
nothing else, the contact is sec.vyrox@proton.me and the PGP key is at
vyrox.dev/.well-known/pgp-key.txt.
Reporting
Send the report to sec.vyrox@proton.me. Subject line SECURITY: <one line description>. PGP-encrypt the body if the finding is sensitive;
the key is published at vyrox.dev/.well-known/pgp-key.txt.
Please include:
- A description of the issue, in plain language, with enough detail that we can reproduce it.
- The repository or service affected. If the bug is in
vyrox-proxy, include the commit hash you tested against.
- A proof of concept, ideally as a single shell command or a short script. Synthetic targets only. Do not exploit a production tenant.
- Your preferred handle for credit, or "anonymous" if you would rather not be named.
Please do not file vulnerabilities as public GitHub issues.
Response
Acknowledgement within forty-eight hours. Initial triage decision within seven calendar days. Patch timeline shared within fourteen calendar days, including a target fix date and the version we expect the fix to ship in.
If we accept the report, we coordinate disclosure with you. We default to a thirty day embargo while we ship and verify the fix. The embargo extends if the issue affects a vendor we have not yet patched against, and we tell you in writing why.
If we decline the report, we explain why in writing and you are free to disclose at your discretion. Common reasons we decline: the finding is in a third-party dependency we do not maintain, the finding requires an attacker who already controls the host, or the finding is a known trade-off documented in this repo.
Scope
In scope across every repository in the Vyrox organisation:
- Authentication bypass on any service. The Rust proxy
/execute and /audit/export endpoints, the Python ingestion webhooks, the Discord bot /interactions and /webhook.
- Cross-tenant data leakage. Anything that lets a request from tenant A read, write, or signal tenant B.
- Audit log tampering. Anything that breaks the hash chain without
detection. See
AUDIT_CHAIN.md for the format.
- Containment action execution without a Discord human approval click.
- HMAC verification weaknesses. Timing channels, prefix confusion,
malleability in the canonical JSON used by Python
↔ Rust signing.
- Replay attacks within the thirty second window on the proxy.
- LLM prompt injection that produces a result the Pydantic validator accepts but that should have been rejected.
- Webhook signature forgery against any of the four ingestion routes
(crowdstrike, sentinelone, defender, generic).
- Secret extraction from any in-memory or on-disk location the documentation says is unreachable.
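For researchers probing the replay window, the following sketch shows one common way a thirty second window can be enforced. It assumes each signed request carries a timestamp and a unique nonce, both covered by the signature; this document does not say whether the proxy uses nonces, so treat the class and its names as hypothetical.

```python
import time
from typing import Optional

REPLAY_WINDOW_SECONDS = 30  # the thirty second window named in the scope list


class ReplayGuard:
    """Rejects stale timestamps and nonces already seen inside the window."""

    def __init__(self, window: int = REPLAY_WINDOW_SECONDS) -> None:
        self.window = window
        self._seen: dict[str, float] = {}  # nonce -> timestamp it carried

    def accept(self, nonce: str, timestamp: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces whose timestamps have left the window. A replay of
        # such a request would be rejected on age alone.
        self._seen = {n: t for n, t in self._seen.items() if abs(now - t) <= self.window}
        if abs(now - timestamp) > self.window:
            return False  # too old, or too far in the future
        if nonce in self._seen:
            return False  # same nonce replayed inside the window
        self._seen[nonce] = timestamp
        return True
```

A finding in this category would typically show a request being accepted twice, for example by exploiting clock skew at the window edge or a nonce store that is not shared across proxy instances.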
Out of scope:
- Findings that require physical access to a customer host.
- Denial-of-service caused by exhausting public dependencies (rate limits on free LLM tiers, Discord rate limits, Redis quotas). These are operational concerns we degrade gracefully against, not vulnerabilities.
- Issues in EDR vendor APIs that Vyrox calls into. Report those to the vendor.
- Bugs in development-only tooling (the simulator, local docker setups) when used outside their documented purpose.
- Misconfiguration findings on a customer-operated installation that ignore the documented configuration in this repo.
Security invariants
The six critical rules in ARCHITECTURE.md
are the invariants we hold the code to. Any report that demonstrates a
break in one of these is in scope and the patch will ship with the
shortest reasonable embargo.
- Tenant isolation on every database query and every Redis key.
- Audit entry written before any state-changing response.
- HMAC verification on the raw bytes, before any parse, in constant time.
- No path from LLM output or worker logic to a containment call. Only a Discord button click reaches the proxy.
- DRY_RUN=true by default in the proxy. Production opts in.
- LLM output passes Pydantic validation before any field is read.
The threat model in THREAT_MODEL.md lists the
specific attacker capabilities we defend against and the mitigations
that defend against them.
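The HMAC invariant above (raw bytes, before any parse, in constant time) can be sketched in a few lines. This is an illustration under assumptions, not the code in the Vyrox services: the function name, the hex-encoded signature, and the choice of PermissionError are hypothetical.

```python
import hashlib
import hmac
import json


def verify_then_parse(raw_body: bytes, signature_hex: str, secret: bytes) -> dict:
    """Check the HMAC over the exact bytes received, then parse the JSON."""
    try:
        provided = bytes.fromhex(signature_hex)
    except ValueError:
        raise PermissionError("signature is not valid hex")
    # The MAC is computed over the untouched request body, not a re-serialized copy.
    expected = hmac.new(secret, raw_body, hashlib.sha256).digest()
    # hmac.compare_digest is designed not to leak where the inputs differ.
    if not hmac.compare_digest(expected, provided):
        raise PermissionError("signature mismatch")
    # Parsing happens only after the signature check has passed.
    return json.loads(raw_body)
```

Verifying before parsing means a JSON parser never sees unauthenticated input, and comparing with hmac.compare_digest rather than == avoids the timing channel that the scope list calls out.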
What we do with the report
After triage, you can expect:
- A private GitHub Security Advisory in the affected repository, with you tagged if you accepted credit.
- A CVE if the issue qualifies under MITRE's rules and we are the primary CVE numbering authority for the affected component.
- A code fix and a regression test that locks the fix in. We do not patch a finding without adding a test that would have caught it.
- A note in the changelog of the affected repository on the day the embargo ends.
We do not pay bounties during alpha. We do publish credit on the
vyrox.dev/security/credits page once the issue is public.
Coordinated disclosure timeline
day 0 report received
day 2 acknowledgement sent
day 7 triage decision (accept, decline, or follow-up questions)
day 14 patch timeline shared
day 30 embargo end (default)
The reporter can negotiate a longer embargo. We will not extend it unilaterally without explaining why.
What you can do today without filing a report
We welcome adversarial testing against the open-source components. The
proxy and the simulator are designed to be run against each other in a
local stack. If you find a behaviour that worries you but you are not
sure whether it is a vulnerability, open a discussion in the
vyrox-proxy repository and tag a maintainer. We will move the
conversation to private if it turns out to be sensitive.
Maintainer contact
sec.vyrox@proton.me is monitored by the founder and one engineer.
Replies come from the same address. We do not respond from personal
accounts, and we do not ask reporters to contact us through DMs on
social platforms.