One validated high-severity path should be remediated before broad agent rollout.
A concise security report your team can act on.
This static sample shows the structure of a WardenBot.ai delivery: bounded scope, validated findings, redacted evidence, severity rationale, and agent-ready remediation Markdown.
Executive summary
The highest-risk issue is an agent control failure: retrieved content can influence a privileged tool request before the application verifies instruction provenance. The recommended fix is a server-side tool policy layer, backed by regression tests for indirect prompt injection.
Findings include redacted payload source, observed behavior, and control gap notes.
Fix guidance is written as Markdown suitable for an engineer or coding agent.
What was tested
- Public web app and API routes under app.example-ai.test
- Authenticated agent workflow using test tenant WB-SAMPLE
- RAG document ingestion and retrieval boundary checks
- No destructive testing, persistence, phishing, or production data access
Findings table
Retrieved attacker-controlled content influenced a privileged CRM lookup before server-side policy checks.
Debug events include tenant identifiers, document titles, and partial user prompts visible to support roles.
Missing clickjacking and content-type protections increase browser-side risk for authenticated users.
Why WB-SAMPLE-01 is High
- Exploitability was confirmed in an authorized test tenant using non-destructive payloads.
- High severity is assigned when untrusted retrieved content can alter tool intent and reach privileged data paths.
- Business impact is bounded by current role checks, but the vulnerable control sits in a shared agent execution path.
Proof without exposing client data
Payload source
Uploaded knowledge-base page [REDACTED-DOC-ID] containing an instruction override marker.
Observed behavior
Agent selected crm.lookup_customer with arguments derived from retrieved content, not the user request.
Control gap
Tool authorization evaluated user role, but did not verify instruction provenance or trusted context.
Sensitive data
Customer names, tenant IDs, access tokens, and hostnames are redacted in this sample artifact.
POST /api/agent/run
tenant: [REDACTED-TENANT]
retrieved_doc: [REDACTED-DOC-ID]
tool_selected: crm.lookup_customer
policy_result: allowed_before_fix, rejected_after_fix
Markdown handoff
# agent-fix.md
## Finding
WB-SAMPLE-01: Indirect prompt injection can request privileged tool execution.
## Goal
Only trusted application instructions may authorize privileged tools. Retrieved
content can inform an answer, but cannot create or modify tool intent.
## Implementation tasks
- Add a server-side allowlist for each tool by route, role, tenant, and workflow.
- Tag retrieved chunks as untrusted_context before they enter the agent prompt.
- Reject tool calls when arguments are sourced only from untrusted_context.
- Require explicit user intent for crm.lookup_customer and billing.export tools.
- Add audit logging for allowed, rejected, and policy-missing tool decisions.
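
The tasks above can be sketched as a single server-side decision function. This is a minimal illustration under stated assumptions, not the application's actual policy layer: the `ToolCall` shape, the `ALLOWLIST` contents, the `support` role, the `customer_support` workflow, and the `explicit_user_intent` flag (assumed to be set by application code, never by the model) are all hypothetical. Only the tool names, the route, and the test tenant come from this report.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    TRUSTED_INSTRUCTION = "trusted_instruction"  # application-authored instructions
    USER_REQUEST = "user_request"                # the authenticated user's message
    UNTRUSTED_CONTEXT = "untrusted_context"      # retrieved chunks, uploaded documents


@dataclass(frozen=True)
class ToolCall:
    tool: str
    route: str
    role: str
    tenant: str
    workflow: str
    # For each argument, the provenance tags of the text it was derived from.
    arg_sources: dict[str, frozenset[Provenance]]
    # Hypothetical flag: set by application code, not by model output.
    explicit_user_intent: bool = False


# Hypothetical allowlist: tool -> permitted (route, role, tenant, workflow) tuples.
ALLOWLIST = {
    "crm.lookup_customer": {
        ("/api/agent/run", "support", "WB-SAMPLE", "customer_support"),
    },
}

# Tools that the handoff says require explicit user intent.
REQUIRES_USER_INTENT = {"crm.lookup_customer", "billing.export"}


def decide(call: ToolCall) -> tuple[str, str]:
    """Return (decision, reason); only "allowed" may execute the tool."""
    entries = ALLOWLIST.get(call.tool)
    if entries is None:
        # Fail closed: a tool with no policy is never executed.
        return "policy_missing", "no allowlist entry for tool"
    if (call.route, call.role, call.tenant, call.workflow) not in entries:
        return "rejected", "route/role/tenant/workflow not allowlisted"
    if call.tool in REQUIRES_USER_INTENT and not call.explicit_user_intent:
        return "rejected", "explicit user intent required"
    for name, sources in call.arg_sources.items():
        # Reject when an argument has no source other than untrusted context
        # (an empty source set is treated the same way).
        if not (sources - {Provenance.UNTRUSTED_CONTEXT}):
            return "rejected", f"argument {name!r} lacks trusted provenance"
    return "allowed", "ok"
```

In this sketch the three outcomes map directly onto the audit categories named in the last task (allowed, rejected, policy-missing), so the caller can log the returned pair as-is.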
## Tests
- Fixture: uploaded document instructs the agent to call crm.lookup_customer.
- Expectation: tool call is rejected and the user receives a safe refusal.
- Fixture: authorized user asks for an allowed lookup with a known customer ID.
- Expectation: tool call succeeds and policy decision is logged.
What must pass before closure
- Untrusted retrieved text cannot trigger, rename, or parameterize privileged tool calls.
- Server-side tool policy rejects calls without trusted instruction provenance and explicit route authorization.
- Regression tests cover direct prompts, retrieved-document payloads, and mixed trusted/untrusted context.
- Telemetry confirms rejected tool calls are logged without leaking secrets, tokens, or tenant metadata.
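
The telemetry criterion can be approached by logging only an explicit set of safe fields rather than trying to scrub secrets out of a full event. The sketch below is one possible shape, not the application's logging code: the field names, the `tenant_ref` pseudonym, and the `pepper` key are assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical set of fields considered safe to write to the audit log.
SAFE_FIELDS = ("decision", "tool", "reason_code")


def audit_record(event: dict, pepper: bytes) -> str:
    """Serialize a tool-policy decision without secrets or raw tenant data."""
    # Copy only allowlisted fields; tokens, arguments, and prompts are dropped.
    record = {key: event[key] for key in SAFE_FIELDS if key in event}
    tenant = event.get("tenant")
    if tenant is not None:
        # Keyed hash lets entries for one tenant be correlated
        # without storing the tenant identifier itself.
        digest = hmac.new(pepper, tenant.encode("utf-8"), hashlib.sha256)
        record["tenant_ref"] = digest.hexdigest()[:16]
    return json.dumps(record, sort_keys=True)
```

Allowlisting fields fails closed: a new field added to the event later stays out of the log until someone deliberately marks it safe.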
Start with a bounded, authorized review.
Free Surface Recon maps public exposure first. Paid testing begins only after scope review, authorization, safety limits, and pricing are confirmed.