Evaluates indirect prompt injection in LLM email agents with tool-call logging and detection mechanisms.
This project provides a harness to assess whether email assistants can be tricked into unauthorized tool use via malicious content in emails. It implements four tools (read_emails, send_email, forward_email, delete_email), logging with authorization labels, Spotlighting-style prompt marking, and MELON-Aug-style masked re-execution detection. Designed for AI safety researchers, it offers a deterministic local planner for validation and includes adversarial seed emails, enabling reproducible testing of prompt injection vulnerabilities in email agents.
View on GitHub →qarteu/tool-calling-research-project