
watchbench

Benchmark for evaluating event routing systems that decide which source events should wake downstream AI agents

WatchBench provides a synthetic email stream of 500 events and 20 watch intents for evaluating approaches to event routing: deciding which incoming events should trigger downstream agent actions. It ships several candidate adapters (oracle, LLM polling, OpenClaw, Watchline API) and measures precision, recall, F1, latency, source and agent call counts, and token usage. It is aimed at developers building event-driven AI agent systems who need to compare routing strategies. The benchmark reports concrete cost differences between approaches (e.g., 68.2% fewer source calls and 91% fewer tokens).
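This page does not describe the repository's actual adapter interface or scoring code, so the following is only a minimal sketch of how routing of this kind can be scored. It assumes hypothetical names throughout (`Event`, `Adapter.route`, `OracleAdapter`, `run`, `score_routing`, `percent_reduction`) and models each gold label as an `(event_id, intent_id)` pair meaning "this event should wake the agent watching this intent". Precision, recall, and F1 are computed over those pairs, and a percent-reduction helper shows the arithmetic behind figures like "fewer source calls".

```python
from dataclasses import dataclass
from typing import Protocol

# Hypothetical types for illustration only; not WatchBench's actual schema or API.
Pair = tuple[str, str]  # (event_id, intent_id): "this event should wake this intent's agent"


@dataclass(frozen=True)
class Event:
    event_id: str
    subject: str


class Adapter(Protocol):
    """Hypothetical routing interface: return the intent ids an event should wake."""

    def route(self, event: Event) -> set[str]: ...


class OracleAdapter:
    """Upper-bound adapter that simply reads the gold labels."""

    def __init__(self, gold: set[Pair]) -> None:
        self._gold = gold

    def route(self, event: Event) -> set[str]:
        return {intent for (eid, intent) in self._gold if eid == event.event_id}


@dataclass(frozen=True)
class RoutingScore:
    precision: float
    recall: float
    f1: float


def run(adapter: Adapter, events: list[Event]) -> set[Pair]:
    """Collect every (event, intent) wake-up the adapter predicts."""
    return {(e.event_id, intent) for e in events for intent in adapter.route(e)}


def score_routing(predicted: set[Pair], gold: set[Pair]) -> RoutingScore:
    tp = len(predicted & gold)  # wake-ups that should have happened
    fp = len(predicted - gold)  # spurious wake-ups
    fn = len(gold - predicted)  # missed wake-ups
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return RoutingScore(precision, recall, f1)


def percent_reduction(baseline: float, candidate: float) -> float:
    """How much lower `candidate` is than `baseline`, in percent (e.g. source calls, tokens)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline
```

With made-up numbers, an adapter making 50 source calls against a baseline of 200 gives `percent_reduction(200, 50) == 75.0`. Scoring over pairs, rather than over events, counts waking the right agent for the wrong intent as a miss. Note the convention that an empty prediction against empty gold scores 0.0 rather than 1.0; the real benchmark may handle that edge case differently.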

agents · ai-agents · benchmark · hermes-agent · openclaw

qordinate-ai/watchbench