PHANTOM: Zero-copy multi-agent LLM KV-cache serving on Apple Silicon Unified Memory
PHANTOM is a caching layer for large language model (LLM) inference on Apple Silicon. It lets multiple inference agents share a single KV-cache slab in unified memory without copying it. Because agents read the same slab rather than per-agent copies, memory usage drops and performance improves. A MESI (Modified, Exclusive, Shared, Invalid) coherence protocol keeps the shared cache consistent across agents. The project is aimed at developers running LLMs on Apple Silicon hardware.
View on GitHub → v-code01/phantom
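
To make the MESI idea in the description concrete, here is a minimal sketch of how the coherence state of one shared KV-cache block could be tracked per agent. It is not PHANTOM's actual API: the names `State`, `BlockDirectory`, `read`, and `write`, and the agent names, are hypothetical and chosen only for illustration. The sketch models textbook MESI transitions for a single block and ignores the data movement itself.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # this agent holds the only copy, and it is dirty
    EXCLUSIVE = "E"  # this agent holds the only copy, and it is clean
    SHARED = "S"     # one or more agents hold clean, read-only copies
    INVALID = "I"    # this agent has no usable copy


class BlockDirectory:
    """Hypothetical tracker for the MESI state of one KV-cache block."""

    def __init__(self, agents):
        # Every agent starts without a valid view of the block.
        self.state = {agent: State.INVALID for agent in agents}

    def read(self, agent):
        # M, E, and S all permit reads, so a valid holder is a read hit.
        if self.state[agent] is not State.INVALID:
            return
        holders = [a for a, s in self.state.items()
                   if a != agent and s is not State.INVALID]
        # Any current holder (M or E) is downgraded to SHARED.
        # A real system would write back dirty data here.
        for holder in holders:
            self.state[holder] = State.SHARED
        # Alone on the block: EXCLUSIVE. Otherwise: SHARED.
        self.state[agent] = State.SHARED if holders else State.EXCLUSIVE

    def write(self, agent):
        # A write invalidates every other agent's view of the block.
        for other in self.state:
            if other != agent:
                self.state[other] = State.INVALID
        self.state[agent] = State.MODIFIED


directory = BlockDirectory(["planner", "coder"])
directory.read("planner")   # planner becomes EXCLUSIVE
directory.read("coder")     # planner and coder are both SHARED
directory.write("coder")    # coder is MODIFIED, planner is INVALID
```

In this sketch, the invalidation on write is what prevents one agent from reading stale keys and values after another agent has appended to or changed the block.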