phantom

PHANTOM: Zero-copy multi-agent LLM KV-cache serving on Apple Silicon Unified Memory

PHANTOM is a caching layer for large language models on Apple Silicon. It lets multiple inference agents share a single KV-cache slab in unified memory without copying it, which reduces memory usage and improves performance. A MESI (Modified, Exclusive, Shared, Invalid) coherence protocol keeps the shared cache consistent across agents. The project is aimed at developers running LLMs on Apple Silicon hardware.
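
To make the coherence idea concrete, here is a minimal sketch of the textbook MESI state transitions for one block of a shared cache. It is not taken from PHANTOM's code: the `SharedBlock` class, its `read` and `write` methods, and the agent names are hypothetical, and the sketch tracks states only, with no actual tensor data or memory mapping.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"   # this agent holds the only valid copy and has changed it
    EXCLUSIVE = "E"  # this agent holds the only valid copy, unchanged
    SHARED = "S"     # several agents may hold a valid, unchanged view
    INVALID = "I"    # this agent's view is stale or absent


class SharedBlock:
    """Per-agent MESI state for one block of a shared slab (hypothetical)."""

    def __init__(self, agent_ids):
        # Every agent starts without a valid view of the block.
        self.states = {agent: State.INVALID for agent in agent_ids}

    def read(self, agent):
        if self.states[agent] is not State.INVALID:
            return  # read hit: M, E, and S all permit reads unchanged
        holders = [a for a, s in self.states.items()
                   if a != agent and s is not State.INVALID]
        if not holders:
            # No other valid view exists, so the reader gets exclusive access.
            self.states[agent] = State.EXCLUSIVE
            return
        for other in holders:
            # Any M or E holder is downgraded so the block can be shared.
            self.states[other] = State.SHARED
        self.states[agent] = State.SHARED

    def write(self, agent):
        # A write invalidates every other agent's view of the block.
        for other in self.states:
            if other != agent:
                self.states[other] = State.INVALID
        self.states[agent] = State.MODIFIED


block = SharedBlock(["planner", "coder"])
block.read("planner")   # planner becomes EXCLUSIVE
block.read("coder")     # both become SHARED
block.write("planner")  # planner becomes MODIFIED, coder becomes INVALID
```

In a zero-copy setting the data itself would stay in one place; under this assumption only the per-agent state changes, which is why the sketch carries no payload.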

v-code01/phantom