Laplace adds priority-aware admission control and automatic model unloading to a single local LLM inference server (e.g., LM Studio, Ollama). It watches the server's model status, queues incoming requests by priority tier, and periodically sweeps idle models, including those loaded by other processes. This helps developers who run multiple agents or jobs avoid out-of-memory (OOM) errors and latency spikes on memory-constrained machines. It's aimed at AI engineers and DevOps folks who self-host LLMs and need smarter resource sharing.
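The two mechanisms described above, tiered admission and an idle sweep, can be illustrated with a minimal Python sketch. This is not Laplace's actual code or API: `AdmissionQueue`, `Request`, `idle_models`, the "lower tier number means higher priority" convention, and the idle threshold are all hypothetical names and assumptions chosen for illustration, and the sketch makes no calls to LM Studio or Ollama.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    tier: int                          # assumed convention: lower number = higher priority
    seq: int                           # tie-breaker so requests in one tier stay FIFO
    model: str = field(compare=False)  # not used for ordering


class AdmissionQueue:
    """Hypothetical tiered queue: admits the highest-priority request first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, model, tier):
        heapq.heappush(self._heap, Request(tier, next(self._counter), model))

    def admit(self):
        """Pop the next request, or return None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None


def idle_models(last_used, now, idle_after_s):
    """Return names of models whose last use is at least idle_after_s seconds old.

    last_used maps model name -> timestamp of its last request. Keying it by
    what the server reports as loaded (rather than by who loaded it) is what
    lets a sweep also catch models loaded by other processes.
    """
    return sorted(m for m, t in last_used.items() if now - t >= idle_after_s)


if __name__ == "__main__":
    q = AdmissionQueue()
    q.submit("model-a", tier=2)
    q.submit("model-b", tier=0)
    q.submit("model-c", tier=2)
    print([q.admit().model for _ in range(3)])  # ['model-b', 'model-a', 'model-c']

    loaded = {"model-a": 350.0, "model-x": 10.0}
    print(idle_models(loaded, now=400.0, idle_after_s=300.0))  # ['model-x']
```

A real sweep would run on a timer and ask the inference server to unload each model returned by `idle_models`; that step is omitted here because it depends on the server's own interface.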
View on GitHub → mikatachan/laplace