Mosaic extracts text (including OCR), splits it into token-aware chunks, embeds the chunks with a sentence-transformers model, and upserts them into Milvus for hybrid retrieval (dense vectors plus BM25). After insertion it checks that the stored count matches the chunk count and fails loudly if any chunk is missing. It is aimed at developers building retrieval-augmented generation pipelines who need local, API-free indexing. The coverage check and hybrid dense-plus-BM25 retrieval are its main differences from typical RAG indexers.
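The coverage check described above can be illustrated with a minimal sketch. It is not Mosaic's actual code: `chunk_tokens`, `InMemoryStore`, `CoverageError`, and `index_with_coverage_check` are hypothetical names, whitespace-separated words stand in for real tokenizer output, a dictionary stands in for Milvus, and the embedding step is left out entirely.

```python
from dataclasses import dataclass, field


class CoverageError(RuntimeError):
    """Raised when the stored row count differs from the chunk count."""


def chunk_tokens(text: str, max_tokens: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping windows of at most max_tokens tokens.

    Assumption: whitespace-separated words stand in for model tokens.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = text.split()
    step = max_tokens - overlap
    chunks: list[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a vector store, keyed by 'doc_id:index'."""

    rows: dict[str, str] = field(default_factory=dict)

    def upsert(self, doc_id: str, chunks: list[str]) -> None:
        for i, chunk in enumerate(chunks):
            self.rows[f"{doc_id}:{i}"] = chunk

    def count(self, doc_id: str) -> int:
        return sum(1 for key in self.rows if key.startswith(f"{doc_id}:"))


def index_with_coverage_check(store: InMemoryStore, doc_id: str, text: str) -> int:
    """Chunk, upsert, then verify that every chunk was stored."""
    chunks = chunk_tokens(text)
    store.upsert(doc_id, chunks)
    stored = store.count(doc_id)
    if stored != len(chunks):
        # Fail loudly instead of silently indexing a partial document.
        raise CoverageError(
            f"{doc_id}: expected {len(chunks)} chunks, found {stored} in store"
        )
    return stored
```

Calling `index_with_coverage_check(InMemoryStore(), "report", text)` returns the number of stored chunks, and raises `CoverageError` if the store reports a different count. In a real pipeline the count would come from a query against the collection rather than an in-memory dictionary.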
View on GitHub → fazalrshah/mosaic