KayzeeOCR tackles the problem of extracting structured information from scanned documents by first detecting layout elements and then recognizing their content with a vision‑language model. It supports a wide range of input formats (PDF, DOCX, images) and emits a detailed JSON schema with bounding boxes and text. The two‑stage pipeline runs on a ≤2B‑parameter model, making it feasible on mid‑range GPUs. Researchers and developers needing reliable document parsing can use it out‑of‑the‑box without building their own OCR stack.
View on GitHub →ahmzakif-dev/KayzeeOCR