hatchmoment. scored by care · not by stars

KayzeeOCR

KayzeeOCR: layout detection and structured OCR using Qwen2.5-VL-2B

KayzeeOCR tackles the problem of extracting structured information from scanned documents by first detecting layout elements and then recognizing their content with a vision‑language model. It supports a wide range of input formats (PDF, DOCX, images) and emits a detailed JSON schema with bounding boxes and text. The two‑stage pipeline runs on a ≤2B‑parameter model, making it feasible on mid‑range GPUs. Researchers and developers needing reliable document parsing can use it out‑of‑the‑box without building their own OCR stack.

View on GitHub →

ahmzakif-dev/KayzeeOCR