ZINA addresses hallucinations in multimodal large language models (MLLMs) by locating erroneous spans in a caption, classifying each span into one of six error types, and producing an edited caption. It provides a simple CLI that accepts an image, a candidate caption, and a reference caption, and returns JSON containing the span tags and a corrected caption. The tool is aimed at researchers and developers who need fine-grained evaluation and correction of MLLM outputs. Compared with generic hallucination detectors, ZINA reports errors at the span level and corrects them automatically, which makes its output easier to interpret and to reuse downstream.
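As a rough illustration of how span-tagged JSON of this kind could be consumed, the sketch below parses a result and applies the suggested replacements to the candidate caption. The schema is an assumption made for this example: the field names `caption`, `spans`, `start`, `end`, `type`, and `replacement`, the error-type label, and the helper `apply_edits` are all hypothetical, and ZINA's actual output format may differ.

```python
import json

# Hypothetical result: every field name and the "attribute" label are
# placeholders for illustration, not ZINA's documented schema.
raw = """
{
  "caption": "A brown dog sits on a red sofa.",
  "spans": [
    {"start": 2, "end": 7, "type": "attribute", "replacement": "black"},
    {"start": 22, "end": 25, "type": "attribute", "replacement": "blue"}
  ]
}
"""


def apply_edits(caption, spans):
    """Replace each [start, end) character span with its suggested text."""
    edited = caption
    # Work from the rightmost span to the left so that earlier offsets
    # remain valid after each substitution.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        edited = edited[:span["start"]] + span["replacement"] + edited[span["end"]:]
    return edited


result = json.loads(raw)
edited = apply_edits(result["caption"], result["spans"])
print(edited)
```

Applying the spans from right to left avoids recomputing offsets when a replacement differs in length from the text it replaces. This assumes the spans do not overlap.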
View on GitHub → YuigaWada/ZINA