CLOVA tackles visual question answering, multi‑image reasoning, image editing, and knowledge tagging by linking large language models with specialized vision tools. It operates in three phases (inference, reflection, and learning), which lets its tools be updated from human feedback. Researchers and developers can run the demos and extend the framework with new tools through the provided APIs. Unlike a static pipeline, CLOVA’s closed‑loop design lets tool performance keep improving over time.
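The closed loop described above can be illustrated with a toy sketch. This is not CLOVA's actual API: `Tool`, `inference`, `reflection`, and `closed_loop` are hypothetical names chosen for illustration, and the "learning" step here is a simple lookup-table correction standing in for whatever real tool update the framework performs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Tool:
    """Hypothetical stand-in for a vision tool (not a CLOVA class)."""
    name: str
    base: Callable[[str], str]
    # Corrections stored during the learning phase (toy replacement for training).
    memory: Dict[str, str] = field(default_factory=dict)

    def __call__(self, x: str) -> str:
        # A learned correction takes precedence over the base behaviour.
        return self.memory.get(x, self.base(x))

    def update(self, x: str, target: str) -> None:
        # Learning phase: remember the human-provided answer for this input.
        self.memory[x] = target


def inference(tool: Tool, x: str) -> str:
    """Inference phase: run the tool on the input."""
    return tool(x)


def reflection(output: str, feedback: Optional[str]) -> bool:
    """Reflection phase: True if human feedback exists and contradicts the output."""
    return feedback is not None and feedback != output


def closed_loop(tool: Tool, x: str, feedback: Optional[str]) -> str:
    """Run one inference, then update the tool if reflection flags an error."""
    output = inference(tool, x)
    if reflection(output, feedback):
        tool.update(x, feedback)
    return output


# Assumed example: a tagger that initially knows nothing about the image.
tagger = Tool("tagger", base=lambda image_id: "unknown")
first = closed_loop(tagger, "img_001", feedback="golden retriever")
second = closed_loop(tagger, "img_001", feedback=None)
print(first, "->", second)
```

In this sketch the first call returns the tool's original (wrong) answer and triggers an update; the second call on the same input returns the corrected answer, which is the behaviour a static pipeline would lack.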
View on GitHub → chuanhuafan9-boop/CLOVA_project