Tool for measuring annotation agreement against a Bengali sentiment gold set
The repository offers a simple script to compute Cohen's kappa and accuracy between blind annotations and a gold-standard Bengali sentiment dataset. It includes sample data, a rubric, and scripts to generate agreement reports, helping annotation teams calibrate workers. It is useful for NLP teams and annotation vendors that need transparent quality metrics. Similar scripts exist, but this one packages the workflow specifically for Bengali sentiment.
View on GitHub → Rituparno-Majumdar/bengali-annotation-quality
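
For readers unfamiliar with the two metrics named in the description, here is a minimal sketch of how accuracy and Cohen's kappa can be computed for one annotator against a gold set. It is not taken from the repository: the function names, the label strings ("pos", "neg", "neu"), and the six-item sample are all hypothetical and chosen only for illustration. Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each side's label frequencies.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items where the annotator's label matches the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohens_kappa(gold, pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(gold)
    p_o = accuracy(gold, pred)  # observed agreement (also validates the inputs)
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Chance agreement: sum over labels of the product of marginal proportions.
    labels = set(gold) | set(pred)
    p_e = sum(gold_counts[label] * pred_counts[label] for label in labels) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both sides used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sample: gold labels and one annotator's blind labels.
gold = ["pos", "pos", "neg", "neg", "neu", "neu"]
pred = ["pos", "neg", "neg", "neg", "neu", "pos"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"kappa    = {cohens_kappa(gold, pred):.3f}")
```

In this sample the annotator matches gold on 4 of 6 items (accuracy about 0.667), chance agreement is 12/36, and kappa comes out to 0.5. The gap between the two numbers is the reason to report both: accuracy alone overstates agreement when a few labels dominate.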