Swahili TTS pipeline for data prep, baseline evaluation, and fine‑tuning
The project addresses the scarcity of high-quality Swahili text-to-speech. It provides scripts that convert the WaxalNLP dataset into LJSpeech-style WAV files, filter noisy transcripts, run baseline syntheses, and evaluate the results with two ASR judges and UTMOS. A Modal-based orchestration layer runs the whole pipeline, including optional LoRA fine-tuning, without a local GPU. It is aimed at researchers and developers who need a reproducible Swahili TTS stack, and it bundles data handling, baseline comparison, and cloud-ready training in one repository.
View on GitHub → Msingi-AI/sauti-tts-v2
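
To make the dual-ASR evaluation step concrete, here is a minimal sketch of how two ASR transcripts of one synthesized utterance could be scored against the reference text with word error rate (WER). It is not taken from the repository: the function names `word_error_rate` and `dual_judge_wer`, the lowercase-and-split normalization, and the averaging of the two scores are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def dual_judge_wer(reference: str, judge_a_text: str, judge_b_text: str) -> dict:
    """Score one utterance with two ASR transcripts (hypothetical helper).

    Averaging the two judges is an assumption, not the project's stated rule.
    """
    wer_a = word_error_rate(reference, judge_a_text)
    wer_b = word_error_rate(reference, judge_b_text)
    return {"judge_a": wer_a, "judge_b": wer_b, "mean": (wer_a + wer_b) / 2}


if __name__ == "__main__":
    scores = dual_judge_wer(
        "habari ya asubuhi rafiki yangu",
        "habari ya asubuhi rafiki yangu",  # first judge transcribes perfectly
        "habari za asubuhi rafiki",        # second judge: one substitution, one deletion
    )
    print(scores)
```

Using two independent judges reduces the chance that a single recognizer's weakness on Swahili is mistaken for a synthesis error; a large gap between the two scores is itself a signal worth inspecting.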