hatchmoment. scored by care · not by stars

stt-pipeline

Offline speech-to-text and speaker diarization pipeline

notable · Python · 🧠 AI & ML

The tool accepts audio files in many formats and converts them to 16 kHz mono WAV. It then uses Vosk to produce word-level transcriptions with timestamps and SpeechBrain to diarize speakers. Output is structured JSON plus ready-to-use SRT subtitle files, and the pipeline can run in batch mode or behind an HTTP server interface. It is aimed at developers and researchers who need offline transcription and speaker labeling without relying on cloud APIs, and it combines conversion, transcription, diarization, and formatting in one tool.
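To make the last step of that description concrete, here is a minimal sketch of turning word-level timestamps into SRT text. It is not taken from the repository. It assumes each word is a dict with `word`, `start`, and `end` keys (the shape Vosk returns when word output is enabled). The function names and the `max_words` and `max_gap` thresholds are hypothetical choices for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7, max_gap=0.8):
    """Group timestamped words into cues and render them as SRT text.

    A new cue starts when the current cue already holds max_words words
    or when the silence before the next word exceeds max_gap seconds.
    Both thresholds are illustrative assumptions, not the project's values.
    """
    cues, current = [], []
    for w in words:
        too_long = len(current) >= max_words
        long_pause = bool(current) and w["start"] - current[-1]["end"] > max_gap
        if current and (too_long or long_pause):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = srt_timestamp(cue[0]["start"])
        end = srt_timestamp(cue[-1]["end"])
        text = " ".join(w["word"] for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        {"word": "hello", "start": 0.0, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 0.9},
        {"word": "again", "start": 2.5, "end": 3.0},
    ]
    print(words_to_srt(sample))
```

Speaker labels from diarization could be prepended to each cue's text in the same loop, though how the project itself merges the two outputs is not stated here.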

diarization · python · speechbrain · stt · vosk
View on GitHub →

magomedcoder/stt-pipeline