
STTService (Speech-To-Text)

Speech-to-text is primarily provided by Deepgram through the DailyTransport. We recommend setting a few transcription properties in your code:

transport.transcription_settings["extra"]["endpointing"] = True
transport.transcription_settings["extra"]["punctuate"] = True

These settings let Deepgram return transcriptions quickly while still including punctuation, which makes it easier to aggregate sentences or display captions.
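To illustrate why punctuation helps with sentence aggregation, here is a minimal sketch that buffers punctuated text and emits whole sentences. The `SentenceAggregator` class and its `feed` method are hypothetical names for this example, not part of Pipecat or Deepgram, and the splitting rule is deliberately naive (it would also split on a decimal point such as "3.5").

```python
import re

# Naive rule: a sentence is any run of text that ends in ., ! or ?
_SENTENCE_END = re.compile(r"[^.!?]*[.!?]+")


class SentenceAggregator:
    """Hypothetical helper (not a Pipecat class): buffers text fragments
    and returns only complete, punctuated sentences."""

    def __init__(self):
        self._buffer = ""

    def feed(self, text: str) -> list:
        # Append the new fragment, separated by a single space.
        self._buffer = f"{self._buffer} {text}".strip()
        sentences = []
        consumed = 0
        for match in _SENTENCE_END.finditer(self._buffer):
            sentences.append(match.group().strip())
            consumed = match.end()
        # Keep any trailing text without end punctuation for the next call.
        self._buffer = self._buffer[consumed:].strip()
        return sentences


aggregator = SentenceAggregator()
for fragment in ["Hello there", "how are you? I am", "fine."]:
    for sentence in aggregator.feed(fragment):
        print(sentence)
```

Without punctuation in the transcription, a helper like this has no reliable signal for where one sentence ends and the next begins.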

Deepgram provides transcription to the Daily call server, which forwards the transcriptions to the clients connected to the call. Pipecat makes those transcriptions available as TranscriptionFrames in your app. Deepgram determines when it has enough audio data to transcribe a user's speech, so it returns full sentences or phrases rather than single words. There are utility services that can help ensure you have a complete response from a user before you process their speech.
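Because a single user turn can arrive as several phrases, it can help to collect them before acting. The sketch below shows the general idea only. `UserResponseBuffer`, `add_transcription`, and `flush` are hypothetical names, not Pipecat's utility services, and it assumes your app has some separate signal that the user has stopped speaking, at which point it calls `flush`.

```python
from typing import Optional


class UserResponseBuffer:
    """Hypothetical helper (not a Pipecat class): collects transcription
    text until the app decides the user's turn is over."""

    def __init__(self):
        self._parts = []

    def add_transcription(self, text: str) -> None:
        # Ignore empty or whitespace-only transcriptions.
        text = text.strip()
        if text:
            self._parts.append(text)

    def flush(self) -> Optional[str]:
        # Call this when the user has stopped speaking (assumed signal).
        if not self._parts:
            return None
        response = " ".join(self._parts)
        self._parts.clear()
        return response


buffer = UserResponseBuffer()
buffer.add_transcription("What's the weather")
buffer.add_transcription("in Paris today?")
print(buffer.flush())
```

Keeping the end-of-turn decision outside the buffer means the same helper works whether that signal comes from voice activity detection, a timeout, or a button press.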

Local transcription with Whisper

The framework also includes a service for running Whisper transcription locally. Take a look at the whisper example in the framework for more information.