DailyTransport
The DailyTransport is a full-featured transport that lets you join Pipecat bots to Daily WebRTC video calls. It's built on top of daily-python and gives you access to many advanced features, but it's also a good way to simply use WebRTC as the media backbone for building real-time bot interactions.
DailyTransport.new()
DailyTransport.new(
room_url: str,
token: str | None,
bot_name: str,
min_others_count: int = 1,
start_transcription: bool = False,
**kwargs
)
Positional arguments (though it's probably clearer to pass them by name):
room_url: The Daily room URL to connect to. It looks like https://YOURDOMAIN.daily.co/YOURROOM.
token: You'll usually want to connect the bot to the room with owner privileges. If so, you can use Daily's REST API to create a meeting token, and include that token string here.
bot_name: You'll see the bot's name if you use Daily Prebuilt, for example.
min_others_count: After someone else joins, when the number of other participants in the room drops back below this number, the bot will exit. Set this to 0 to disable the bot automatically leaving the room.
start_transcription: If you want to receive TranscriptionFrames, you'll need to set this to True. This requires an owner token to be included in the token argument.
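The min_others_count rule can be modelled in a few lines. This is a standalone sketch of the behavior described above, not the transport's actual code. The class and method names (LeavePolicy, on_participant_count) are hypothetical. It also assumes that "drops back below" means the count must first have reached min_others_count before the bot is allowed to leave.

```python
from dataclasses import dataclass


@dataclass
class LeavePolicy:
    """Hypothetical model of the min_others_count exit rule."""

    min_others_count: int = 1
    armed: bool = False  # True once enough other participants have been present

    def on_participant_count(self, others: int) -> bool:
        """Record the current number of other participants.

        Returns True when the bot should leave the room.
        """
        if self.min_others_count == 0:
            return False  # 0 disables automatic leaving
        if others >= self.min_others_count:
            self.armed = True  # threshold reached; a later drop triggers the exit
            return False
        return self.armed
```

With the default of 1, the bot waits in an empty room, stays while at least one other participant is present, and leaves once the last one is gone.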
Other available keyword arguments:
vad_enabled: Whether or not to use Voice Activity Detection (VAD). If True, the transport will emit UserStartedSpeaking and UserStoppedSpeaking frames. VAD is also necessary for interruption support. Defaults to False.
vad_start_s: The amount of time a user needs to speak before the transport emits a UserStartedSpeaking frame. Defaults to 0.2, or 200ms.
vad_stop_s: The amount of time a user needs to stop speaking and remain silent before the transport emits a UserStoppedSpeaking frame. Defaults to 0.8, or 800ms. This value represents a good middle ground: it's short enough that conversation feels responsive, but long enough that Deepgram can usually return all the transcriptions before the UserStoppedSpeaking frame is emitted.
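To make the two thresholds concrete, here is a minimal, standalone sketch of how vad_start_s and vad_stop_s could turn a stream of raw speech/silence readings into debounced start and stop events. It is an illustration only, not the transport's implementation. The function name vad_events and the (timestamp, is_speech) input format are assumptions made for this example.

```python
from typing import Iterable, List, Optional, Tuple


def vad_events(
    readings: Iterable[Tuple[float, bool]],
    vad_start_s: float = 0.2,
    vad_stop_s: float = 0.8,
) -> List[Tuple[float, str]]:
    """Turn (timestamp_seconds, is_speech) readings into debounced events."""
    events: List[Tuple[float, str]] = []
    speaking = False                   # current debounced state
    run_start: Optional[float] = None  # when the current run of disagreeing readings began
    for t, is_speech in readings:
        if is_speech == speaking:
            run_start = None           # reading agrees with the state; reset the timer
            continue
        if run_start is None:
            run_start = t
        needed = vad_start_s if is_speech else vad_stop_s
        if t - run_start >= needed:
            speaking = is_speech
            run_start = None
            name = "UserStartedSpeaking" if speaking else "UserStoppedSpeaking"
            events.append((t, name))
    return events
```

A short pause in the middle of a sentence never lasts vad_stop_s, so it resets the timer without producing a stop event. That is the trade-off the 800ms default is tuned for.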
A bit more about VAD
The DailyTransport can use the VAD support built into the WebRTC library with no additional dependencies. However, we recommend installing the optional silero dependency if your platform supports it. This is an AI VAD library powered by Torch, and it's generally a bit better at distinguishing talking from background noise. Pipecat will automatically use Silero VAD if you've installed the dependencies.
Event Handlers
add_event_handler
participant_joined
first_other_participant_joined
participant_left
transcription_message
app_message
Other transcription events
Send and receive behavior
transport.run()
transport.run(pipeline: Pipeline | None = None, override_pipeline_source_queue=True)
This method runs the transport. For the DailyTransport, that includes joining the Daily room and setting up audio/video send and receive, depending on what you've configured.
This method also accepts a pipeline argument. If you include a pipeline, the transport will run and manage that Pipeline for you. That includes connecting the pipeline's source and sink to the transport's send and receive queues. It will also start and stop your pipeline when the transport starts and stops.
Your app will almost always include some form of await transport.run(), usually await transport.run(pipeline).
transport.run_pipeline()
async def run_pipeline(pipeline: Pipeline, override_pipeline_source_queue=True):
This method connects the pipeline's source and sink to the transport's send and receive queues, but it doesn't manage the pipeline's lifecycle. You'll need to await transport.run_pipeline(pipeline) separately from await transport.run().
The override_pipeline_source_queue argument is used for a few things internally.
transport.run_interruptible_pipeline()
transport.run_interruptible_pipeline(
pipeline: Pipeline,
pre_processor: FrameProcessor | None = None,
post_processor: FrameProcessor | None = None,
)
This method runs the pipeline connected to the transport's queues, but it runs it inside a cancelable asyncio.Task. If the transport detects that the user has started speaking (which generates a UserStartedSpeaking frame), the transport will cancel the currently executing task, empty all the frames in the transport's send queue, and start a new task.
The end result is that you can run the pipeline, and anytime the user speaks, the bot will stop what it's doing and start listening to the user.
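The cancel, drain, restart cycle can be sketched with plain asyncio. This is a standalone illustration of the pattern, not the transport's code. The names speak, drain, and demo are hypothetical, and the "pipeline" here is just a coroutine that slowly puts strings on a queue.

```python
import asyncio


async def speak(sentences, send_queue):
    """Stand-in for a pipeline run: queue one item per sentence, slowly."""
    for sentence in sentences:
        await send_queue.put(sentence)
        await asyncio.sleep(0.05)


def drain(queue):
    """Discard everything still waiting in the queue; return how many items were dropped."""
    dropped = 0
    while not queue.empty():
        queue.get_nowait()
        dropped += 1
    return dropped


async def demo():
    send_queue = asyncio.Queue()
    first = asyncio.create_task(speak(["one", "two", "three", "four"], send_queue))
    await asyncio.sleep(0.12)        # the "user" starts speaking mid-response
    first.cancel()                   # 1. cancel the running task
    try:
        await first
    except asyncio.CancelledError:
        pass
    dropped = drain(send_queue)      # 2. empty the send queue
    second = asyncio.create_task(speak(["new reply"], send_queue))  # 3. start a new task
    await second
    return first.cancelled(), dropped, send_queue.qsize()
```

Calling asyncio.run(demo()) shows the first task ending as cancelled, its queued output discarded, and only the new task's output left in the queue.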
Typically, you'll want to create your pipeline such that it expects to receive and accumulate TranscriptionFrames from the user, and start generating a response as soon as it receives a UserStoppedSpeakingFrame.
This method also accepts two optional services as pre_processor and post_processor. As it turns out, pre_processor doesn't do anything special, so you can probably ignore it.
But post_processor is a bit different. As the transport runs, it consumes the frames coming out of the pipeline: displaying ImageFrames as video, playing AudioFrames as audio, etc. But when running an interruptible pipeline, the transport will send each frame through the post_processor after it finishes doing whatever it's supposed to with that frame. More specifically, each AudioFrame goes to the post_processor after it has been successfully played.
If the pipeline gets interrupted, the contents of the transport's output queue get dumped, so none of those frames go through the post_processor.
By convention, immediately after sending AudioFrames with generated speech, text-to-speech services send a TextFrame with the text of that speech through the pipeline. So if you put an LLMContextAggregator in the post_processor of an interruptible pipeline, you can ensure that the bot's context will only contain sentences it actually said to the user. If a bot generates an 8-sentence response, but the user interrupts the bot in the middle of the 4th sentence, the context will only contain the first three sentences.
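The 8-sentence example can be traced with a small standalone simulation. This is a sketch of the idea only. It uses plain tuples in place of real frames, and the function name play_until_interrupted is hypothetical. It assumes that an interruption during a sentence's audio means that audio never finishes playing, so neither it nor the text item behind it reaches the post-processor.

```python
from collections import deque


def play_until_interrupted(sentences, interrupt_during=None):
    """Return the sentences that would reach a text-aggregating post-processor.

    `interrupt_during` is the 1-based index of the sentence whose audio is
    playing when the user interrupts (None means no interruption).
    """
    # Each sentence contributes an audio item followed by a text item,
    # mirroring the audio-then-text convention described above.
    output_queue = deque()
    for index, sentence in enumerate(sentences, start=1):
        output_queue.append(("audio", index, sentence))
        output_queue.append(("text", index, sentence))

    context = []  # what the post-processor ends up recording
    while output_queue:
        kind, index, sentence = output_queue.popleft()
        if kind == "audio" and index == interrupt_during:
            output_queue.clear()  # interruption: everything still queued is dumped
            break
        if kind == "text":
            context.append(sentence)  # handled by the transport, so it is passed on
    return context
```

Because the text item sits behind its audio in the queue, a sentence is recorded only after its audio has been fully handled.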
This method does not manage the pipeline lifecycle. You'll still need to do something like:
await asyncio.gather(transport.run(), transport.run_interruptible_pipeline(pipeline))
Instance Methods
(In order to keep the documentation a bit more readable, some of the functionality described here actually comes from a BaseTransport class.)
send_app_message
Calls daily-python's send_app_message function.
dialout
Calls start_dialout in daily-python, which can be used to call SIP or PSTN phone numbers. See the Daily docs for more information.
start_recording
Starts a recording of the Daily room. See the Daily docs for more information.
say
This is a convenience method for generating text-to-speech from a given sentence. It bypasses any running pipelines and just sends the sound directly to the transport.
stop and stop_when_done
Functions for stopping a running transport. You probably don't need to call these directly; instead, sending an EndFrame through your pipeline should stop everything.
Frame Behaviors
Here's a list of different kinds of frames, and how the DailyTransport handles them:
AudioFrame: The transport will break the audio data into ~0.5s chunks and play them using daily-python. The audio playback is synchronous in the transport's thread, which means that if the transport's queue contains several AudioFrames followed by an ImageFrame, the ImageFrame won't get handled until playback of the AudioFrames is completed.
ImageFrame: When the transport receives an ImageFrame, it will display that image in the bot's webcam video inside the Daily call. That image will stay set and appear on screen until another ImageFrame is received.
SpriteFrame: These frames contain a sequence of images. When the transport receives a SpriteFrame, it will loop those frames in the bot's webcam video at the configured frame rate of the transport until it receives another SpriteFrame or ImageFrame.
UserImageRequestFrame: If the transport's video_rendering_enabled property is set to True, when it receives a UserImageRequestFrame, it will grab a frame from one or all participants' cameras and put those frames into the pipeline as UserImageFrames.
SendAppMessageFrame: If a DailyTransport receives this frame, it will use send_app_message() from daily-python to send a message to other call participants.
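As an illustration of the ~0.5s chunking described for AudioFrame, here is a minimal standalone sketch that splits raw PCM bytes into half-second pieces. It is not the transport's code. The function name chunk_audio is hypothetical, and the defaults (16 kHz, mono, 16-bit samples) are assumptions chosen for the example, not values taken from the transport.

```python
from typing import List


def chunk_audio(
    audio: bytes,
    sample_rate: int = 16000,    # assumed sample rate, in Hz
    channels: int = 1,           # assumed mono
    sample_width: int = 2,       # assumed 16-bit samples (2 bytes each)
    chunk_seconds: float = 0.5,
) -> List[bytes]:
    """Split raw PCM bytes into chunks of roughly `chunk_seconds` each."""
    frame_size = channels * sample_width
    chunk_size = int(sample_rate * frame_size * chunk_seconds)
    # Keep chunks aligned to whole sample frames so no sample is split in two.
    chunk_size -= chunk_size % frame_size
    if chunk_size <= 0:
        raise ValueError("chunk_seconds is too small for this audio format")
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]
```

Under these assumptions, each full chunk holds 16,000 bytes, and the final chunk carries whatever audio remains.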