Build a Conversational AI Backend with Python and Agora

Conversational AI is revolutionizing how people interact with artificial intelligence. Instead of cramming everything into a single text prompt, users can hold natural, real-time voice conversations with AI agents, which makes interactions more intuitive and efficient.

Many developers have already invested significant time building custom LLM workflows for text-based agents. Agora’s Conversational AI Engine allows you to connect these existing workflows to an Agora channel, enabling real-time voice conversations without abandoning your current AI infrastructure.

In this guide, we’ll build a Python backend server that handles the connection between your users and Agora’s Conversational AI. By the end, you’ll have a production-ready backend that can power voice-based AI conversations for your applications.

Prerequisites

Before getting started, make sure you have:

Python (v3 or higher)
Basic knowledge of Python
An Agora account — first 10k minutes each month are free
Conversational AI service activated on your AppID

Project Setup

Let’s set up our Python server with FastAPI and Uvicorn. We’ll create a new project and install the necessary dependencies.

mkdir agora-convo-ai-server
cd agora-convo-ai-server

Next, let’s create a requirements.txt for all the project dependencies.

touch requirements.txt

In requirements.txt add the following dependencies:

fastapi
uvicorn
httpx
pydantic
agora-token-builder
python-dotenv

Next, install the dependencies by running the following command from the root of your project.

pip install -r requirements.txt

As we go through this guide, you’ll have to create new files in specific directories. So, before we start let’s create these new directories.

In your project root directory, create the routes and class_types directories and add the main.py file. We'll also create a .env file for the environment variables:

mkdir routes class_types
touch main.py
touch .env

Your project directory should now have a structure like this:

├── class_types
├── .env
├── main.py
├── requirements.txt
└── routes

FastAPI Server Setup

Let’s implement our server’s entry point: a FastAPI instance with a basic health check endpoint.

For now we’ll create a basic FastAPI app, and fill it in with more functionality as we progress through the guide. I’ve included comments throughout the code to help you understand what’s happening.

At a high level, we’re setting up a new FastAPI app with a simple router structure to handle the requests. We’ll also create a ping endpoint that we can use for health checks.

Add the following code to main.py:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()

# Initialize FastAPI app
app = FastAPI(
    title="Agora ConvoAI Python Server",
    description="Python implementation of Agora ConvoAI server",
    version="1.0.0"
)

# Configure CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Health check endpoint
@app.get("/ping")
async def ping():
    return {"message": "pong"}

# Main entry point
if __name__ == "__main__":
    import uvicorn
    port = int(os.getenv("PORT", 3000))
    uvicorn.run(app, host="0.0.0.0", port=port)

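Note: The PORT environment variable is only read when you start the server with python main.py; it defaults to 3000 if not set in your .env file. When you launch through the uvicorn CLI, pass the port explicitly, since Uvicorn otherwise listens on port 8000.

Let’s test our basic FastAPI app by running:

uvicorn main:app --reload --port 3000

You should see a line like “Uvicorn running on http://127.0.0.1:3000” in your console. You can now visit http://localhost:3000/ping to verify the server is working.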

Agora Conversational AI Routes

The real power of our server comes from the Agora Conversational AI integration. Let’s get the boring stuff out of the way first and create the file for the types needed to work with Agora’s Conversational AI API:

touch class_types/agora_convo_ai_types.py

Add the following classes to class_types/agora_convo_ai_types.py:

from enum import Enum
from pydantic import BaseModel
from typing import List, Optional

class TTSVendor(str, Enum):
    MICROSOFT = "microsoft"
    ELEVENLABS = "elevenlabs"

class TTSConfig(BaseModel):
    vendor: TTSVendor
    params: dict

class AgentResponse(BaseModel):
    agent_id: str
    create_ts: int
    status: str

Now, let’s define the client request types.

Create class_types/client_request_types.py:

touch class_types/client_request_types.py

Add the following classes to class_types/client_request_types.py:

from pydantic import BaseModel
from typing import List, Optional, Union

class InviteAgentRequest(BaseModel):
    requester_id: Union[str, int]
    channel_name: str
    rtc_codec: Optional[int] = None
    input_modalities: Optional[List[str]] = None
    output_modalities: Optional[List[str]] = None

class RemoveAgentRequest(BaseModel):
    agent_id: str

These new types give some insight into the parts we’ll be assembling in the next steps. We’ll take the client request, use it to build the start request, and send it to Agora’s Conversational AI Engine, which then adds the agent to the conversation.

Agent Routes

With our types defined, let’s implement the agent routes for inviting and removing agents from conversations.

Create the agent route:

touch routes/agent.py

Start by importing FastAPI, our new class_types, and the agora_token_builder library, which we'll need to generate tokens for the agent. Then we'll define the /agent router.

from fastapi import APIRouter, HTTPException, Query
from typing import Optional, List, Union
from class_types.agora_convo_ai_types import TTSVendor, TTSConfig, AgentResponse
from class_types.client_request_types import InviteAgentRequest, RemoveAgentRequest
from agora_token_builder import RtcTokenBuilder
import os
import httpx
from datetime import datetime
import random
import string
import base64
import time

router = APIRouter(prefix="/agent", tags=["agent"])

Invite Agent Route

First, we’ll implement the /agent/invite endpoint. This route needs to handle several key tasks:

Parse the user request and use it to create the start request for Agora’s Convo AI Engine.
Generate a token for the AI agent to access the RTC channel.
Configure Text-to-Speech (Microsoft or ElevenLabs).
Define the AI agent’s prompt and greeting message.
Configure Voice Activity Detection (VAD), which controls conversation flow.
Send the start request to Agora’s Conversational AI Engine.
Return the response to the client, which contains the agent ID from Agora’s Convo AI Engine response.

Add the following code to routes/agent.py:

@router.post("/invite", response_model=AgentResponse)
async def invite_agent(request: InviteAgentRequest):
    try:
        name = generate_unique_name()
        channel_name = request.channel_name or generate_channel_name()
        token = await _generate_token_(uid=os.getenv("AGENT_UID"), channel=channel_name)
        
        # Get TTS configuration
        tts_vendor = TTSVendor(os.getenv("TTS_VENDOR"))
        tts_config = get_tts_config(tts_vendor)

        # Prepare request body for Agora API
        request_body = {
            "name": name,
            "properties": {
                "channel": channel_name,
                "token": token,
                "agent_rtc_uid": os.getenv("AGENT_UID"),
                "remote_rtc_uids": [str(request.requester_id)],
                "enable_string_uid": isinstance(request.requester_id, str),
                "idle_timeout": 30,
                "asr": {
                    "language": "en-US",
                    "task": "conversation"
                },
                "llm": {
                    "url": os.getenv("LLM_URL"),
                    "api_key": os.getenv("LLM_TOKEN"),
                    "system_messages": [{
                        "role": "system",
                        "content": "You are a helpful assistant..."
                    }],
                    "greeting_message": "Hello! How can I assist you today?",
                    "failure_message": "Please wait a moment.",
                    "max_history": 10,
                    "params": {
                        "model": os.getenv("LLM_MODEL"),
                        "max_tokens": 1024,
                        "temperature": 0.7,
                        "top_p": 0.95
                    },
                    "input_modalities": request.input_modalities or os.getenv("INPUT_MODALITIES", "").split(","),
                    "output_modalities": request.output_modalities or os.getenv("OUTPUT_MODALITIES", "").split(",")
                },
                "tts": tts_config.model_dump(mode="json"),
                "vad": {
                    "silence_duration_ms": 480,
                    "speech_duration_ms": 15000,
                    "threshold": 0.5,
                    "interrupt_duration_ms": 160,
                    "prefix_padding_ms": 300
                },
                "advanced_features": {
                    "enable_aivad": False,
                    "enable_bhvs": False
                }
            }
        }

        # Make API call to Agora
        async with httpx.AsyncClient() as client:
            credential = generate_credentials()
            response = await client.post(
                f"{os.getenv('AGORA_CONVO_AI_BASE_URL')}/{os.getenv('AGORA_APP_ID')}/join",
                json=request_body,
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Basic {credential}"
                }
            )
            response.raise_for_status()
            return response.json()  
        
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to start conversation: {str(e)}"
        )
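
One detail in the request body above is worth a closer look: when INPUT_MODALITIES or OUTPUT_MODALITIES is unset, `os.getenv(..., "").split(",")` evaluates to `[""]` rather than an empty list, so a blank modality would be sent along with the request. A minimal sketch of a more forgiving parser follows. `parse_modalities` and its `default` argument are hypothetical names used for illustration, and which default makes sense for your agent is an assumption you should check against Agora's documentation.

```python
from typing import List, Optional


def parse_modalities(raw: Optional[str], default: List[str]) -> List[str]:
    """Split a comma-separated env value, dropping blanks and whitespace."""
    if not raw:
        return list(default)
    items = [item.strip() for item in raw.split(",")]
    cleaned = [item for item in items if item]
    # Fall back to the default when the value contained only separators
    return cleaned or list(default)


print(parse_modalities("text, audio", ["text"]))  # ['text', 'audio']
print(parse_modalities("", ["text"]))             # ['text']
print("".split(","))                              # [''] (the pitfall)
```

In the route, this could replace the inline `split(",")` calls, for example `request.input_modalities or parse_modalities(os.getenv("INPUT_MODALITIES"), ["text"])`.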

Define all the necessary functions:

def generate_unique_name(): 
    channel_name_base = 'conversation'
    timestamp = int(time.time() * 1000)  # Current time in milliseconds
    random_string = ''.join(random.choices('abcdefghijklmnopqrstuvwxyz0123456789', k=6))  # Generate a random string of length 6
    unique_name = f"{channel_name_base}-{timestamp}-{random_string}"
    return unique_name

def generate_credentials() -> str:
    customer_id = str(os.getenv("AGORA_CUSTOMER_ID"))
    customer_secret = str(os.getenv("AGORA_CUSTOMER_SECRET"))
    credentials = customer_id + ":" + customer_secret
    base64_credentials = base64.b64encode(credentials.encode("utf8"))
    credential = base64_credentials.decode("utf8")
    return credential

# Plain helper (not a route), so it takes ordinary defaults instead of Query()
async def _generate_token_(
    uid: Union[int, str] = 0,
    channel: Optional[str] = None
) -> str:
    # Validate environment variables
    if not os.getenv("AGORA_APP_ID") or not os.getenv("AGORA_APP_CERTIFICATE"):
        raise HTTPException(
            status_code=500,
            detail="Agora credentials are not set"
        )
    expiration_time = int(datetime.now().timestamp()) + 3600
    
    try:
        # Generate token using agora-token-builder
        token = RtcTokenBuilder.buildTokenWithUid(
            appId=os.getenv("AGORA_APP_ID"),
            appCertificate=os.getenv("AGORA_APP_CERTIFICATE"),
            channelName=channel,
            uid=uid,
            role=1,
            privilegeExpiredTs=expiration_time
        )
        
        return token
    
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to generate Agora token: {str(e)}"
        )

def generate_channel_name() -> str:
    timestamp = int(datetime.now().timestamp())
    random_str = ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))
    return f"conversation-{timestamp}-{random_str}"

def get_tts_config(vendor: TTSVendor) -> TTSConfig:
    if vendor == TTSVendor.MICROSOFT:
        required_vars = [
            "MICROSOFT_TTS_KEY", "MICROSOFT_TTS_REGION",
            "MICROSOFT_TTS_VOICE_NAME", "MICROSOFT_TTS_RATE",
            "MICROSOFT_TTS_VOLUME"
        ]
        if any(not os.getenv(var) for var in required_vars):
            raise ValueError("Missing Microsoft TTS environment variables")
        
        return TTSConfig(
            vendor=vendor,
            params={
                "key": os.getenv("MICROSOFT_TTS_KEY"),
                "region": os.getenv("MICROSOFT_TTS_REGION"),
                "voice_name": os.getenv("MICROSOFT_TTS_VOICE_NAME"),
                "rate": float(os.getenv("MICROSOFT_TTS_RATE", "1.0")),
                "volume": float(os.getenv("MICROSOFT_TTS_VOLUME", "1.0"))
            }
        )

    elif vendor == TTSVendor.ELEVENLABS:
        required_vars = ["ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID", "ELEVENLABS_MODEL_ID"]
        if any(not os.getenv(var) for var in required_vars):
            raise ValueError("Missing ElevenLabs environment variables")
        
        return TTSConfig(
            vendor=vendor,
            params={
                "key": os.getenv("ELEVENLABS_API_KEY"),
                "model_id": os.getenv("ELEVENLABS_MODEL_ID"),
                "voice_id": os.getenv("ELEVENLABS_VOICE_ID")
            }
        )

    raise ValueError(f"Unsupported TTS vendor: {vendor}")
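
The `generate_credentials` helper above produces the value for a standard HTTP Basic `Authorization` header: the customer ID and secret joined by a colon, then Base64-encoded. The sketch below walks through that round trip with placeholder values, which can be handy for checking that your credentials are encoded the way you expect. `basic_auth_header` is a hypothetical helper name, and `demo_id` and `demo_secret` are made-up values, not real credentials.

```python
import base64


def basic_auth_header(customer_id: str, customer_secret: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    raw = f"{customer_id}:{customer_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("utf-8")


header = basic_auth_header("demo_id", "demo_secret")
print(header)

# Decoding the Base64 part recovers the original "id:secret" pair
encoded = header.split(" ", 1)[1]
print(base64.b64decode(encoded).decode("utf-8"))  # demo_id:demo_secret
```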

Remove Agent

After the agent joins the conversation, we need a way to remove it. This is where the /agent/remove route comes in: it takes the agent ID and sends a request to Agora's Conversational AI Engine to remove the agent from the channel.

Add the following code to the routes/agent.py file, just below the /invite route:

@router.post("/remove")
async def remove_agent(request: RemoveAgentRequest):
    try:
        async with httpx.AsyncClient() as client:
            credential = generate_credentials()
            response = await client.post(
                f"{os.getenv('AGORA_CONVO_AI_BASE_URL')}/{os.getenv('AGORA_APP_ID')}/agents/{request.agent_id}/leave",
                headers={
                    "Content-Type": "application/json",
                    "Authorization": f"Basic {credential}"
                }
            )
            response.raise_for_status()
            return response.json()
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to remove agent: {str(e)}"
        )

The agent router defines two key endpoints:

POST /agent/invite: Creates and adds an AI agent to a specified channel by:
- Generating a secure token for the agent
- Configuring TTS (Text-to-Speech) settings
- Setting the AI’s behavior via system messages
- Sending a request to Agora’s Conversational AI API
POST /agent/remove: Removes an AI agent from a conversation by:
- Taking the agent_id from the request
- Sending a leave request to Agora’s API

Note: The Agent routes load a number of environment variables. Make sure to set these in your .env file. At the end of this guide, I've included a list of all the environment variables you'll need to set.

Add Agent Routes to the Main Server

Let’s update our main.py file to register the agent routes:

# Previous imports remain the same
from routes import agent

# Previous code remains the same...

# Register routes
app.include_router(agent.router)

# Rest of the code remains the same...

Now we have the core Conversational AI functionality working! Let’s implement the token generation route, which will make it easier to test and integrate with frontend applications.

Token Generation

The goal of this guide is to build a stand-alone microservice that works with existing Agora client apps, so for completeness we’ll implement a token generation route.

Create a new file at routes/token.py:

touch routes/token.py

Explaining this code is a bit outside the scope of this guide, but if you are new to tokens, I recommend checking out my guide Building a Token Server for Agora Applications.

One element of the token route is worth highlighting: if a uid or channel name is not provided, the code uses 0 for the uid and generates a unique channel name. The channel name and UID are returned with every token.

Add the following code to the routes/token.py file:

from fastapi import APIRouter, HTTPException, Query
from pydantic import BaseModel
from agora_token_builder import RtcTokenBuilder
import os
from datetime import datetime
import random
import string

router = APIRouter(prefix="/token", tags=["token"])

class TokenResponse(BaseModel):
    token: str
    uid: str
    channel: str

def generate_channel_name() -> str:
    timestamp = int(datetime.now().timestamp())
    random_str = ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))
    return f"ai-conversation-{timestamp}-{random_str}"

# Empty path so GET /token matches directly instead of redirecting to /token/
@router.get("", response_model=TokenResponse)
async def generate_token(
    uid: int = Query(0, description="User ID"),
    channel: str = Query(None, description="Channel name")
):
    # Validate environment variables
    if not os.getenv("AGORA_APP_ID") or not os.getenv("AGORA_APP_CERTIFICATE"):
        raise HTTPException(
            status_code=500,
            detail="Agora credentials are not set"
        )

    # Generate channel name if not provided
    channel_name = channel or generate_channel_name()
    expiration_time = int(datetime.now().timestamp()) + 3600

    try:
        # Generate token using agora-token-builder
        token = RtcTokenBuilder.buildTokenWithUid(
            appId=os.getenv("AGORA_APP_ID"),
            appCertificate=os.getenv("AGORA_APP_CERTIFICATE"),
            channelName=channel_name,
            uid=uid,
            role=1,
            privilegeExpiredTs=expiration_time
        )
        
        return TokenResponse(
            token=token,
            uid=str(uid),
            channel=channel_name
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Failed to generate Agora token: {str(e)}"
        )

Now, update the main.py file to register the token routes:

# Previous imports remain the same
from routes import token

# Previous code remains the same...

# Register routes
# Previous agent routes remain the same
app.include_router(token.router)

# Rest of the code remains the same...

Testing the Server

Before we can test our endpoints, make sure you have a client-side app running. You can use any application that implements Agora’s Video SDK (web, mobile, or desktop). If you don’t have an app, you can use Agora’s Voice Demo; just make sure to request a token before joining the channel.

Let’s test our server to make sure everything is working correctly. First, ensure your .env file is properly configured with all the necessary credentials.

Start the server in development mode:

uvicorn main:app --reload --port 3000

Note: There is a complete list of environment variables at the end of this guide.

If your server is running correctly, you should see output like:

INFO:     Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)

Testing the endpoints

Let’s test our API endpoints using curl:

Generate a token

curl http://localhost:3000/token

Expected response (your values will be different):

{
  "token": "007eJxTYBAxNdgrlvnEfm3o...",
  "uid": "0",
  "channel": "ai-conversation-1665481623456-abc123"
}

Generate a token with specific parameters

curl "http://localhost:3000/token?channel=test-channel&uid=1234"

Invite an Agent

curl -X POST http://localhost:3000/agent/invite \
  -H "Content-Type: application/json" \
  -d '{
    "requester_id": "1234",
    "channel_name": "YOUR_CHANNEL_NAME_FROM_PREVIOUS_STEP",
    "input_modalities": ["text"],
    "output_modalities": ["text", "audio"]
  }'

Expected response (your values will be different):

{
  "agent_id": "agent-abc123",
  "create_ts": 1665481725000,
  "status": "active"
}

Remove an Agent

curl -X POST "http://localhost:3000/agent/remove" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agent-123"
  }'

Expected response:

{
  "success": true
}
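
If you prefer scripting these checks over typing curl commands, the same invite call can be prepared with Python's standard library. This is a sketch only: `build_invite_request` is a hypothetical helper, and the base URL, requester ID, and channel name are placeholders that should match your local setup.

```python
import json
import urllib.request


def build_invite_request(base_url: str, requester_id: str, channel_name: str) -> urllib.request.Request:
    """Prepare a POST /agent/invite request without sending it."""
    payload = {"requester_id": requester_id, "channel_name": channel_name}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/agent/invite",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_invite_request("http://localhost:3000", "1234", "test-channel")
print(request.get_method(), request.full_url)

# With the server running, uncomment to send the request:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it makes it easy to inspect the URL and payload before any network traffic happens.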

Customizations

Agora Conversational AI Engine supports a number of customizations.

Customizing the Agent

In the /agent/invite endpoint, modify the system message to customize the agent's prompt:

"system_messages": [{
    "role": "system",
    "content": "You are a technical support specialist named Alex. Your responses should be friendly but concise, focused on helping users solve their technical problems. Use simple language but don't oversimplify technical concepts."
}],

You can also update the greeting to control the initial message it speaks into the channel.

"llm": {
    "greeting_message": "Hello! How can I assist you today?",
    "failure_message": "Please wait a moment.",
}

Customizing Speech Synthesis

Choose the right voice for your application by exploring the voice libraries:

For Microsoft Azure TTS: Visit the Microsoft Azure TTS Voice Gallery
For ElevenLabs TTS: Explore the ElevenLabs Voice Library

Fine-tuning Voice Activity Detection

Adjust VAD settings to optimize conversation flow:

"vad": {
    "silence_duration_ms": 600,      # How long to wait after silence to end turn
    "speech_duration_ms": 10000,     # Maximum duration for a single speech segment
    "threshold": 0.6,                # Speech detection sensitivity
    "interrupt_duration_ms": 200,    # How quickly interruptions are detected
    "prefix_padding_ms": 400,        # Audio padding at the beginning of speech
}
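
To keep experiments with these values manageable, one option is to hold the defaults from the /agent/invite route in a single dictionary and override only the fields you are tuning. A minimal sketch, assuming the same field names as the request body in this guide: `DEFAULT_VAD` and `build_vad_config` are hypothetical names, and the check for unknown keys is an added safeguard rather than anything Agora requires.

```python
from typing import Any, Dict

# Defaults copied from the /agent/invite request body in this guide
DEFAULT_VAD: Dict[str, Any] = {
    "silence_duration_ms": 480,
    "speech_duration_ms": 15000,
    "threshold": 0.5,
    "interrupt_duration_ms": 160,
    "prefix_padding_ms": 300,
}


def build_vad_config(**overrides: Any) -> Dict[str, Any]:
    """Return the default VAD settings with selected fields overridden."""
    unknown = set(overrides) - set(DEFAULT_VAD)
    if unknown:
        # Catch typos such as "silence_duration" before they reach the API
        raise ValueError(f"Unknown VAD settings: {sorted(unknown)}")
    return {**DEFAULT_VAD, **overrides}


vad = build_vad_config(silence_duration_ms=600, threshold=0.6)
print(vad["silence_duration_ms"], vad["threshold"], vad["prefix_padding_ms"])
```

The returned dictionary can be dropped into the request body in place of the hard-coded "vad" block.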

Complete Environment Variables Reference

Here’s a complete list of environment variables for your .env file:

# Agora Configuration
AGORA_APP_ID=
AGORA_APP_CERTIFICATE=
AGORA_CUSTOMER_ID=
AGORA_CUSTOMER_SECRET=
AGORA_CONVO_AI_BASE_URL=https://api.agora.io/api/conversational-ai-agent/v2/projects
AGENT_UID=

# LLM Configuration
LLM_MODEL=
LLM_URL=
LLM_TOKEN=

# Text-to-Speech Configuration
TTS_VENDOR=microsoft # Supported vendors: microsoft, elevenlabs

# Microsoft Azure TTS Configuration
MICROSOFT_TTS_KEY=
MICROSOFT_TTS_REGION=
MICROSOFT_TTS_VOICE_NAME=en-US-AndrewMultilingualNeural
MICROSOFT_TTS_RATE=1.0 # Range: 0.5 to 2.0
MICROSOFT_TTS_VOLUME=100.0 # Range: 0.0 to 100.0

# ElevenLabs TTS Configuration
ELEVENLABS_API_KEY=
ELEVENLABS_VOICE_ID=
ELEVENLABS_MODEL_ID=eleven_flash_v2_5

# Modalities Configuration
INPUT_MODALITIES=text
OUTPUT_MODALITIES=text,audio

# Server Configuration
PORT=3000
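
Because a missing variable only surfaces when a request hits the route that needs it, it can help to check the configuration once at startup. The sketch below is one way to do that. `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the list covers only the variables the routes read regardless of TTS vendor (the vendor-specific ones are already validated in get_tts_config).

```python
import os
from typing import List, Mapping

# Variables read by the agent and token routes regardless of TTS vendor
REQUIRED_VARS = [
    "AGORA_APP_ID",
    "AGORA_APP_CERTIFICATE",
    "AGORA_CUSTOMER_ID",
    "AGORA_CUSTOMER_SECRET",
    "AGORA_CONVO_AI_BASE_URL",
    "AGENT_UID",
    "LLM_URL",
    "LLM_TOKEN",
    "LLM_MODEL",
    "TTS_VENDOR",
]


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

You could call this right after load_dotenv() in main.py and log a warning or exit when the list is not empty.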

Next Steps

Congratulations! You’ve built a FastAPI server that integrates with Agora’s Conversational AI Engine. Take this microservice and integrate it with your existing Agora backends.

For more information about Agora’s Conversational AI Engine, check out the official documentation.

Happy building!
