# 🎙️ LiveKit Voice AI with Twilio Integration

A production-ready voice AI system that enables real-time voice conversations through web browsers and phone calls. Built with LiveKit Agents, Next.js, and Twilio.

## 🌟 Features

- 🎤 **Real-time Voice Conversations** - Talk to the AI through a web browser
- 📞 **Phone Call Integration** - Receive and make phone calls via Twilio
- 🧠 **Multiple LLM Support** - GPT-4, Claude, Gemini
- 🔊 **Multi-Provider TTS/STT** - ElevenLabs and Cartesia
- 🌐 **Network Access** - Share with anyone on your WiFi
- 📊 **Conversation Logging** - Track all interactions
- ⚡ **Low Latency** - Optimized for real-time conversations
- 🎨 **Modern UI** - Beautiful Next.js interface

---

## 📋 Requirements

### Backend (Python)

**Python Version**: 3.9 or higher

**Dependencies** (from `pyproject.toml`):
```
livekit-agents[silero,turn-detector]~=1.2
livekit-plugins-noise-cancellation~=0.2
livekit-plugins-elevenlabs
livekit-plugins-cartesia
python-dotenv
aiohttp
```

**Package Manager**: `uv` (recommended) or `pip`

### Frontend (React/Next.js)

**Node.js Version**: 18 or higher

**Key Dependencies** (from `package.json`):
```
next: 15.5.2
react: 19.0.0
livekit-client: 2.15.15
livekit-server-sdk: 2.13.2
twilio: 5.13.0
```

**Package Manager**: `npm` or `pnpm`

---

## 🚀 Quick Start

### 1. Configure Environment Variables

#### Backend `.env.local`
Create `BackEnd/agent-starter-python/.env.local` (path relative to the repository root):
```bash
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
ELEVENLABS_API_KEY=your_elevenlabs_key
CARTESIA_API_KEY=your_cartesia_key
TWILIO_ACCOUNT_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_token
TWILIO_PHONE_NUMBER=+1234567890
```

#### Frontend `.env.local`
Create `FrontEnd/agent-starter-react/agent-starter-react/.env.local` (path relative to the repository root):
```bash
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
TWILIO_ACCOUNT_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_token
TWILIO_PHONE_NUMBER=+1234567890
TWILIO_TWIML_URL=http://twimlets.com/echo?Twiml=...
```
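
Before starting the services, it can help to confirm that nothing in the backend list was left blank. The following is a minimal sketch, not part of the repository: the helper name `missing_vars` is hypothetical, the variable names are copied from the listings above, and the "at least one of ElevenLabs or Cartesia" rule is an assumption based on the troubleshooting notes later in this README.

```python
import os

# Variable names copied from the backend .env.local listing above.
BACKEND_VARS = [
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
]

# Assumption: one TTS/STT provider key is enough for audio to work.
TTS_STT_VARS = ["ELEVENLABS_API_KEY", "CARTESIA_API_KEY"]


def missing_vars(env, required=BACKEND_VARS, any_of=TTS_STT_VARS):
    """Return the names of settings that are unset or empty in `env`."""
    missing = [name for name in required if not env.get(name)]
    # The provider keys are alternatives, so report them as one combined entry.
    if not any(env.get(name) for name in any_of):
        missing.append(" or ".join(any_of))
    return missing
```

Calling `missing_vars(os.environ)` checks the current process environment, so the values need to be exported or loaded first (for example with `python-dotenv`, which is already a backend dependency).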

---

### 2. Set Up Telephony (One-Time)

Configure LiveKit to accept calls from Twilio:

```bash
cd BackEnd/agent-starter-python
uv run python setup_telephony.py
```

This creates:
- Inbound SIP trunk in LiveKit
- Dispatch rule to route calls to your agent

---

### 3. Install Dependencies

#### Backend
```bash
cd BackEnd/agent-starter-python

# Using uv (recommended - faster)
uv sync

# Or using pip
python3 -m pip install -e .
```

#### Frontend
```bash
cd FrontEnd/agent-starter-react/agent-starter-react

# Using npm
npm install

# Or using pnpm
pnpm install
```

---

### 4. Run Backend

```bash
cd BackEnd/agent-starter-python

# Using uv (recommended)
uv run python -m src.agent dev

# Or using python directly
python3 -m src.agent dev
```

**Expected Output:**
```
============================================================
AGENT SERVER INITIALIZED
Agent is ready to accept connections
============================================================
registered worker {"agent_name": "default", ...}
```

---

### 5. Run Frontend

```bash
cd FrontEnd/agent-starter-react/agent-starter-react

# Start on the default port (3000)
npm run dev

# Or specify port 3001 (the port used in the examples in this README)
PORT=3001 npm run dev
```

**Expected Output:**
```
▲ Next.js 15.5.2 (Turbopack)
- Local:        http://localhost:3001
- Network:      http://10.0.0.194:3001
✓ Ready in 2.5s
```

---

## 🎯 Usage

### Web Interface

1. Open: http://localhost:3001
2. Configure agent settings (optional):
   - Custom instructions
   - Voice selection
   - Language
   - LLM model
3. Click "Connect"
4. Start talking!

### Phone Calls

**Inbound**: Call your Twilio number - the agent answers automatically

**Outbound**: Use the web interface to make the agent call any phone number

---

## 📁 Project Structure

```
.
├── BackEnd/agent-starter-python/
│   ├── src/
│   │   ├── agent.py              # Main agent code
│   │   ├── conversation_logger.py
│   │   └── elevenlabs_server.py
│   ├── .env.local                # Backend config (create this)
│   ├── pyproject.toml            # Python dependencies
│   ├── uv.lock                   # Dependency lock file
│   ├── setup_telephony.py        # Telephony setup script
│   ├── setup_sip_trunk.py
│   └── setup_dispatch_rule.py
│
├── FrontEnd/agent-starter-react/agent-starter-react/
│   ├── app/
│   │   ├── page.tsx              # Main UI
│   │   └── api/
│   │       ├── connection-details/route.ts
│   │       └── twilio/outbound-call/route.ts
│   ├── components/
│   │   └── app/
│   │       ├── welcome-view.tsx  # Configuration UI
│   │       └── session-view.tsx  # Active call UI
│   ├── .env.local                # Frontend config (create this)
│   └── package.json              # Node dependencies
│
├── START_SERVICES.sh             # Start both services
├── KILL_ALL_SERVICES.sh          # Stop all services
├── START_HERE.md                 # Detailed setup guide
├── TELEPHONY_SETUP.md            # Telephony configuration
├── COST_ANALYSIS.md              # Cost breakdown
└── README.md                     # This file
```

---

## 🔧 Configuration

### Agent Customization

Edit the system instructions in the UI, or modify `DEFAULT_INSTRUCTIONS` in `src/agent.py`:

```python
DEFAULT_INSTRUCTIONS = """You are a helpful voice AI assistant..."""
```

### Voice Selection

**ElevenLabs Voices**:
- Sarah (Confident): `EXAVITQu4vr4xnSDxMaL`
- Roger (Casual): `CwhRBWXzGAHq8TQ4Fs17`
- Callum: `N2lVS1w4EtoT3dr4eOWO`

**Cartesia Voices**:
- Default (multilingual): `f786b574-daa5-4673-aa0c-cbe3e8534c02`
- Riya: `faf0731e-dfb9-4cfc-8119-259a79b27e12`
- Ayushi: `95d51f79-c397-46f9-b49a-23763d3eaa2d`
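
If you select voices in code rather than through the UI, a small lookup table keeps the IDs in one place. This is an illustrative sketch only: the IDs are the ones listed above, while the `VOICES` table and `voice_id` helper are hypothetical names that do not come from the repository.

```python
# Voice IDs copied from the lists above, keyed by provider and display name.
VOICES = {
    "elevenlabs": {
        "Sarah": "EXAVITQu4vr4xnSDxMaL",
        "Roger": "CwhRBWXzGAHq8TQ4Fs17",
        "Callum": "N2lVS1w4EtoT3dr4eOWO",
    },
    "cartesia": {
        "Default": "f786b574-daa5-4673-aa0c-cbe3e8534c02",
        "Riya": "faf0731e-dfb9-4cfc-8119-259a79b27e12",
        "Ayushi": "95d51f79-c397-46f9-b49a-23763d3eaa2d",
    },
}


def voice_id(provider, name):
    """Look up a voice ID; the provider name is matched case-insensitively."""
    try:
        return VOICES[provider.lower()][name]
    except KeyError:
        raise ValueError(
            f"Unknown voice {name!r} for provider {provider!r}"
        ) from None
```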

### LLM Models

Supported models (via LiveKit Inference):
- `openai/gpt-4.1-mini` (default)
- `openai/gpt-4.1`
- `openai/gpt-4o`
- `anthropic/claude-3.5-sonnet`
- `google/gemini-2.0-flash-exp`
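
The model IDs follow a `provider/model` pattern, so a request from the UI can be validated against this list before it reaches the agent. The sketch below assumes that an unknown ID should fall back silently to the default; `pick_model` and `split_model` are hypothetical helpers, and the project may handle this differently.

```python
# Model IDs copied from the list above; the first entry is the stated default.
SUPPORTED_MODELS = (
    "openai/gpt-4.1-mini",
    "openai/gpt-4.1",
    "openai/gpt-4o",
    "anthropic/claude-3.5-sonnet",
    "google/gemini-2.0-flash-exp",
)
DEFAULT_MODEL = SUPPORTED_MODELS[0]


def pick_model(requested=None):
    """Return `requested` if it is a listed model, otherwise the default."""
    if requested in SUPPORTED_MODELS:
        return requested
    return DEFAULT_MODEL


def split_model(model_id):
    """Split 'provider/model' into a (provider, model) tuple."""
    provider, _, name = model_id.partition("/")
    return provider, name
```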

---

## 🌐 Network Access

The frontend automatically binds to all network interfaces, making it accessible from any device on your local network.

**Share with others on WiFi:**
```
http://YOUR_LOCAL_IP:3001
```

Find your IP:
```bash
ip addr show | grep "inet " | grep -v "127.0.0.1"
```
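
The `ip` command above is Linux-specific. As a rough cross-platform alternative, the sketch below asks the operating system which local address it would use for outbound traffic and builds the share URL from it. The function names are hypothetical, port 3001 is the port used elsewhere in this README, and the result is a best guess on machines with several network interfaces.

```python
import socket


def local_ip():
    """Best-effort LAN address of this machine; falls back to loopback."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets. It only makes the OS
        # choose an outgoing interface, whose address is then read back.
        sock.connect(("10.255.255.255", 1))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, offline): report loopback instead.
        return "127.0.0.1"
    finally:
        sock.close()


def share_url(port=3001):
    """URL that other devices on the same network can try."""
    return f"http://{local_ip()}:{port}"
```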

---

## 📊 Monitoring

### Backend Logs

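**Terminal output** (the path below is specific to the original development machine; point it at wherever your terminal output is captured):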
```bash
tail -f /home/vmc/.cursor/projects/var-www-html-vikas-2025-Nov-21-Voicebot/terminals/*.txt
```

**Conversation logs:**
```bash
tail -f BackEnd/agent-starter-python/logs/*.json
```

### Check Services

```bash
# Backend
ps aux | grep "src.agent"

# Frontend
ps aux | grep "next dev"

# Ports
netstat -tuln | grep -E '300[0-3]'
```
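
Where `netstat` is not installed, the same port check can be approximated from Python. This is a minimal sketch with a hypothetical helper name; it only reports whether something accepts TCP connections on the given IPv4 address and says nothing about which process owns the port.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(3001)` should return `True` while the frontend is running on port 3001.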

---

## 🛠️ Management Scripts

### Start All Services
```bash
./START_SERVICES.sh
```

### Stop All Services
```bash
./KILL_ALL_SERVICES.sh
```

### Set Up Telephony (One-Time)
```bash
cd BackEnd/agent-starter-python
uv run python setup_telephony.py
```

---

## 💰 Cost Analysis

**Per Session** (~3-4 minutes):
- STT: $0.009 (~₹0.75)
- LLM: $0.0125 (~₹1.04)
- TTS: $0.00008 (~₹0.007)
- **Total**: ~$0.022 (~₹1.80)

**Per Minute**: ~$0.0058 (~₹0.49)

See `COST_ANALYSIS.md` for detailed breakdown and optimization strategies.
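
The totals above can be reproduced with a few lines of arithmetic. The sketch assumes a session length of 221.95 seconds, which is the `total_duration` in the sample log under Log Format; the figures in `COST_ANALYSIS.md` may be derived from a different session.

```python
# Per-session costs in USD, copied from the list above.
STT_USD = 0.009
LLM_USD = 0.0125
TTS_USD = 0.00008

# Assumption: session length taken from the sample log (221.95 s, about 3.7 min).
SESSION_SECONDS = 221.95

total_usd = STT_USD + LLM_USD + TTS_USD
per_minute_usd = total_usd / (SESSION_SECONDS / 60)

print(f"Per session: ${total_usd:.3f}")      # Per session: $0.022
print(f"Per minute:  ${per_minute_usd:.4f}")  # Per minute:  $0.0058
```

The LLM accounts for more than half of the total, which is why the model choice matters more for cost than the TTS provider.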

---

## 🐛 Troubleshooting

### Backend Won't Start

**Error**: `ModuleNotFoundError: No module named 'livekit'`

**Fix**:
```bash
cd BackEnd/agent-starter-python
uv sync
```

### Frontend Won't Start

**Error**: `Cannot find module 'next'`

**Fix**:
```bash
cd FrontEnd/agent-starter-react/agent-starter-react
npm install
```

### Port Already in Use

**Fix**:
```bash
# Kill process on port 3001
fuser -k 3001/tcp

# Or use a different port
PORT=3002 npm run dev
```

### Phone Calls Not Working

**Check**:
1. Backend is running: `ps aux | grep "src.agent"`
2. Telephony configured: `uv run python setup_telephony.py`
3. Check backend logs for errors

### No Audio

**Check**:
1. `ELEVENLABS_API_KEY` or `CARTESIA_API_KEY` is set
2. API key is valid (not expired)
3. Check backend logs for TTS/STT errors

---

## 📚 Documentation

- **START_HERE.md** - Complete setup guide
- **TELEPHONY_SETUP.md** - Detailed telephony configuration
- **COST_ANALYSIS.md** - Cost breakdown and optimization
- **NETWORK_ACCESS.md** - Network configuration guide
- **QUICK_COMMANDS.md** - Command reference
- **RUNNING_STATUS.md** - Service status and monitoring

---

## 🏗️ Architecture

### Call Flow

```
[User Phone] → [Twilio] → [LiveKit SIP] → [Python Agent] → [LLM/TTS/STT]
                                                ↓
[Web Browser] → [Next.js Frontend] → [LiveKit WebRTC] → [Python Agent]
```

### Components

1. **Backend (Python)**
   - LiveKit Agents framework
   - Voice pipeline: STT → LLM → TTS
   - SIP participant handling
   - Conversation logging

2. **Frontend (Next.js)**
   - React web interface
   - LiveKit WebRTC client
   - Agent configuration UI
   - Outbound call management

3. **Telephony (Twilio + LiveKit)**
   - Twilio phone number
   - SIP trunk configuration
   - Dispatch rules
   - Call routing
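
The voice pipeline in the backend component can be pictured as three stages applied in sequence. The sketch below is a toy model with stand-in functions, not the LiveKit Agents API: the real framework streams audio and handles interruptions, while this version handles one complete turn at a time.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """One conversational turn: caller audio in, agent audio out."""
    transcript = stt(audio)   # speech-to-text
    reply = llm(transcript)   # language model produces the response text
    return tts(reply)         # text-to-speech


# Hypothetical stand-ins so the flow can be run without any provider keys.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Keeping each stage behind a plain callable mirrors the multi-provider design: swapping ElevenLabs for Cartesia changes one stage and leaves the rest of the turn untouched.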

---

## 🔐 Security

### Protected Files (Not in Git)

- `.env.local` - API keys and secrets
- `logs/` - Conversation logs
- `node_modules/` - Dependencies
- `.venv/` - Python virtual environment

### API Keys Required

- **LiveKit**: API key and secret
- **ElevenLabs** or **Cartesia**: For TTS/STT
- **OpenAI**: For LLM (via LiveKit Inference)
- **Twilio**: Account SID and auth token

---

## 🚀 Deployment

### Production Checklist

- [ ] Set up proper `.env.local` files
- [ ] Configure LiveKit telephony (run `setup_telephony.py`)
- [ ] Test phone calls and web interface
- [ ] Set up monitoring and logging
- [ ] Configure firewall rules for network access
- [ ] Set up SSL/HTTPS for production
- [ ] Add authentication for web interface
- [ ] Monitor API usage and costs

### Docker Deployment

Both backend and frontend include Dockerfiles for containerized deployment.

```bash
# Backend
cd BackEnd/agent-starter-python
docker build -t voicebot-backend .
docker run -p 46201:46201 --env-file .env.local voicebot-backend

# Frontend
cd FrontEnd/agent-starter-react/agent-starter-react
docker build -t voicebot-frontend .
docker run -p 3001:3000 --env-file .env.local voicebot-frontend
```

---

## 🎓 Technical Details

### Backend Stack
- **Framework**: LiveKit Agents 1.4.6
- **Language**: Python 3.9+
- **Package Manager**: uv (or pip)
- **STT**: ElevenLabs / Cartesia
- **TTS**: ElevenLabs / Cartesia
- **LLM**: OpenAI GPT-4.1 mini by default (via LiveKit Inference)

### Frontend Stack
- **Framework**: Next.js 15.5.2 (Turbopack)
- **UI Library**: React 19.0.0
- **WebRTC**: LiveKit Client 2.15.15
- **Styling**: Tailwind CSS 4
- **Components**: Radix UI

### Telephony Stack
- **Provider**: Twilio
- **SIP**: LiveKit SIP Trunking
- **Protocol**: WebRTC + SIP
- **Codec**: Opus (audio)

---

## 📈 Performance

### Latency Metrics
- **LLM Response**: 5-6 seconds
- **TTS Generation**: 1-2 seconds
- **End-to-End**: ~6-8 seconds from user speech to bot audio
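
The end-to-end figure is roughly the sum of the stage latencies. A quick check, assuming the stages run one after another and that STT time is small enough to ignore (it is not listed separately above):

```python
# (low, high) latency ranges in seconds, copied from the metrics above.
LLM_SECONDS = (5, 6)
TTS_SECONDS = (1, 2)


def end_to_end(*stages):
    """Sum the low and high bounds of sequential stages."""
    low = sum(stage[0] for stage in stages)
    high = sum(stage[1] for stage in stages)
    return low, high


print(end_to_end(LLM_SECONDS, TTS_SECONDS))  # (6, 8)
```

Since the LLM stage dominates the total, changes that shorten LLM response time are likely to have the largest effect.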

### Optimization Tips
1. Use Cartesia for faster TTS (vs ElevenLabs)
2. Reduce system prompt size (currently 3,494 words)
3. Use Gemini 2.0 Flash for a cheaper LLM
4. Enable prompt caching (if supported)

---

## 🤝 Contributing

### Development Setup

1. Clone the repository
2. Copy `.env.example` to `.env.local` in both backend and frontend
3. Install dependencies (see Quick Start, step 3)
4. Run backend and frontend (see Quick Start)

### Code Style

**Backend (Python)**:
- Formatter: Ruff
- Line length: 88
- Run: `ruff format .`

**Frontend (TypeScript)**:
- Formatter: Prettier
- Run: `npm run format`

---

## 📞 Support

For issues or questions:
- Check the guides listed in the Documentation section above
- Review troubleshooting section above
- Check backend logs: `BackEnd/agent-starter-python/logs/`

---

## 📄 License

See `LICENSE` files in respective directories.

---

## 🎯 Quick Commands Reference

### Start Everything
```bash
./START_SERVICES.sh
```

### Stop Everything
```bash
./KILL_ALL_SERVICES.sh
```

### Backend Only
```bash
cd BackEnd/agent-starter-python
uv run python -m src.agent dev
```

### Frontend Only
```bash
cd FrontEnd/agent-starter-react/agent-starter-react
PORT=3001 npm run dev
```

### Set Up Telephony
```bash
cd BackEnd/agent-starter-python
uv run python setup_telephony.py
```

---

## 🌟 Key Features Explained

### Custom Agent Instructions
Modify agent behavior through the UI by editing system instructions. The backend automatically receives and applies these instructions.
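
This README does not describe how the instructions travel from the UI to the agent. One plausible arrangement, shown here purely as an assumption, is that the frontend sends a JSON string with an `instructions` field and the backend falls back to `DEFAULT_INSTRUCTIONS` when that field is missing or empty. The field name and the `resolve_instructions` helper are hypothetical.

```python
import json

DEFAULT_INSTRUCTIONS = """You are a helpful voice AI assistant..."""


def resolve_instructions(metadata):
    """Pick custom instructions from a JSON string, else use the default.

    Assumes a hypothetical payload shape: {"instructions": "..."}.
    """
    try:
        data = json.loads(metadata) if metadata else {}
    except json.JSONDecodeError:
        data = {}
    custom = data.get("instructions") if isinstance(data, dict) else None
    if isinstance(custom, str) and custom.strip():
        return custom.strip()
    return DEFAULT_INSTRUCTIONS
```

Falling back on malformed input means a broken payload never leaves the agent without a system prompt.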

### Multi-Language Support
Supports 12+ languages including English, Spanish, French, Hindi, Tamil, and Marathi.

### Conversation Logging
All conversations are automatically logged to JSON files with:
- Timestamps
- User/bot messages
- Latency metrics
- Session metadata

### Network Sharing
The frontend is automatically accessible on your local network. Share the URL with anyone on the same WiFi to let them talk to the AI.

---

## 📊 Monitoring & Analytics

### Conversation Logs Location
```
BackEnd/agent-starter-python/logs/
```

### Log Format
```json
{
  "session_id": "voice_assistant_room_123",
  "start_time": "2026-03-17T10:08:08",
  "messages": [...],
  "config": {...},
  "total_duration": 221.95
}
```

### Metrics Tracked
- Session duration
- Message count
- LLM latency
- TTS latency
- End-to-end latency
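
A short script can turn the log files into a per-session summary. The sketch below relies only on the top-level fields shown in the sample under Log Format (`session_id`, `messages`, `total_duration`); the structure of each message is not documented here, so it counts messages without inspecting them. The function names are hypothetical.

```python
import json
from pathlib import Path


def summarize(data):
    """Summarize one parsed log using only the documented top-level fields."""
    return {
        "session_id": data.get("session_id"),
        "minutes": round(data.get("total_duration", 0) / 60, 2),
        "message_count": len(data.get("messages", [])),
    }


def summarize_all(log_dir="BackEnd/agent-starter-python/logs"):
    """Summarize every *.json file in the log directory, sorted by name."""
    summaries = []
    for path in sorted(Path(log_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        summaries.append(summarize(data))
    return summaries
```

Run it from the repository root so that the default `log_dir` resolves correctly, or pass an explicit path.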

---

## 🔗 Resources

- **LiveKit Documentation**: https://docs.livekit.io/
- **LiveKit Agents**: https://docs.livekit.io/agents/
- **Twilio SIP**: https://www.twilio.com/docs/sip-trunking
- **ElevenLabs**: https://elevenlabs.io/docs
- **Cartesia**: https://cartesia.ai/docs

---

## 🎉 Credits

Built with:
- [LiveKit](https://livekit.io/) - Real-time communication platform
- [Next.js](https://nextjs.org/) - React framework
- [Twilio](https://www.twilio.com/) - Telephony provider
- [ElevenLabs](https://elevenlabs.io/) - Text-to-speech & speech-to-text
- [Cartesia](https://cartesia.ai/) - Text-to-speech & speech-to-text
- [OpenAI](https://openai.com/) - Language models

---

**Ready to start?** Run `./START_SERVICES.sh` and open http://localhost:3001!
