# Deepgram STT Integration Setup Guide

## Overview

This implementation provides a multi-STT pipeline with translation and cleanup:

```
Call Recording → Deepgram STT → Language Detection → Translation (if needed) →
Grammar/Cleanup → Database Storage
```

## Features

✅ **Multi-STT Support** - Deepgram primary, Sarvam fallback
✅ **Auto Language Detection** - Detects Hindi/English/Mixed
✅ **Translation Layer** - Translates Devanagari to English using Claude Haiku
✅ **Grammar Cleanup** - Fixes typos, punctuation, formatting
✅ **Speaker Diarization** - Proper speaker segments with timestamps
✅ **Provider Tracking** - Stores which STT service was used
✅ **Confidence Scores** - Tracks transcription quality

## Setup Instructions

### 1. Install Required Dependencies

```bash
cd /var/www/html/tatsat2/dashboard-backend
pip install anthropic requests
```

### 2. Configure API Keys

Add your API keys to `.env` file:

```bash
# Deepgram STT Configuration
DEEPGRAM_API_KEY=your-deepgram-api-key-here

# Anthropic API for Translation/Cleanup
ANTHROPIC_API_KEY=your-anthropic-api-key-here
```

**To get API keys:**

- **Deepgram**: https://console.deepgram.com/signup
  - Sign up for free tier (45,000 minutes free)
  - Get API key from console

- **Anthropic**: https://console.anthropic.com/
  - Sign up for Claude API
  - Get API key from settings

### 3. Database Schema Already Updated

The database has been migrated to include:
- `stt_provider` - Which STT service was used
- `stt_confidence` - Confidence score
- `language_detected` - Detected language (en/hi/mixed)
- `raw_transcript` - Original transcript before translation
- `translation_applied` - Whether translation was performed

### 4. Test the Pipeline

Test with a single call:

```bash
# Get a sample audio URL from database
mysql -h 10.0.0.109 -u admin -pmcube@admin123 voicebot_cluster -e \
  "SELECT fileurl FROM 7987_raw_calls WHERE call_status='ANSWER' AND fileurl IS NOT NULL LIMIT 1;"

# Test Deepgram transcription
python3 deepgram_transcription.py "https://your-audio-url-here.wav"
```

Expected output:
```
🎙️  Starting Deepgram transcription...
✅ Deepgram transcription successful
   Duration: 45.23s
   Confidence: 0.95
   Language: hi
   Speakers: 2
🧠 Starting translation and cleanup...
   Detected language: hi
🔧 Cleaning speaker segments...
✅ Translation and cleanup complete
```

### 5. Process Calls with New Pipeline

Process 1 call:
```bash
python3 comprehensive_transcribe_v2.py --bid 7987 --limit 1
```

Run continuously (processes 1 call every 5 minutes):
```bash
python3 comprehensive_transcribe_v2.py --bid 7987 --limit 1 --continuous --interval 300
```

Run in background:
```bash
nohup python3 comprehensive_transcribe_v2.py --bid 7987 --limit 1 --continuous --interval 300 > transcribe_v2.log 2>&1 &
```

### 6. Monitor Progress

Check logs:
```bash
tail -f comprehensive_transcription_v2.log
```

Check database:
```bash
mysql -h 10.0.0.109 -u admin -pmcube@admin123 voicebot_cluster -e "
SELECT
  callid,
  stt_provider,
  stt_confidence,
  language_detected,
  translation_applied,
  num_speakers,
  LENGTH(transcript) as transcript_length,
  created_at
FROM 7987_sarvamresponse
ORDER BY created_at DESC
LIMIT 5;
"
```

## Architecture Details

### 1. DeepgramTranscriber Class

Located in: `deepgram_transcription.py`

**Methods:**
- `transcribe_audio()` - Calls Deepgram API with diarization
- `detect_language()` - Checks for Devanagari script
- `translate_and_cleanup()` - Uses Claude Haiku for translation
- `process_call()` - Complete pipeline orchestration

**Deepgram Parameters:**
```python
{
    'model': 'nova-2',           # Latest model
    'language': 'multi',         # Auto-detect
    'diarize': 'true',          # Speaker separation
    'punctuate': 'true',        # Add punctuation
    'paragraphs': 'true',       # Structure text
    'utterances': 'true',       # Individual segments
    'smart_format': 'true'      # Clean formatting
}
```

### 2. Translation Pipeline

Uses **Claude 3.5 Haiku** (fastest, most cost-effective):

**Prompt Structure:**
1. Language detection
2. Translation if Devanagari detected
3. Grammar and punctuation fixes
4. Typo correction
5. Natural language flow preservation

**Cost per call:**
- Average call (3 mins, 400 words): ~$0.001
- With translation: ~$0.002
- 100 calls/day: ~$0.20/day

### 3. Fallback Mechanism

```python
def transcribe_with_fallback(audio_url):
    # Try Deepgram (primary)
    result = deepgram_transcribe(audio_url)
    if result:
        return result

    # Fallback to Sarvam (TODO)
    result = sarvam_transcribe(audio_url)
    return result
```

## Cost Comparison

| Service | Quality | Cost (per hour) | Hindi Support | Diarization |
|---------|---------|-----------------|---------------|-------------|
| **Deepgram** | ⭐⭐⭐⭐⭐ | $0.0125 | ✅ Excellent | ✅ Yes |
| Sarvam | ⭐⭐⭐ | $0.015 | ✅ Good | ✅ Yes |
| AssemblyAI | ⭐⭐⭐⭐ | $0.015 | ⭐ Limited | ✅ Yes |
| Whisper | ⭐⭐⭐⭐ | Free (self-host) | ✅ Good | ❌ No |

**Recommendation:** Use Deepgram as primary for best quality/cost ratio.

## Quality Improvements

### Before (Sarvam only):
```
Speaker 0: हेलो हेलो हां बताइए हेलो नमस्कार सर आपकी वेलकम माम मैं
Speaker 1: डोमिनोस़ से बोल रही हूँ
```

### After (Deepgram + Translation):
```
Agent: Hello, hello, yes please tell me. Hello, Namaste sir, welcome ma'am.
Customer: I'm calling from Dominos.
```

## Troubleshooting

### Issue: "DEEPGRAM_API_KEY not found"
**Solution:** Add API key to `.env` file

### Issue: "Translation fails"
**Solution:** Check ANTHROPIC_API_KEY is valid

### Issue: "No calls to process"
**Solution:** Check transcription_status:
```sql
SELECT COUNT(*) FROM 7987_raw_calls
WHERE transcription_status = 0 AND call_status = 'ANSWER';
```

### Issue: "Deepgram API error 401"
**Solution:** Verify API key is correct and has credits

## Next Steps

1. ✅ **Immediate:** Add Deepgram & Anthropic API keys to `.env`
2. ✅ **Test:** Process 5 sample calls to verify quality
3. ⏳ **Deploy:** Run continuous processing for all pending calls
4. ⏳ **Monitor:** Check quality improvements in dashboard
5. ⏳ **Scale:** Add more STT fallbacks (AssemblyAI, Whisper)

## Support

- **Deepgram Docs:** https://developers.deepgram.com/docs
- **Claude API Docs:** https://docs.anthropic.com/claude/reference
- **Issue Tracking:** Check `comprehensive_transcription_v2.log`

## File Structure

```
dashboard-backend/
├── deepgram_transcription.py          # Core STT + Translation module
├── comprehensive_transcribe_v2.py     # Main processing script
├── migrations/
│   └── add_stt_provider_columns.sql  # Database schema update
├── .env                              # API keys (add yours here)
├── DEEPGRAM_SETUP.md                 # This file
└── comprehensive_transcription_v2.log # Processing logs
```