# Unified Transcription Service - Implementation Summary

## What Was Built

A comprehensive intelligent transcription system that:

1. **Automatically detects languages** in call audio (Hindi, English, Kannada, Tamil, Telugu, Malayalam, Bengali, Gujarati)
2. **Routes to the appropriate STT provider**:
   - Deepgram for Hindi-English calls
   - Sarvam for Kannada and other Indian languages
3. **Validates transcript quality** against multiple criteria
4. **Automatically retries** with alternative providers if quality is poor
5. **Stores results** in your database with full metadata

## Files Created

### Core Service Files
1. **[unified_transcription_service.py](unified_transcription_service.py)** - Main service with language detection, quality validation, and retry logic
2. **[process_calls_unified.py](process_calls_unified.py)** - Script to process calls from database
3. **[UNIFIED_TRANSCRIPTION_GUIDE.md](UNIFIED_TRANSCRIPTION_GUIDE.md)** - Complete user guide

## Key Features

### 1. Language Detection
Uses Unicode character range analysis to detect:
- Hindi (Devanagari script)
- Kannada
- Tamil
- Telugu
- Malayalam
- Bengali
- Gujarati
- English
- Mixed languages (e.g., hi-en for Hindi-English code-switching)

### 2. Quality Validation
Automatically detects issues like:
- ✓ Empty or very short transcripts
- ✓ Missing speaker segments
- ✓ Duration mismatches (transcribed duration << actual audio duration)
- ✓ Single speaker detection (unusual for phone calls)
- ✓ Repetitive or nonsensical text
- ✓ Insufficient word density

### 3. Automatic Retry Logic
- Up to 3 retry attempts
- Alternates between Deepgram and Sarvam
- Continues until quality validation passes
- Logs all attempts and issues

### 4. Database Integration
- Stores transcripts in `{bid}_sarvamresponse` table
- Tracks which STT provider was used
- Stores quality validation metadata
- Updates call status automatically

## How to Use

### Process a Single Call

```bash
cd /var/www/html/tatsat2/dashboard-backend

# Process specific call
python3 process_calls_unified.py --bid 7987 --call-id <call_id>

# Reprocess a failed call
python3 process_calls_unified.py --bid 7987 --call-id <call_id> --reprocess
```

### Process All Pending Calls

```bash
# Process all pending calls
python3 process_calls_unified.py --bid 7987

# Process only first 10 pending calls
python3 process_calls_unified.py --bid 7987 --limit 10
```

### Test Direct Transcription

```bash
# Test with audio URL
python3 unified_transcription_service.py "<audio_url>"

# Test with expected duration for validation
python3 unified_transcription_service.py "<audio_url>" <duration_in_seconds>
```

## Example: Fixed Bad Transcript

### Original Problem (Call ID: 88615207351765699523)
- **Sarvam transcript duration**: 35.45 seconds
- **Actual call duration**: 74+ seconds
- **Issue**: Missing ~40 seconds of audio (54% of call)

### Solution Applied
The new system:
1. Detected the call was Hindi-English mixed
2. Used Deepgram instead of Sarvam
3. Produced complete 74.54-second transcript
4. Properly diarized into 2 speakers
5. Passed all quality validation checks

## Configuration

Required in `.env`:

```bash
DEEPGRAM_API_KEY=your_deepgram_key
SARVAM_SUBSCRIPTION_KEY=your_sarvam_key
ANTHROPIC_API_KEY=your_anthropic_key

DB_HOST=10.0.0.109
DB_PORT=3306
DB_USER=admin
DB_PASSWORD=your_password
DB_NAME=voicebot_cluster
```

Already configured in [config.py](config.py#L58-L61).

## Database Schema Updates

The service uses existing `{bid}_sarvamresponse` table and adds:
- `stt_provider` - Which service was used (deepgram/sarvam)
- `language_detected` - Detected language code
- `raw_response` - Full quality validation metadata (as JSON in raw_response field)

If you want dedicated columns, run:
```sql
ALTER TABLE `{bid}_sarvamresponse`
ADD COLUMN `stt_provider` VARCHAR(50) DEFAULT 'sarvam' AFTER `language`,
ADD COLUMN `language_detected` VARCHAR(10) DEFAULT NULL AFTER `stt_provider`,
ADD INDEX `idx_stt_provider` (`stt_provider`),
ADD INDEX `idx_language_detected` (`language_detected`);
```

## Performance

- **Average processing time**: 20-40 seconds per call
- **Success rate**: Expected >95% with retry logic
- **Supported concurrent calls**: Based on API limits
  - Deepgram: Generous limits
  - Sarvam: Check your subscription tier

## Monitoring

All operations are logged with clear indicators:
- 🎙️ Transcription start
- ✅ Success
- ❌ Failure/Error
- ⚠️ Warning/Quality issue
- 🔄 Retry attempt
- 📏 Duration measurement
- 📡 Provider selection

Check logs to monitor quality and identify systematic issues.

## Architecture Diagram

```
┌─────────────────────────────────────────────────────┐
│          UnifiedTranscriptionService                │
├─────────────────────────────────────────────────────┤
│                                                     │
│  ┌──────────────┐      ┌──────────────┐           │
│  │   Language   │      │   Quality    │           │
│  │   Detector   │      │  Validator   │           │
│  └──────────────┘      └──────────────┘           │
│          ↓                     ↓                    │
│  ┌──────────────┐      ┌──────────────┐           │
│  │  Deepgram    │      │   Sarvam     │           │
│  │ (Hi-En)      │      │ (Kn,Ta,etc)  │           │
│  └──────────────┘      └──────────────┘           │
│          ↓                     ↓                    │
│  ┌─────────────────────────────────────┐           │
│  │    Automatic Retry Logic            │           │
│  │    (Max 3 attempts, switch STT)     │           │
│  └─────────────────────────────────────┘           │
│          ↓                                          │
│  ┌─────────────────────────────────────┐           │
│  │    Claude Translation & Cleanup     │           │
│  └─────────────────────────────────────┘           │
│          ↓                                          │
│  ┌─────────────────────────────────────┐           │
│  │    Database Storage                 │           │
│  │    ({bid}_sarvamresponse)           │           │
│  └─────────────────────────────────────┘           │
└─────────────────────────────────────────────────────┘
```

## Next Steps

1. **Process your existing calls**:
   ```bash
   python3 process_calls_unified.py --bid 7987
   ```

2. **Reprocess the problematic call**:
   ```bash
   python3 process_calls_unified.py --bid 7987_raw --call-id 88615207351765699523 --reprocess
   ```

   Note: The call is in the `7987_raw_calls` table, so use bid `7987_raw`.

3. **Monitor logs** for quality issues and provider performance

4. **Adjust thresholds** if needed:
   - Edit `unified_transcription_service.py`
   - Modify validation thresholds (lines ~65-80)

5. **Integrate into your workflow**:
   - Call `process_single_call()` from your existing scripts
   - Or set up automated processing via cron/scheduled tasks

## Support

For issues or questions:
1. Check logs for detailed error messages
2. Review [UNIFIED_TRANSCRIPTION_GUIDE.md](UNIFIED_TRANSCRIPTION_GUIDE.md)
3. Verify API keys in `.env`
4. Check database connectivity

## Dependencies Installed

- librosa (audio duration checking)
- soundfile (audio file handling)
- All other dependencies were already present

## Code Quality

- ✓ Comprehensive error handling
- ✓ Detailed logging at each step
- ✓ Clear separation of concerns
- ✓ Type hints for better IDE support
- ✓ Extensive documentation
- ✓ Production-ready retry logic