# Job Progress Tracking System

## Overview

The AI Feedback Service tracks each job through its processing steps and detects jobs that have stopped making progress. This gives real-time visibility into job status and helps identify and resolve processing issues.

## Features

### 1. **Detailed Progress Tracking**
- **Step-by-step progress**: Each job is tracked through specific processing steps
- **Progress percentage**: Completion estimate from 0 to 100%
- **Heartbeat monitoring**: Regular updates to detect if jobs are still alive
- **Timing information**: Track when each step started and how long it takes

### 2. **Processing Steps**

The system tracks jobs through these processing steps:

| Step | Description | Progress % |
|------|-------------|------------|
| `initializing` | Job claimed and starting processing | 5% |
| `downloading_file` | Downloading file from S3 | 15% |
| `extracting_audio` | Extracting audio from video | 25% |
| `transcribing_audio` | Converting speech to text using Whisper | 45% |
| `analyzing_content` | Analyzing content for filler words, etc. | 60% |
| `analyzing_visual` | Processing video visual characteristics | 75% |
| `gaze_tracking` | Analyzing eye contact and gaze patterns | 85% |
| `face_analysis` | Detecting smiles and facial expressions | 90% |
| `saving_results` | Storing results in database | 95% |
| `completing` | Finalizing job and sending notifications | 98% |
| `completed` | Job successfully completed | 100% |
| `failed` | Job failed with error | 0% |
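
The table above can be mirrored in code as an enum plus a lookup of default percentages. This is a minimal sketch, not the service's actual definition: the usage examples in this document reference `ProcessingStep.TRANSCRIBING_AUDIO` and `ProcessingStep.ANALYZING_CONTENT`, but the remaining member names, the `DEFAULT_PROGRESS` mapping, and the `default_progress` helper are assumptions made for illustration.

```python
from enum import Enum


class ProcessingStep(str, Enum):
    """Step names copied from the table; member names are assumed."""

    INITIALIZING = "initializing"
    DOWNLOADING_FILE = "downloading_file"
    EXTRACTING_AUDIO = "extracting_audio"
    TRANSCRIBING_AUDIO = "transcribing_audio"
    ANALYZING_CONTENT = "analyzing_content"
    ANALYZING_VISUAL = "analyzing_visual"
    GAZE_TRACKING = "gaze_tracking"
    FACE_ANALYSIS = "face_analysis"
    SAVING_RESULTS = "saving_results"
    COMPLETING = "completing"
    COMPLETED = "completed"
    FAILED = "failed"


# Default progress percentage for each step, taken from the table.
DEFAULT_PROGRESS = {
    ProcessingStep.INITIALIZING: 5,
    ProcessingStep.DOWNLOADING_FILE: 15,
    ProcessingStep.EXTRACTING_AUDIO: 25,
    ProcessingStep.TRANSCRIBING_AUDIO: 45,
    ProcessingStep.ANALYZING_CONTENT: 60,
    ProcessingStep.ANALYZING_VISUAL: 75,
    ProcessingStep.GAZE_TRACKING: 85,
    ProcessingStep.FACE_ANALYSIS: 90,
    ProcessingStep.SAVING_RESULTS: 95,
    ProcessingStep.COMPLETING: 98,
    ProcessingStep.COMPLETED: 100,
    ProcessingStep.FAILED: 0,
}


def default_progress(step: ProcessingStep) -> int:
    """Return the default percentage for a step (hypothetical helper)."""
    return DEFAULT_PROGRESS[step]
```

Keeping the percentages in one mapping means a step's default can change without touching the code that reports progress.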

### 3. **Stuck Job Detection**

The system automatically detects jobs that may be stuck:

- **Step timeout**: Jobs that stay in a single step for too long (default: 30 minutes)
- **No heartbeat**: Jobs with no recent activity updates
- **Auto-recovery**: Jobs stuck for more than twice the step timeout are reset automatically
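
The three rules above could be combined roughly as follows. This is a sketch under stated assumptions rather than the service's detection logic: the function names are hypothetical, the heartbeat check reuses the step timeout because no separate heartbeat threshold is documented, and a missing timestamp is treated as "no time elapsed".

```python
from datetime import datetime, timezone
from typing import Optional


def minutes_since(ts: Optional[datetime], now: datetime) -> float:
    """Minutes elapsed since ts, or 0.0 when the timestamp is missing."""
    if ts is None:
        return 0.0
    return (now - ts).total_seconds() / 60.0


def classify_job(
    step_start_time: Optional[datetime],
    last_heartbeat: Optional[datetime],
    now: datetime,
    step_timeout_minutes: float = 30.0,
) -> str:
    """Return "ok", "stuck", or "auto_reset" for one processing job."""
    since_step = minutes_since(step_start_time, now)
    since_beat = minutes_since(last_heartbeat, now)
    # Auto-recovery: stuck in one step for more than twice the step timeout.
    if since_step > 2 * step_timeout_minutes:
        return "auto_reset"
    # Step timeout or missing heartbeat (assumed to share one threshold).
    if since_step > step_timeout_minutes or since_beat > step_timeout_minutes:
        return "stuck"
    return "ok"


now = datetime(2025, 10, 15, 10, 30, tzinfo=timezone.utc)
started = datetime(2025, 10, 15, 9, 45, tzinfo=timezone.utc)
print(classify_job(started, started, now))  # 45 minutes in the step: "stuck"
```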

## API Endpoints

### Get Job Progress
```http
GET /api/v1/queue/progress/{job_id}
```

Returns detailed progress information for a specific job:
```json
{
  "job_id": 12345,
  "status": "1",
  "current_step": "transcribing_audio",
  "progress_percentage": 45,
  "step_start_time": "2025-10-15T10:30:00Z",
  "last_heartbeat": "2025-10-15T10:35:00Z",
  "process_start_time": "2025-10-15T10:25:00Z",
  "video_name": "interview_video.mp4",
  "user_id": 123,
  "interview_id": 456,
  "question_id": 789
}
```
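
A client might turn this response into a one-line status summary. The sketch below assumes only the field names shown in the sample response; `parse_timestamp` and `summarize_progress` are hypothetical helpers, not part of the service.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2025-10-15T10:30:00Z"."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_progress(payload: dict) -> str:
    """Build a short status line from a progress response."""
    started = parse_timestamp(payload["process_start_time"])
    last_beat = parse_timestamp(payload["last_heartbeat"])
    # Time between job start and the most recent sign of life.
    active_minutes = (last_beat - started).total_seconds() / 60
    return (
        f"Job {payload['job_id']}: {payload['current_step']} "
        f"({payload['progress_percentage']}%), "
        f"active for {active_minutes:.0f} min as of last heartbeat"
    )
```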

### Get All Processing Jobs
```http
GET /api/v1/queue/processing-jobs
```

Returns all currently processing jobs with their progress:
```json
{
  "processing_jobs": [
    {
      "job_id": 12345,
      "video_name": "interview_video.mp4",
      "current_step": "transcribing_audio",
      "progress_percentage": 45,
      "process_start_time": "2025-10-15T10:25:00Z",
      "step_start_time": "2025-10-15T10:30:00Z",
      "last_heartbeat": "2025-10-15T10:35:00Z"
    }
  ],
  "count": 1
}
```

### Get Stuck Jobs
```http
GET /api/v1/queue/stuck-jobs?timeout_minutes=30
```

Returns jobs that appear to be stuck:
```json
{
  "stuck_jobs": [
    {
      "job_id": 12346,
      "video_name": "stuck_video.mp4",
      "current_step": "gaze_tracking",
      "progress_percentage": 85,
      "time_since_step_minutes": 45.2,
      "time_since_heartbeat_minutes": 45.2,
      "process_start_time": "2025-10-15T09:30:00Z",
      "step_start_time": "2025-10-15T09:45:00Z",
      "last_heartbeat": "2025-10-15T09:45:00Z"
    }
  ],
  "count": 1,
  "timeout_minutes": 30
}
```

### Reset Stuck Job
```http
POST /api/v1/queue/reset-stuck-job/{job_id}?reason=Manual%20reset
```

Resets a stuck job back to pending status:
```json
{
  "status": "success",
  "job_id": 12346,
  "message": "Job 12346 has been reset to pending status",
  "reason": "Manual reset"
}
```
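
The `reason` query parameter must be URL-encoded, as in the `Manual%20reset` example above. A minimal sketch of building the request URL with the standard library follows; the `reset_url` helper and the base URL in the usage note are assumptions for illustration.

```python
from urllib.parse import quote, urlencode


def reset_url(base_url: str, job_id: int, reason: str) -> str:
    """Build the reset endpoint URL with a percent-encoded reason."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode({"reason": reason}, quote_via=quote)
    return f"{base_url.rstrip('/')}/api/v1/queue/reset-stuck-job/{job_id}?{query}"
```

For example, `reset_url("http://localhost:8000", 12346, "Manual reset")` yields a URL ending in `/api/v1/queue/reset-stuck-job/12346?reason=Manual%20reset`.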

### Enhanced Timeout Check
```http
POST /api/v1/queue/timeout-check?timeout_hours=5&step_timeout_minutes=30
```

Checks for jobs that have exceeded the overall job timeout (`timeout_hours`) and for jobs stuck in a single step (`step_timeout_minutes`):
```json
{
  "message": "Timeout check completed",
  "result": {
    "status": "timeout_jobs_handled",
    "timeout_jobs": [],
    "jobs_reset": 0,
    "stuck_jobs_found": 1,
    "stuck_jobs_auto_reset": 1,
    "stuck_jobs": [...],
    "step_timeout_minutes": 30
  }
}
```

## Database Schema Changes

The following columns have been added to the `mip_ai_feedback_queue_tbl` table:

```sql
ALTER TABLE mip_ai_feedback_queue_tbl ADD COLUMN current_step VARCHAR(100) NULL COMMENT 'Current processing step';
ALTER TABLE mip_ai_feedback_queue_tbl ADD COLUMN progress_percentage INT DEFAULT 0 COMMENT 'Progress percentage (0-100)';
ALTER TABLE mip_ai_feedback_queue_tbl ADD COLUMN step_start_time DATETIME NULL COMMENT 'Current step start time';
ALTER TABLE mip_ai_feedback_queue_tbl ADD COLUMN last_heartbeat DATETIME NULL COMMENT 'Last activity timestamp';
```

## Usage Examples

### Monitor Job Progress
```python
# Get progress for a specific job
progress = await ProgressTracker.get_job_progress(job_id)
print(f"Job {job_id} is at {progress['progress_percentage']}% - {progress['current_step']}")
```

### Update Progress in Processing
```python
# Update job progress to a specific step
await ProgressTracker.update_progress(job_id, ProcessingStep.TRANSCRIBING_AUDIO)

# Update with custom progress percentage
await ProgressTracker.update_progress(job_id, ProcessingStep.ANALYZING_CONTENT, progress_percentage=65)

# Send heartbeat to indicate job is still alive
await ProgressTracker.heartbeat(job_id)
```
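
A long-running step such as transcription may not return for many minutes, so the heartbeat has to be sent while the step is still running. One way to do that is a background task that beats at a fixed interval until the work finishes. This is a sketch only: `run_with_heartbeat` and its parameters are hypothetical, and the heartbeat is injected as a callable so the example does not depend on `ProgressTracker` internals.

```python
import asyncio
import contextlib
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_heartbeat(
    work: Awaitable[T],
    send_heartbeat: Callable[[], Awaitable[None]],
    interval_seconds: float = 60.0,
) -> T:
    """Await `work` while calling `send_heartbeat` at a fixed interval."""

    async def beat_forever() -> None:
        while True:
            try:
                await send_heartbeat()
            except Exception as exc:
                # A failed heartbeat should not abort the actual work.
                print(f"heartbeat failed: {exc!r}")
            await asyncio.sleep(interval_seconds)

    beater = asyncio.create_task(beat_forever())
    try:
        return await work
    finally:
        # Stop the heartbeat whether the work succeeded or raised.
        beater.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await beater
```

A step could then be wrapped as `await run_with_heartbeat(transcribe(path), lambda: ProgressTracker.heartbeat(job_id))`, where `transcribe` stands in for whatever coroutine performs the step.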

### Check for Stuck Jobs
```python
# Find jobs stuck for more than 30 minutes
stuck_jobs = await ProgressTracker.get_stuck_jobs(step_timeout_minutes=30)

# Reset a stuck job
success = await ProgressTracker.reset_stuck_job(job_id, "Job appeared stuck in gaze tracking")
```

## Monitoring and Alerting

### Recommended Monitoring
1. **Dashboard**: Monitor `/api/v1/queue/processing-jobs` for real-time job status
2. **Alerts**: Set up alerts for jobs stuck longer than expected timeouts
3. **Metrics**: Track average processing time per step for performance optimization
4. **Health Checks**: Include stuck job count in health check endpoints
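
As an illustration of the health check item, the response of `/api/v1/queue/stuck-jobs` could be reduced to a small status object. The sketch assumes the response shape shown earlier in this document; `stuck_job_health`, its `max_stuck` threshold, and the returned keys are hypothetical.

```python
def stuck_job_health(payload: dict, max_stuck: int = 0) -> dict:
    """Summarise a stuck-jobs response for a health check endpoint."""
    jobs = payload.get("stuck_jobs", [])
    # Longest time any job has spent in its current step, in minutes.
    longest = max((job["time_since_step_minutes"] for job in jobs), default=0.0)
    return {
        "healthy": len(jobs) <= max_stuck,
        "stuck_job_count": len(jobs),
        "longest_stuck_minutes": longest,
    }
```

Reporting the longest stuck time alongside the count helps distinguish a job that just crossed the threshold from one that has been idle for hours.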

### Troubleshooting Stuck Jobs

Common causes and solutions:

1. **Transcription Step Stuck**
   - **Cause**: Large audio files or Whisper model issues
   - **Solution**: Check audio file size and Whisper service health

2. **Gaze Tracking Stuck**
   - **Cause**: Complex video processing or model loading issues
   - **Solution**: Verify dlib models are loaded correctly

3. **S3 Download Stuck**
   - **Cause**: Network issues or large file sizes
   - **Solution**: Check S3 connectivity and file sizes

4. **Face Analysis Stuck**
   - **Cause**: OpenCV processing issues or corrupted video frames
   - **Solution**: Verify video file integrity and OpenCV installation

## Configuration

### Environment Variables
```bash
# Maximum time a job can be in processing status (hours)
JOB_TIMEOUT_HOURS=5

# Maximum time a job can be stuck in a single step (minutes)
STEP_TIMEOUT_MINUTES=30

# Auto-reset jobs stuck for more than double the step timeout
AUTO_RESET_STUCK_JOBS=true
```
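
These variables might be read at startup along the following lines. This is a sketch, not the service's configuration code: the `TimeoutConfig` class and `load_timeout_config` function are hypothetical, and the fallback values simply reuse the values shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TimeoutConfig:
    job_timeout_hours: float
    step_timeout_minutes: float
    auto_reset_stuck_jobs: bool


def load_timeout_config(env: Optional[Mapping[str, str]] = None) -> TimeoutConfig:
    """Read timeout settings from the environment (or a supplied mapping)."""
    env = os.environ if env is None else env
    # Accept a few common spellings of "true"; anything else means false.
    auto_reset = env.get("AUTO_RESET_STUCK_JOBS", "true").strip().lower()
    return TimeoutConfig(
        job_timeout_hours=float(env.get("JOB_TIMEOUT_HOURS", "5")),
        step_timeout_minutes=float(env.get("STEP_TIMEOUT_MINUTES", "30")),
        auto_reset_stuck_jobs=auto_reset in {"1", "true", "yes"},
    )
```

Passing a mapping instead of reading `os.environ` directly keeps the loader easy to test.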

### Background Monitoring
The system includes automatic monitoring that:
- Runs timeout checks every 30 seconds (configurable)
- Auto-resets jobs stuck for more than 2x the step timeout
- Logs all progress updates and stuck job detections
- Maintains heartbeat for active jobs
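
The periodic check described above can be pictured as a simple asyncio loop. The sketch below is an assumption about the general shape, not the service's scheduler: `run_periodic` is a hypothetical name, the check itself is passed in as a coroutine function, and `max_runs` exists only so the loop can be stopped in tests.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_periodic(
    check: Callable[[], Awaitable[None]],
    interval_seconds: float = 30.0,
    max_runs: Optional[int] = None,
) -> int:
    """Call `check` every `interval_seconds`; return the number of runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await check()
        except Exception as exc:
            # One failed check should not stop future checks.
            print(f"timeout check failed: {exc!r}")
        runs += 1
        # Skip the final sleep when a run limit has been reached.
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs
```

Catching exceptions inside the loop matters here: if a single check fails, for example because the database is briefly unreachable, monitoring resumes on the next interval instead of stopping silently.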

## Benefits

1. **Visibility**: Real-time insight into job processing status
2. **Reliability**: Automatic detection and recovery of stuck jobs
3. **Debugging**: Detailed step tracking helps identify bottlenecks
4. **Performance**: Monitor processing times to optimize system performance
5. **User Experience**: Users can see progress instead of waiting blindly

Together, step tracking, heartbeats, and stuck job recovery improve the reliability and observability of the AI Feedback processing pipeline.
