NeuronLens processes your call recordings asynchronously — you submit an audio file URL, choose which analysis features to run, and poll for results when they are ready. A single job can produce a speaker-diarized transcript, a plain-language summary, per-speaker sentiment scores, detected intent, and a QA evaluation against your scorecard — all from one API call.

Submit a Transcription Job

POST /transcription
Submits a call recording for processing. Returns a job_id immediately; use it to poll for results.

Request Body

audio_url
string
required
A publicly accessible URL pointing to the audio file to transcribe. You can use a pre-signed URL from cloud storage (S3, GCS, Azure Blob). The URL must remain accessible for at least 15 minutes after submission.
language
string
required
BCP-47 language code for the recording. This determines the speech recognition model. Examples: hi-IN, ta-IN, en-IN, te-IN, mr-IN.
speakers
integer
default: 2
Expected number of speakers in the recording for diarization. Accepted values: 1 to 10. For most call recordings, the default of 2 (agent + customer) is correct.
features
array
default: ["transcription"]
List of analysis features to run on the recording. Including more features increases processing time slightly. Available values:
  • transcription — speaker-diarized, timestamped speech-to-text (always included)
  • summary — a plain-language summary of the call (2-4 sentences)
  • sentiment — sentiment scores per speaker and overall (-1 to 1 scale)
  • intent — primary customer intent detected from the conversation
  • qa_scoring — evaluate the call against a QA scorecard (requires qa_scorecard_id)
qa_scorecard_id
string
The ID of the QA scorecard to evaluate against. Required when qa_scoring is included in features. Scorecards are created and managed in the NeuronLens dashboard under QA → Scorecards.
metadata
object
Optional key-value pairs attached to the job. These are passed through unchanged to all response and webhook payloads — useful for correlating jobs with records in your own system. For example: {"crm_ticket_id": "TKT-4492", "agent_id": "agt_001"}.

Example Request

curl https://api.vinfer.ai/v1/transcription \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://storage.example.com/recordings/call_9Hm3kP7qZ.wav?signed=...",
    "language": "hi-IN",
    "speakers": 2,
    "features": ["transcription", "summary", "sentiment", "qa_scoring"],
    "qa_scorecard_id": "qsc_7Lb2pN5kR",
    "metadata": {
      "crm_ticket_id": "TKT-4492",
      "agent_id": "agt_001",
      "call_id": "cal_9Hm3kP7qZ"
    }
  }'
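
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names build_transcription_request and submit_job are hypothetical, and the local checks simply mirror the constraints described in the Request Body section (speakers between 1 and 10, qa_scorecard_id required with qa_scoring). The base URL is taken from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.vinfer.ai/v1"  # base URL as shown in the curl example
KNOWN_FEATURES = {"transcription", "summary", "sentiment", "intent", "qa_scoring"}


def build_transcription_request(audio_url, language, speakers=2,
                                features=None, qa_scorecard_id=None,
                                metadata=None):
    """Validate arguments locally and return the JSON body as a dict.

    Hypothetical helper: the checks mirror the documented constraints so
    obvious mistakes are caught before a request is sent.
    """
    features = list(features) if features else ["transcription"]
    unknown = set(features) - KNOWN_FEATURES
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    if not 1 <= speakers <= 10:
        raise ValueError("speakers must be between 1 and 10")
    if "qa_scoring" in features and not qa_scorecard_id:
        raise ValueError("qa_scorecard_id is required when qa_scoring is requested")

    body = {
        "audio_url": audio_url,
        "language": language,
        "speakers": speakers,
        "features": features,
    }
    # Optional fields are only sent when provided.
    if qa_scorecard_id:
        body["qa_scorecard_id"] = qa_scorecard_id
    if metadata:
        body["metadata"] = metadata
    return body


def submit_job(body, api_key):
    """POST the body to /transcription and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/transcription",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping validation separate from the network call makes the payload logic easy to unit test without an API key.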

Example Response

{
  "job_id": "job_2Kn7wR4pM",
  "status": "pending",
  "created_at": "2024-02-01T10:00:00Z"
}
Processing time depends on audio duration and the features requested. A typical 3-minute call with all features enabled completes in under 60 seconds. Long recordings (30+ minutes) may take a few minutes. Poll GET /transcription/{job_id} or listen for a transcription.completed webhook event.
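
A polling loop for the flow described above might look like the following sketch. The function name wait_for_job, the fetch_job callable, and the 5-second interval and 10-minute timeout are all assumptions chosen for illustration; the status values and the error field are the ones documented for GET /transcription/{job_id}.

```python
import time


def wait_for_job(job_id, fetch_job, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status, then return the job dict.

    fetch_job is any callable that takes a job_id and returns the parsed JSON
    body of GET /transcription/{job_id}. Injecting it (and sleep) keeps this
    function independent of the HTTP library and easy to test.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        # "pending" and "processing" both mean: keep waiting.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout} seconds")
        sleep(interval)
```

If you receive the transcription.completed webhook instead, this loop is unnecessary; polling is mainly useful for scripts and environments that cannot accept inbound requests.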

Get Job Status and Results

GET /transcription/{job_id}
Poll this endpoint to check the status of a submitted job and retrieve results once processing is complete.

Response Fields

job_id
string
The unique identifier for this transcription job.
status
string
Current job status: pending (queued), processing (actively being analyzed), completed (results available), or failed (processing error; check the error field for details).
transcript
array
Array of transcript segments, each with a speaker label, the spoken text, start_time and end_time offsets, and a confidence score. Present only when status is completed.
summary
string
A 2-4 sentence plain-language summary of the call. Present when summary was included in features. null otherwise.
sentiment
object
Sentiment analysis results: an overall score plus per-speaker scores under by_speaker, each on the -1 to 1 scale. Present when sentiment was included in features.
intent
string
The primary customer intent detected from the conversation. Examples: loan_renewal_interest, complaint, payment_query, dnd_request. Present when intent was included in features. null otherwise.
qa_score
object
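QA evaluation results: an overall_score plus a parameter_scores array with the score, max_score, and passed flag for each scorecard parameter. Present when qa_scoring was included in features.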
metadata
object
The metadata object you submitted with the job, returned unchanged.
created_at
string
ISO 8601 timestamp of when the job was submitted.
completed_at
string
ISO 8601 timestamp of when processing finished. null if still in progress.

Example Response (Completed Job)

{
  "job_id": "job_2Kn7wR4pM",
  "status": "completed",
  "language": "hi-IN",
  "created_at": "2024-02-01T10:00:00Z",
  "completed_at": "2024-02-01T10:00:52Z",
  "transcript": [
    {
      "speaker": "agent",
      "text": "Namaste, Priya ji. Main VInfer ki taraf se baat kar raha hoon. Kya aap abhi baat kar sakte hain?",
      "start_time": 0.4,
      "end_time": 5.1,
      "confidence": 0.97
    },
    {
      "speaker": "customer",
      "text": "Haan, bataiye.",
      "start_time": 5.9,
      "end_time": 7.2,
      "confidence": 0.95
    },
    {
      "speaker": "agent",
      "text": "Aapka loan renewal ka time aa gaya hai. Kya aap is baare mein baat karna chahenge?",
      "start_time": 7.6,
      "end_time": 13.0,
      "confidence": 0.96
    },
    {
      "speaker": "customer",
      "text": "Haan, mujhe kisi se baat karni hai. Please mujhe connect karein.",
      "start_time": 13.5,
      "end_time": 18.2,
      "confidence": 0.94
    }
  ],
  "summary": "The agent contacted Priya Sharma regarding loan renewal. The customer expressed interest and requested to be connected with a human agent to discuss options.",
  "sentiment": {
    "overall": 0.42,
    "by_speaker": {
      "agent": 0.65,
      "customer": 0.38
    }
  },
  "intent": "loan_renewal_interest",
  "qa_score": {
    "overall_score": 87.5,
    "parameter_scores": [
      {"parameter_name": "Greeting Compliance", "score": 10, "max_score": 10, "passed": true},
      {"parameter_name": "Product Pitch Accuracy", "score": 15, "max_score": 20, "passed": false},
      {"parameter_name": "DND Handling", "score": 10, "max_score": 10, "passed": true},
      {"parameter_name": "Escalation Protocol", "score": 10, "max_score": 10, "passed": true}
    ]
  },
  "metadata": {
    "crm_ticket_id": "TKT-4492",
    "agent_id": "agt_001",
    "call_id": "cal_9Hm3kP7qZ"
  }
}
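
Once a job is completed, the response can be post-processed with ordinary dictionary access. The sketch below assumes the response shape shown in the example above; the helper names format_transcript and failed_qa_parameters are hypothetical and not part of any NeuronLens SDK.

```python
def format_transcript(job):
    """Return one line per segment, e.g. '[5.9-7.2] customer: Haan, bataiye.'

    Returns an empty string when the transcript is absent (job not completed).
    """
    lines = []
    for seg in job.get("transcript") or []:
        lines.append(
            f"[{seg['start_time']:.1f}-{seg['end_time']:.1f}] "
            f"{seg['speaker']}: {seg['text']}"
        )
    return "\n".join(lines)


def failed_qa_parameters(job):
    """Return the names of scorecard parameters whose 'passed' flag is false."""
    qa = job.get("qa_score") or {}
    return [
        p["parameter_name"]
        for p in qa.get("parameter_scores", [])
        if not p["passed"]
    ]
```

Applied to the example response above, failed_qa_parameters would return ["Product Pitch Accuracy"], the only parameter with "passed": false.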

Supported Audio Formats

NeuronLens accepts the following audio formats:
Format | Extension | Notes
WAV | .wav | Recommended for best accuracy. Uncompressed PCM preferred.
MP3 | .mp3 | Common for telephony recordings.
OGG | .ogg | Ogg Vorbis and Ogg Opus both supported.
FLAC | .flac | Lossless; good accuracy, larger file size.
M4A | .m4a | AAC audio in MPEG-4 container.
File size limit: 500 MB per submission. Duration limit: 4 hours per submission.
For best transcription accuracy, use recordings with a sample rate of 8 kHz or higher. Telephony recordings at 8 kHz (standard PSTN quality) work well. Stereo recordings with the agent and customer on separate channels produce the most accurate diarization; for such recordings, consider setting "speakers": 2 explicitly.
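
A client-side preflight check can catch unsupported files before a job is submitted. The following is a rough sketch under stated assumptions: preflight_problems is a hypothetical helper, the file type is guessed from the URL path extension only (the service may inspect the actual content), and "500 MB" is interpreted conservatively as 500 × 10^6 bytes.

```python
import os
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a"}
MAX_BYTES = 500 * 10**6      # assumption: decimal megabytes (the stricter reading)
MAX_SECONDS = 4 * 60 * 60    # 4-hour duration limit


def preflight_problems(audio_url, size_bytes=None, duration_seconds=None):
    """Return a list of human-readable problems; an empty list means none found.

    size_bytes and duration_seconds are optional because they are often
    unknown for remote files; limits are only checked when values are given.
    """
    problems = []
    # Use only the URL path so query strings on pre-signed URLs are ignored.
    ext = os.path.splitext(urlparse(audio_url).path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes is not None and size_bytes > MAX_BYTES:
        problems.append("file exceeds the 500 MB size limit")
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append("recording exceeds the 4-hour duration limit")
    return problems
```

For example, a pre-signed URL ending in call.wav?signed=... passes the extension check because the query string is not part of the parsed path.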