Automated Call Transcription and Diarization

NeuronLens converts your call recordings into clean, structured text with speaker labels, segment-level timestamps, confidence scores, and a plain-language summary for every call. You do not need to pre-configure custom vocabulary or acoustic models for standard use cases. Submit an audio file URL, specify the language and features you want, and NeuronLens returns a complete transcript with everything your analytics, QA, and compliance pipelines need.

What Transcription Returns

Each completed transcription job gives you:

Speaker-Labeled Transcript

Every segment is tagged with the speaker role — agent or customer — along with start time, end time, and a confidence score.

Smart Summary

A 2–3 sentence plain-language summary of the call: what was discussed, what the customer’s situation was, and what the outcome or next step is.

Sentiment Analysis

Overall call sentiment (positive, neutral, negative) with per-segment sentiment scores so you can see how the conversation evolved.

Intent Classification

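A structured intent label (for example interested, callback requested, not interested, complaint, or escalation) derived from the full conversation context. In the API response the label is a machine-readable string; the sample completed response on this page shows callback_agreed.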

Speaker Diarization

NeuronLens automatically separates and labels the two sides of a call. You do not need to submit separate audio channels — the diarization model identifies who is speaking based on acoustic patterns and conversational structure. Pass "speakers": 2 in your request to signal a two-party call. If your recording includes a conference or a three-way call, set the value accordingly.

For best diarization accuracy on single-channel (mono) recordings, use audio with a sample rate of at least 8 kHz. Stereo recordings with the agent and the customer on separate channels produce the most precise speaker separation.
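As a pre-flight check, the sample rate and channel count of a recording can be read locally before it is uploaded. The sketch below uses Python's standard wave module, which only reads uncompressed PCM WAV files. The function names and the shape of the returned dictionary are illustrative assumptions; only the 8 kHz threshold comes from the guidance above.

```python
import io
import wave

MIN_SAMPLE_RATE_HZ = 8000  # threshold taken from the guidance above


def inspect_wav(source):
    """Return basic properties of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate_hz": rate,
        "channels": channels,
        "duration_seconds": frames / rate if rate else 0.0,
        "meets_min_rate": rate >= MIN_SAMPLE_RATE_HZ,
        "is_stereo": channels == 2,
    }


def make_silent_wav(rate, channels, seconds=1):
    """Build an in-memory silent WAV, used here only to demonstrate the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * channels * seconds)
    buf.seek(0)
    return buf
```

Calling inspect_wav on a file path works the same way as on the in-memory buffer. Compressed formats such as MP3 would need a different reader.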

Submitting a Recording

Send a POST request to /v1/transcription with the audio URL and the features you want to enable.

curl -X POST https://api.vinfer.ai/v1/transcription \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://your-storage.com/calls/call_20240115_001.wav",
    "language": "hi-IN",
    "speakers": 2,
    "features": ["transcription", "summary", "sentiment", "intent"]
  }'
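The same request can be issued from Python. The following is a minimal sketch using only the standard library; the endpoint, headers, and body fields mirror the curl example above, while the function names, the default feature list, and the 30-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

API_BASE = "https://api.vinfer.ai/v1"  # base URL taken from the curl example


def build_submit_request(api_key, audio_url, language, speakers=2,
                         features=("transcription", "summary", "sentiment", "intent")):
    """Build (but do not send) the POST request for a transcription job."""
    payload = {
        "audio_url": audio_url,
        "language": language,
        "speakers": speakers,
        "features": list(features),
    }
    return urllib.request.Request(
        f"{API_BASE}/transcription",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request, timeout=30):
    """Send the request and return the decoded job object as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.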

Request Parameters

audio_url (string, required)

A publicly accessible URL or a signed URL pointing to your audio file. NeuronLens fetches the file at processing time and does not store your audio beyond that window.

language (string, required)

BCP-47 language code for the primary language spoken in the call. For example, hi-IN for Hindi, en-IN for Indian English, ta-IN for Tamil. See the supported languages table below.

speakers (integer, default: 2)

Number of distinct speakers in the recording. For standard two-party agent–customer calls, use 2.

features (array, required)

List of analysis features to run. Accepted values: "transcription", "summary", "sentiment", "intent". Pass all four to get the full analysis in a single job.

webhook_url (string, optional)

If provided, NeuronLens posts the completed result to this URL when processing finishes, so you do not need to poll.

metadata (object, optional)

A free-form key-value object you can attach to the job for your own reference, for example { "agent_id": "ag_001", "campaign_id": "camp_xyz" }.
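A payload can be checked against the parameter rules above before it is sent. The sketch below is a client-side convenience only; the function name and error messages are assumptions, and the server's own validation behaviour is not documented here.

```python
ACCEPTED_FEATURES = {"transcription", "summary", "sentiment", "intent"}
REQUIRED_FIELDS = ("audio_url", "language", "features")


def validate_job(payload):
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not payload.get(field):
            problems.append(f"missing required field: {field}")

    features = payload.get("features")
    if features is not None and not isinstance(features, list):
        problems.append("features must be an array")
    elif features:
        unknown = sorted(f for f in features if f not in ACCEPTED_FEATURES)
        if unknown:
            problems.append(f"unknown features: {', '.join(unknown)}")

    speakers = payload.get("speakers", 2)  # documented default is 2
    if isinstance(speakers, bool) or not isinstance(speakers, int) or speakers < 1:
        problems.append("speakers must be a positive integer")

    metadata = payload.get("metadata")
    if metadata is not None and not isinstance(metadata, dict):
        problems.append("metadata must be an object")
    return problems
```

An empty list means the payload satisfies the documented rules; any other result lists what to fix before submitting.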

Initial Response

NeuronLens returns a job object immediately. Processing happens asynchronously.

{
  "job_id": "job_7f3a91bc",
  "status": "pending",
  "created_at": "2024-01-15T09:22:00Z",
  "estimated_completion_seconds": 45
}

job_id (string)

Unique identifier for this transcription job. Use it to fetch results.

status (string)

Current job state: pending, processing, completed, or failed. A job with status failed includes an error field with a reason.

estimated_completion_seconds (integer)

Approximate seconds until the job completes. Actual time depends on audio duration and current queue depth.

Fetching Results

Poll the job status endpoint until status is completed or failed, or wait for the webhook callback if you supplied a webhook_url.

curl "https://api.vinfer.ai/v1/transcription/job_7f3a91bc" \
  -H "Authorization: Bearer YOUR_API_KEY"
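A polling loop for this endpoint might look like the sketch below. It takes the HTTP call as a parameter (fetch_job), so the loop itself stays independent of any client library. The function name, the 5-second interval, and the 10-minute timeout are assumptions; the status values and the error field follow the field descriptions above.

```python
import time


def wait_for_job(fetch_job, job_id, interval_seconds=5.0, timeout_seconds=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_job(job_id) until the job is completed or failed.

    fetch_job is any callable returning the job object as a dict, for example
    a wrapper around GET /v1/transcription/{job_id}.
    """
    deadline = clock() + timeout_seconds
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status!r} after {timeout_seconds}s"
            )
        sleep(interval_seconds)
```

The estimated_completion_seconds value from the initial response is a reasonable guide for how long to wait before the first poll.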

Completed Response

{
  "job_id": "job_7f3a91bc",
  "status": "completed",
  "duration_seconds": 187,
  "language": "hi-IN",
  "summary": "The agent contacted the customer regarding an overdue EMI payment. The customer acknowledged the outstanding amount and agreed to pay by 18th January. The agent confirmed the payment link would be sent via SMS.",
  "sentiment": {
    "overall": "positive",
    "score": 0.74,
    "breakdown": {
      "positive": 0.74,
      "neutral": 0.21,
      "negative": 0.05
    }
  },
  "intent": {
    "label": "callback_agreed",
    "confidence": 0.91
  },
  "transcript": [
    {
      "speaker": "agent",
      "text": "Good morning, am I speaking with Ramesh Iyer?",
      "start_time": 0.4,
      "end_time": 2.8,
      "confidence": 0.97
    },
    {
      "speaker": "customer",
      "text": "Yes, speaking.",
      "start_time": 3.1,
      "end_time": 4.0,
      "confidence": 0.99
    },
    {
      "speaker": "agent",
      "text": "Good morning, Mr. Iyer. This is Priya calling from VInfer Financial Services. I'm calling regarding your personal loan account ending in 4821. You have an EMI of rupees fourteen thousand two hundred that was due on the tenth of January. I wanted to check if you have had a chance to arrange the payment.",
      "start_time": 4.3,
      "end_time": 22.1,
      "confidence": 0.95
    },
    {
      "speaker": "customer",
      "text": "Yes, I know about it. I've been a bit busy this week. Can I pay by the eighteenth?",
      "start_time": 22.8,
      "end_time": 28.4,
      "confidence": 0.96
    },
    {
      "speaker": "agent",
      "text": "Of course, Mr. Iyer. I can note a commitment to pay by the eighteenth of January. I'll send you a payment link on your registered mobile number right after this call so you can pay via UPI or net banking at your convenience.",
      "start_time": 29.0,
      "end_time": 40.6,
      "confidence": 0.98
    },
    {
      "speaker": "customer",
      "text": "Okay, that works. Please send it on WhatsApp if possible.",
      "start_time": 41.1,
      "end_time": 44.8,
      "confidence": 0.97
    },
    {
      "speaker": "agent",
      "text": "I'll make a note of that. Is there anything else I can help you with today?",
      "start_time": 45.2,
      "end_time": 48.9,
      "confidence": 0.99
    },
    {
      "speaker": "customer",
      "text": "No, that's all. Thank you.",
      "start_time": 49.3,
      "end_time": 51.0,
      "confidence": 0.99
    }
  ]
}

summary (string)

Plain-language 2–3 sentence summary of the call outcome.

sentiment.overall (string)

Top-level sentiment: positive, neutral, or negative.

intent.label (string)

Classified customer intent for the call.

transcript (array)

Ordered array of speech segments. Each segment includes speaker, text, start_time, end_time, and confidence.
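Once a job is completed, the transcript array can be processed like any list of dictionaries. The helpers below are an illustrative sketch that relies only on the segment fields listed above (speaker, text, start_time, end_time, confidence); the function names and the 0.9 review threshold are assumptions.

```python
from collections import defaultdict


def talk_time_by_speaker(transcript):
    """Sum segment durations (end_time - start_time) per speaker, in seconds."""
    totals = defaultdict(float)
    for segment in transcript:
        totals[segment["speaker"]] += segment["end_time"] - segment["start_time"]
    return {speaker: round(seconds, 2) for speaker, seconds in totals.items()}


def low_confidence_segments(transcript, threshold=0.9):
    """Return segments whose confidence falls below an assumed review threshold."""
    return [s for s in transcript if s["confidence"] < threshold]


def format_transcript(transcript):
    """Render segments as '[mm:ss] speaker: text' lines."""
    lines = []
    for s in transcript:
        minutes, seconds = divmod(int(s["start_time"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {s['speaker']}: {s['text']}")
    return "\n".join(lines)
```

Talk-time ratios and low-confidence segments are common inputs to QA review queues, which is why they are shown here.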

Bulk Submission

To submit multiple recordings at once, use the batch endpoint:

curl -X POST https://api.vinfer.ai/v1/transcription/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "jobs": [
      {
        "audio_url": "https://your-storage.com/calls/call_001.wav",
        "language": "en-IN",
        "speakers": 2,
        "features": ["transcription", "summary", "sentiment", "intent"]
      },
      {
        "audio_url": "https://your-storage.com/calls/call_002.mp3",
        "language": "ta-IN",
        "speakers": 2,
        "features": ["transcription", "summary", "sentiment"]
      }
    ],
    "webhook_url": "https://your-server.com/webhooks/neuronlens"
  }'

The batch endpoint returns a batch_id and an array of individual job_id values. Results are delivered per job to your webhook URL as each one completes.
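The batch body can be assembled programmatically from a list of recordings. In the sketch below, build_batch_payload mirrors the structure of the curl example. The record_delivery helper assumes that each webhook delivery carries the job's job_id, as the completed response does; the webhook payload shape is not documented in this section, so treat that as an assumption.

```python
def build_batch_payload(recordings, webhook_url=None,
                        default_features=("transcription", "summary",
                                          "sentiment", "intent")):
    """Build the JSON body for POST /v1/transcription/batch.

    recordings is an iterable of dicts with audio_url and language, plus
    optional speakers and features overrides.
    """
    jobs = []
    for rec in recordings:
        jobs.append({
            "audio_url": rec["audio_url"],
            "language": rec["language"],
            "speakers": rec.get("speakers", 2),
            "features": list(rec.get("features", default_features)),
        })
    payload = {"jobs": jobs}
    if webhook_url:
        payload["webhook_url"] = webhook_url
    return payload


def record_delivery(pending_job_ids, delivered_job):
    """Drop a delivered job from the pending set; return True when none remain.

    Assumes the webhook body includes a job_id field.
    """
    pending_job_ids.discard(delivered_job["job_id"])
    return not pending_job_ids
```

Seeding the pending set with the job_id values returned by the batch endpoint gives a simple way to tell when every job in the batch has reported back.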

Enable all four features — "transcription", "summary", "sentiment", and "intent" — in a single request. Running them together is more efficient than submitting separate jobs, and it ensures all analysis is derived from the same processing pass.

Supported Audio Formats

Format	Extension	Notes
WAV	`.wav`	Recommended; supports PCM and compressed variants
MP3	`.mp3`	Widely supported; slight quality trade-off vs. WAV
OGG Vorbis	`.ogg`	Common in WebRTC-based recording setups
FLAC	`.flac`	Lossless; larger file size
MPEG-4 Audio	`.m4a`	Common in mobile recording applications

Limits: Maximum file size 500 MB · Maximum audio duration 4 hours per job.
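The format list and limits above can be enforced locally before a job is submitted. This is a minimal sketch: the function name and messages are assumptions, and it treats 1 MB as 1024 × 1024 bytes, which the limits line does not specify.

```python
import os

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a"}
MAX_FILE_BYTES = 500 * 1024 * 1024   # assumes 1 MB = 1024 * 1024 bytes
MAX_DURATION_SECONDS = 4 * 60 * 60   # 4 hours per job


def check_recording(filename, size_bytes, duration_seconds=None):
    """Return a list of limit violations for a recording (empty if none)."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 500 MB")
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio exceeds 4 hours")
    return problems
```

The duration check is skipped when the duration is unknown, since it cannot always be read cheaply from compressed files.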

Audio files must be accessible via a public URL or a time-limited signed URL (minimum 15 minutes validity). NeuronLens fetches the file once during processing and does not store your audio beyond that window. Make sure your storage bucket does not require IP allowlisting that would block VInfer’s processing servers.

Supported Languages

Language	Code	Notes
Hindi	`hi-IN`	Including Hinglish (code-switched Hindi–English)
Indian English	`en-IN`	Tuned for Indian accents across regions
Tamil	`ta-IN`
Telugu	`te-IN`
Marathi	`mr-IN`
Bengali	`bn-IN`
Kannada	`kn-IN`
Gujarati	`gu-IN`
Malayalam	`ml-IN`
Punjabi	`pa-IN`
Standard English	`en-US`	For calls with non-Indian participants

Use the language code that matches the primary spoken language in the call. For heavily code-switched calls, choose the dominant language and NeuronLens will handle the mixed segments automatically.
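A small lookup can catch unsupported or mis-cased language codes before a request is sent. The sketch below copies the codes from the table above; the function name and the casing normalisation (for example accepting hi_in) are assumptions for convenience, not documented API behaviour.

```python
SUPPORTED_LANGUAGES = {
    "hi-IN": "Hindi",
    "en-IN": "Indian English",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "mr-IN": "Marathi",
    "bn-IN": "Bengali",
    "kn-IN": "Kannada",
    "gu-IN": "Gujarati",
    "ml-IN": "Malayalam",
    "pa-IN": "Punjabi",
    "en-US": "Standard English",
}


def resolve_language(code):
    """Normalise a language-region tag and confirm it appears in the table."""
    parts = code.strip().replace("_", "-").split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected a language-region code, got {code!r}")
    normalised = f"{parts[0].lower()}-{parts[1].upper()}"
    if normalised not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {normalised}")
    return normalised
```

For a code-switched Hinglish call, the guidance above means passing hi-IN rather than trying to describe both languages.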
