NeuronLens converts your call recordings into clean, structured text — with speaker labels, word-level timestamps, confidence scores, and a plain-language summary for every call. You do not need to pre-configure custom vocabulary or acoustic models for standard use cases. Submit an audio file, specify the language and features you want, and NeuronLens returns a complete transcript with everything your analytics, QA, and compliance pipelines need.

What Transcription Returns

Each completed transcription job gives you:

Speaker-Labeled Transcript

Every segment is tagged with the speaker role — agent or customer — along with start time, end time, and a confidence score.

Smart Summary

A 2–3 sentence plain-language summary of the call: what was discussed, what the customer’s situation was, and what the outcome or next step is.

Sentiment Analysis

Overall call sentiment (positive, neutral, negative) with per-segment sentiment scores so you can see how the conversation evolved.

Intent Classification

A structured intent label, such as interested, callback requested, not interested, complaint, or escalation, derived from the full conversation context.

Speaker Diarization

NeuronLens automatically separates and labels the two sides of a call. You do not need to submit separate audio channels; the diarization model identifies who is speaking based on acoustic patterns and conversational structure. Pass "speakers": 2 in your request to signal a two-party call. For a conference or three-way call, set speakers to the number of distinct speakers in the recording.
For best diarization accuracy on single-channel (mono) recordings, use a sample rate of at least 8 kHz. Stereo recordings with agent and customer on separate channels produce the most precise speaker separation.
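To check a recording against these recommendations before uploading it, you can read the WAV header locally. The sketch below uses Python's standard `wave` module; the helper names `inspect_wav` and `diarization_notes` are hypothetical, and the notes it returns are advisory hints derived from this section, not errors the API reports. The stdlib `wave` module only reads uncompressed PCM WAV, so other formats need a different audio library.

```python
import io
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    sample_rate: int
    channels: int
    duration_seconds: float


def inspect_wav(source) -> WavInfo:
    """Read sample rate, channel count, and duration from a WAV file.

    `source` may be a path or a binary file-like object. Only uncompressed
    PCM WAV is supported by the stdlib `wave` module.
    """
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    return WavInfo(rate, channels, frames / rate)


def diarization_notes(info: WavInfo, min_rate: int = 8000) -> list[str]:
    """Advisory notes based on the recommendations above (not API errors)."""
    notes = []
    if info.sample_rate < min_rate:
        notes.append(f"sample rate {info.sample_rate} Hz is below {min_rate} Hz")
    if info.channels == 1:
        notes.append("mono: separate agent/customer channels give more precise separation")
    return notes


if __name__ == "__main__":
    # Build a 1-second, 8 kHz, 16-bit mono WAV in memory and inspect it.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * 8000)
    buf.seek(0)
    info = inspect_wav(buf)
    print(info, diarization_notes(info))
```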

Submitting a Recording

Send a POST request to /v1/transcription with the audio URL and the features you want to enable.
curl -X POST https://api.vinfer.ai/v1/transcription \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://your-storage.com/calls/call_20240115_001.wav",
    "language": "hi-IN",
    "speakers": 2,
    "features": ["transcription", "summary", "sentiment", "intent"]
  }'
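If you are calling the API from Python rather than curl, the same request can be built with the standard library. This is a minimal sketch, assuming only the endpoint, headers, and body fields shown above; `build_transcription_request` is a hypothetical helper name, and the request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.vinfer.ai/v1"


def build_transcription_request(
    api_key: str,
    audio_url: str,
    language: str,
    features: list[str],
    speakers: int = 2,
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/transcription request."""
    body = {
        "audio_url": audio_url,
        "language": language,
        "speakers": speakers,
        "features": features,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/transcription",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it returns the job object described under Initial Response:
# req = build_transcription_request("YOUR_API_KEY", audio_url, "hi-IN",
#                                   ["transcription", "summary", "sentiment", "intent"])
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)
```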

Request Parameters

audio_url
string
required
A publicly accessible URL or a signed URL pointing to your audio file. NeuronLens fetches the file at processing time and does not store your audio after processing.
language
string
required
BCP-47 language code for the primary language spoken in the call. For example, hi-IN for Hindi, en-IN for Indian English, ta-IN for Tamil. See the supported languages table below.
speakers
integer
default: 2
Number of distinct speakers in the recording. For standard two-party agent–customer calls, use 2.
features
array
required
List of analysis features to run. Accepted values: "transcription", "summary", "sentiment", "intent". Pass all four to get the full analysis in a single job.
webhook_url
string
Optional. If provided, NeuronLens posts the completed result to this URL when processing finishes, so you do not need to poll.
metadata
object
Optional. A free-form key-value object you can attach to the job for your own reference — for example, { "agent_id": "ag_001", "campaign_id": "camp_xyz" }.
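A client-side check can catch malformed bodies before they reach the API. The sketch below encodes the parameter rules listed above; the requirements that `features` be a non-empty array of strings and that `speakers` be a positive integer are assumptions, and the service's own validation may be stricter or looser. `validate_request` is a hypothetical helper.

```python
ACCEPTED_FEATURES = {"transcription", "summary", "sentiment", "intent"}
REQUIRED_FIELDS = ("audio_url", "language", "features")


def validate_request(payload: dict) -> list[str]:
    """Check a /v1/transcription body against the documented parameters.

    Returns a list of problems; an empty list means the payload looks valid.
    """
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in payload]

    features = payload.get("features")
    if features is not None:
        if (not isinstance(features, list) or not features
                or not all(isinstance(x, str) for x in features)):
            errors.append("features must be a non-empty array of strings")
        else:
            unknown = sorted(set(features) - ACCEPTED_FEATURES)
            if unknown:
                errors.append(f"unknown features: {', '.join(unknown)}")

    # Documented default is 2; bool is excluded because it subclasses int.
    speakers = payload.get("speakers", 2)
    if isinstance(speakers, bool) or not isinstance(speakers, int) or speakers < 1:
        errors.append("speakers must be a positive integer")

    for key in ("audio_url", "language", "webhook_url"):
        if key in payload and not isinstance(payload[key], str):
            errors.append(f"{key} must be a string")

    if "metadata" in payload and not isinstance(payload["metadata"], dict):
        errors.append("metadata must be an object")

    return errors
```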

Initial Response

NeuronLens returns a job object immediately. Processing happens asynchronously.
{
  "job_id": "job_7f3a91bc",
  "status": "pending",
  "created_at": "2024-01-15T09:22:00Z",
  "estimated_completion_seconds": 45
}
job_id
string
Unique identifier for this transcription job. Use it to fetch results.
status
string
Current job state: pending, processing, completed, or failed. A failed job also includes an error field describing the reason.
estimated_completion_seconds
integer
Approximate seconds until the job completes. Actual time depends on audio duration and current queue depth.
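When you handle job objects in code, it helps to treat completed and failed as the two terminal states. A minimal sketch, assuming the field names in the response above; `Job` and `parse_job` are hypothetical names, and because the shape of the error field is not documented here, it is kept as whatever the API returns.

```python
from dataclasses import dataclass
from typing import Any, Optional

TERMINAL_STATES = {"completed", "failed"}


@dataclass
class Job:
    job_id: str
    status: str  # pending | processing | completed | failed
    created_at: Optional[str] = None
    estimated_completion_seconds: Optional[int] = None
    error: Any = None  # undocumented shape; stored exactly as received

    @property
    def is_terminal(self) -> bool:
        return self.status in TERMINAL_STATES


def parse_job(obj: dict) -> Job:
    """Map a job object (from submission or polling) to a Job."""
    return Job(
        job_id=obj["job_id"],
        status=obj["status"],
        created_at=obj.get("created_at"),
        estimated_completion_seconds=obj.get("estimated_completion_seconds"),
        error=obj.get("error"),
    )
```

One reasonable use of estimated_completion_seconds is as the delay before the first status check, so short calls are not polled needlessly.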

Fetching Results

Poll the job status endpoint until status is completed (or failed), or wait for your webhook callback.
curl "https://api.vinfer.ai/v1/transcription/job_7f3a91bc" \
  -H "Authorization: Bearer YOUR_API_KEY"
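A polling loop needs to stop on both completed and failed, and should give up after a bounded time. The sketch below assumes the GET endpoint shown above returns the job object as JSON; `fetch_job` and `wait_for_job` are hypothetical helpers, and the 5-second interval and 10-minute timeout are illustrative defaults, not documented values.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.vinfer.ai/v1"


def fetch_job(api_key: str, job_id: str) -> dict:
    """GET /v1/transcription/{job_id} and decode the JSON job object."""
    req = urllib.request.Request(
        f"{API_BASE}/transcription/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(
    fetch: Callable[[str], dict],
    job_id: str,
    *,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job is completed; raise on failure or timeout."""
    deadline = clock() + timeout
    while True:
        job = fetch(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout} s")
        sleep(interval)
```

For example, `wait_for_job(lambda jid: fetch_job("YOUR_API_KEY", jid), "job_7f3a91bc")` blocks until the job finishes. The fetcher, sleep, and clock are passed in so the loop can be exercised without network access.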

Completed Response

{
  "job_id": "job_7f3a91bc",
  "status": "completed",
  "duration_seconds": 187,
  "language": "hi-IN",
  "summary": "The agent contacted the customer regarding an overdue EMI payment. The customer acknowledged the outstanding amount and agreed to pay by 18th January. The agent confirmed the payment link would be sent via SMS.",
  "sentiment": {
    "overall": "positive",
    "score": 0.74,
    "breakdown": {
      "positive": 0.74,
      "neutral": 0.21,
      "negative": 0.05
    }
  },
  "intent": {
    "label": "callback_agreed",
    "confidence": 0.91
  },
  "transcript": [
    {
      "speaker": "agent",
      "text": "Good morning, am I speaking with Ramesh Iyer?",
      "start_time": 0.4,
      "end_time": 2.8,
      "confidence": 0.97
    },
    {
      "speaker": "customer",
      "text": "Yes, speaking.",
      "start_time": 3.1,
      "end_time": 4.0,
      "confidence": 0.99
    },
    {
      "speaker": "agent",
      "text": "Good morning, Mr. Iyer. This is Priya calling from VInfer Financial Services. I'm calling regarding your personal loan account ending in 4821. You have an EMI of rupees fourteen thousand two hundred that was due on the tenth of January. I wanted to check if you have had a chance to arrange the payment.",
      "start_time": 4.3,
      "end_time": 22.1,
      "confidence": 0.95
    },
    {
      "speaker": "customer",
      "text": "Yes, I know about it. I've been a bit busy this week. Can I pay by the eighteenth?",
      "start_time": 22.8,
      "end_time": 28.4,
      "confidence": 0.96
    },
    {
      "speaker": "agent",
      "text": "Of course, Mr. Iyer. I can note a commitment to pay by the eighteenth of January. I'll send you a payment link on your registered mobile number right after this call so you can pay via UPI or net banking at your convenience.",
      "start_time": 29.0,
      "end_time": 40.6,
      "confidence": 0.98
    },
    {
      "speaker": "customer",
      "text": "Okay, that works. Please send it on WhatsApp if possible.",
      "start_time": 41.1,
      "end_time": 44.8,
      "confidence": 0.97
    },
    {
      "speaker": "agent",
      "text": "I'll make a note of that. Is there anything else I can help you with today?",
      "start_time": 45.2,
      "end_time": 48.9,
      "confidence": 0.99
    },
    {
      "speaker": "customer",
      "text": "No, that's all. Thank you.",
      "start_time": 49.3,
      "end_time": 51.0,
      "confidence": 0.99
    }
  ]
}
summary
string
Plain-language 2–3 sentence summary of the call outcome.
sentiment.overall
string
Top-level sentiment: positive, neutral, or negative.
intent.label
string
Classified customer intent for the call.
transcript
array
Ordered array of speech segments. Each segment includes speaker, text, start_time, end_time, and confidence.
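Because every segment carries speaker, start_time, end_time, and confidence, common QA metrics take only a few lines of code. This sketch (hypothetical helper names) sums talk time per speaker, flags segments below a confidence threshold, and renders a readable transcript; the 0.9 threshold is an arbitrary example, not a recommended value.

```python
from collections import defaultdict


def talk_time_by_speaker(transcript: list[dict]) -> dict[str, float]:
    """Sum segment durations (end_time - start_time) per speaker, in seconds."""
    totals = defaultdict(float)
    for seg in transcript:
        totals[seg["speaker"]] += seg["end_time"] - seg["start_time"]
    return {speaker: round(secs, 2) for speaker, secs in totals.items()}


def low_confidence_segments(transcript: list[dict], threshold: float = 0.9) -> list[dict]:
    """Segments whose confidence is below the threshold, e.g. for manual review."""
    return [seg for seg in transcript if seg["confidence"] < threshold]


def format_transcript(transcript: list[dict]) -> str:
    """Render segments as '[MM:SS.s] speaker: text' lines."""
    lines = []
    for seg in transcript:
        # Round first so that e.g. 59.96 s renders as 01:00.0, not 00:60.0.
        minutes, seconds = divmod(round(seg["start_time"], 1), 60)
        lines.append(f"[{int(minutes):02d}:{seconds:04.1f}] {seg['speaker']}: {seg['text']}")
    return "\n".join(lines)
```

Applied to the sample transcript above, talk_time_by_speaker gives about 35.5 s for the agent and 11.9 s for the customer.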

Bulk Submission

To submit multiple recordings at once, use the batch endpoint:
curl -X POST https://api.vinfer.ai/v1/transcription/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "jobs": [
      {
        "audio_url": "https://your-storage.com/calls/call_001.wav",
        "language": "en-IN",
        "speakers": 2,
        "features": ["transcription", "summary", "sentiment", "intent"]
      },
      {
        "audio_url": "https://your-storage.com/calls/call_002.mp3",
        "language": "ta-IN",
        "speakers": 2,
        "features": ["transcription", "summary", "sentiment"]
      }
    ],
    "webhook_url": "https://your-server.com/webhooks/neuronlens"
  }'
The batch endpoint returns a batch_id and an array of individual job_id values. Results are delivered per job to your webhook URL as each one completes.
Enable all four features — "transcription", "summary", "sentiment", and "intent" — in a single request. Running them together is more efficient than submitting separate jobs, and it ensures all analysis is derived from the same processing pass.
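Because batch results arrive as one webhook call per job, you typically need to track which jobs are still outstanding. This sketch assumes the batch response lists the job IDs under a `job_ids` key and that each webhook body is the same job object returned by the polling endpoint; neither is confirmed on this page, so adjust the field names to the actual payloads. `BatchTracker` is a hypothetical name.

```python
import json


class BatchTracker:
    """Track which jobs in a batch have reported back via webhook."""

    def __init__(self, batch_response: dict):
        self.batch_id = batch_response["batch_id"]
        self.pending = set(batch_response["job_ids"])  # assumed key name
        self.results: dict[str, dict] = {}

    def handle_webhook(self, body: bytes) -> None:
        """Record one webhook delivery; ignore jobs from other batches."""
        job = json.loads(body)
        job_id = job["job_id"]
        if job_id not in self.pending and job_id not in self.results:
            return
        self.pending.discard(job_id)
        self.results[job_id] = job  # a repeated delivery overwrites the earlier one

    @property
    def done(self) -> bool:
        return not self.pending
```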

Supported Audio Formats

| Format | Extension | Notes |
|---|---|---|
| WAV | .wav | Recommended; supports PCM and compressed variants |
| MP3 | .mp3 | Widely supported; slight quality trade-off vs. WAV |
| OGG Vorbis | .ogg | Common in WebRTC-based recording setups |
| FLAC | .flac | Lossless; larger file size |
| MPEG-4 Audio | .m4a | Common in mobile recording applications |
Limits: Maximum file size 500 MB · Maximum audio duration 4 hours per job.
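A local pre-flight check against these formats and limits avoids submitting jobs that cannot succeed. A minimal sketch with a hypothetical `preflight` helper: it reads 500 MB conservatively as 500,000,000 bytes (the page does not say whether MB is decimal or binary), looks only at the URL path so signed-URL query strings do not hide the extension, and checks duration only if you already know the audio length.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a"}
MAX_BYTES = 500 * 1000 * 1000       # conservative reading of "500 MB"
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4 hours per job


def preflight(audio_url: str, size_bytes: int,
              duration_seconds: Optional[float] = None) -> list[str]:
    """Return problems that would make the job fail; empty means it looks fine."""
    problems = []
    ext = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"audio is {duration_seconds:.0f} s; limit is {MAX_DURATION_SECONDS} s")
    return problems
```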

Audio files must be accessible via a public URL or a time-limited signed URL (minimum 15 minutes validity). NeuronLens fetches the file once during processing and does not store your audio beyond that window. Make sure your storage bucket does not require IP allowlisting that would block VInfer’s processing servers.

Supported Languages

| Language | Code | Notes |
|---|---|---|
| Hindi | hi-IN | Including Hinglish (code-switched Hindi–English) |
| Indian English | en-IN | Tuned for Indian accents across regions |
| Tamil | ta-IN | |
| Telugu | te-IN | |
| Marathi | mr-IN | |
| Bengali | bn-IN | |
| Kannada | kn-IN | |
| Gujarati | gu-IN | |
| Malayalam | ml-IN | |
| Punjabi | pa-IN | |
| Standard English | en-US | For calls with non-Indian participants |
Use the language code that matches the primary spoken language in the call. For heavily code-switched calls, choose the dominant language and NeuronLens will handle the mixed segments automatically.
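To avoid sending a code the service does not list, you can normalize and check the language code on the client. BCP-47 tags are case-insensitive, so this sketch canonicalizes casing (lowercase language, uppercase region) and also accepts an underscore separator; whether the API itself accepts non-canonical forms is not stated, so sending the canonical form is the safer choice. `normalize_language` is a hypothetical helper.

```python
SUPPORTED_LANGUAGES = {
    "hi-IN": "Hindi (including Hinglish)",
    "en-IN": "Indian English",
    "ta-IN": "Tamil",
    "te-IN": "Telugu",
    "mr-IN": "Marathi",
    "bn-IN": "Bengali",
    "kn-IN": "Kannada",
    "gu-IN": "Gujarati",
    "ml-IN": "Malayalam",
    "pa-IN": "Punjabi",
    "en-US": "Standard English",
}


def normalize_language(code: str) -> str:
    """Canonicalize a code such as 'hi_in' to 'hi-IN' and check it is supported."""
    lang, _, region = code.strip().replace("_", "-").partition("-")
    normalized = f"{lang.lower()}-{region.upper()}" if region else lang.lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```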