Please add speaker diarization support to /api/v1/audio/transcriptions, especially for elevenlabs/scribe-v2.
Current issue: Venice supports private/x402-friendly STT, but the transcription API only exposes file, model, response_format, timestamps, and language. The response schema only returns text and timestamps. There is no documented way to request speaker diarization or receive speaker IDs.
Requested API fields:
diarize: boolean
num_speakers?: number
diarization_threshold?: number
use_multi_channel?: boolean
speaker_labels?: string[]
actors?: { id: string; name: string; voice_sample?: file | url }[]
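As an illustration of how a client might send these fields, here is a minimal sketch that turns the proposed options into multipart-style string fields. None of these fields exist in the API today; `DiarizationOptions` and `toFormFields` are hypothetical names, and serializing `speaker_labels` as a JSON string is an assumption, not a documented convention.

```typescript
// Proposed (not yet existing) diarization options from the list above.
interface DiarizationOptions {
  diarize: boolean;
  num_speakers?: number;
  diarization_threshold?: number;
  use_multi_channel?: boolean;
  speaker_labels?: string[];
}

// Flatten the options into string fields suitable for a multipart form.
// Options the caller did not set are omitted rather than sent as empty values.
function toFormFields(opts: DiarizationOptions): Record<string, string> {
  const fields: Record<string, string> = { diarize: String(opts.diarize) };
  if (opts.num_speakers !== undefined) {
    fields.num_speakers = String(opts.num_speakers);
  }
  if (opts.diarization_threshold !== undefined) {
    fields.diarization_threshold = String(opts.diarization_threshold);
  }
  if (opts.use_multi_channel !== undefined) {
    fields.use_multi_channel = String(opts.use_multi_channel);
  }
  if (opts.speaker_labels !== undefined) {
    // ASSUMPTION: arrays travel as a JSON-encoded string field.
    fields.speaker_labels = JSON.stringify(opts.speaker_labels);
  }
  return fields;
}
```

Each returned entry could then be appended to the same form that already carries file and model.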
Requested response additions:
{
  "text": "...",
  "segments": [
    { "speaker_id": "speaker_0", "speaker_name": "Alice", "start": 0.0, "end": 3.2, "text": "..." }
  ],
  "timestamps": {
    "word": [
      { "word": "...", "start": 0.0, "end": 0.4, "speaker_id": "speaker_0", "speaker_name": "Alice" }
    ]
  }
}
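To show why the segments array would be useful, here is a rough sketch of how a client could render a speaker-attributed transcript from it. The `SpeakerSegment` type mirrors the proposed response shape; `formatTranscript` is a hypothetical helper, and treating `speaker_name` as optional is an assumption.

```typescript
// Segment shape taken from the proposed response additions.
interface SpeakerSegment {
  speaker_id: string;
  speaker_name?: string; // ASSUMPTION: absent when no name is known
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Render one line per segment, preferring the readable name over the raw ID.
function formatTranscript(segments: SpeakerSegment[]): string {
  return segments
    .map((s) => {
      const who = s.speaker_name ?? s.speaker_id;
      return `[${s.start.toFixed(1)}-${s.end.toFixed(1)}] ${who}: ${s.text}`;
    })
    .join("\n");
}
```

With only text and timestamps in the current response, a client has no way to produce this kind of output without a second diarization service.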
Why this matters: This is important for meeting transcripts, interviews, podcasts, call transcripts, agent workflows, and media/script workflows where privacy and x402 payment are required. Existing diarization providers usually require separate accounts/API keys and do not fit Venice's privacy/accountless payment model.
Minimum useful version: Forward Scribe v2 diarization options (diarize, num_speakers, use_multi_channel) and preserve upstream speaker_id in word/segment timestamps.
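The minimum version could be little more than a pass-through. The sketch below is one way it might look, under stated assumptions: `pickForwardedOptions` and `toWordTimestamps` are hypothetical helpers, and the upstream word shape (`text`, `start`, `end`, `speaker_id`) is assumed here rather than taken from Venice's implementation.

```typescript
// The three options named in the minimum version; everything else is dropped.
const FORWARDED_KEYS = ["diarize", "num_speakers", "use_multi_channel"] as const;
type ForwardedKey = (typeof FORWARDED_KEYS)[number];

// Copy only the forwarded options that the caller actually set.
function pickForwardedOptions(
  request: Record<string, unknown>,
): Partial<Record<ForwardedKey, unknown>> {
  const out: Partial<Record<ForwardedKey, unknown>> = {};
  for (const key of FORWARDED_KEYS) {
    if (request[key] !== undefined) out[key] = request[key];
  }
  return out;
}

// ASSUMPTION: upstream words look like { text, start, end, speaker_id }.
interface UpstreamWord {
  text: string;
  start: number;
  end: number;
  speaker_id?: string;
}

interface WordTimestamp {
  word: string;
  start: number;
  end: number;
  speaker_id?: string;
}

// Map upstream words to word timestamps, keeping speaker_id when present.
function toWordTimestamps(words: UpstreamWord[]): WordTimestamp[] {
  return words.map((w) => ({
    word: w.text,
    start: w.start,
    end: w.end,
    ...(w.speaker_id !== undefined ? { speaker_id: w.speaker_id } : {}),
  }));
}
```

The point of the sketch is that no speaker modeling is needed on Venice's side for this tier: the options go through unchanged and the speaker ID is simply not discarded on the way back.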
Ideal version: Support known actor labels / speaker profiles so developers can map speakers to real names or roles during transcription, while preserving Venice's privacy guarantees.
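Until something like actor labels exists server-side, the name mapping can be approximated on the client. This is a minimal sketch assuming that each actor's `id` matches a `speaker_id` in the response, which the request above does not specify; `applyActorNames` is a hypothetical helper.

```typescript
interface Actor {
  id: string;
  name: string;
}

// ASSUMPTION: actor.id equals the speaker_id used in segments and words.
// Items whose speaker_id has no matching actor are returned unchanged.
function applyActorNames<T extends { speaker_id: string; speaker_name?: string }>(
  items: T[],
  actors: Actor[],
): T[] {
  const names = new Map<string, string>(
    actors.map((a): [string, string] => [a.id, a.name]),
  );
  return items.map((item) => {
    const name = names.get(item.speaker_id);
    return name === undefined ? item : { ...item, speaker_name: name };
  });
}
```

A client-side mapping like this only works once the caller already knows which anonymous ID belongs to whom, which is the gap that voice samples or speaker profiles in the ideal version would close.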
New Submission
Feature Requests
API
1 day ago

An Anonymous User