Add speaker diarization and actor labels to the Speech-to-Text API

Please add speaker diarization support to /api/v1/audio/transcriptions, especially for elevenlabs/scribe-v2.

Current issue: Venice supports private, x402-friendly speech-to-text (STT), but the transcription API only exposes file, model, response_format, timestamps, and language, and the response schema only returns text and timestamps. There is no documented way to request speaker diarization or to receive speaker IDs.

Requested API fields:

  • diarize: boolean

  • num_speakers?: number

  • diarization_threshold?: number

  • use_multi_channel?: boolean

  • speaker_labels?: string[]

  • actors?: { id: string; name: string; voice_sample?: file | url }[]
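For illustration, a client could model the requested fields roughly as below. This is a hypothetical sketch: DiarizationOptions, validateDiarizationOptions and toFormFields are invented names, the validation rules (for example, that diarization fields require diarize: true) are assumptions rather than documented Venice or ElevenLabs behavior, and JSON-encoding the array fields inside a multipart form is an assumed wire format. voice_sample is modeled as a URL string only.

```typescript
// Hypothetical client-side model of the requested diarization fields.
// None of these fields exist in the current Venice API.
interface Actor {
  id: string;
  name: string;
  voice_sample?: string; // URL only in this sketch; file uploads are not modeled
}

interface DiarizationOptions {
  diarize: boolean;
  num_speakers?: number;
  diarization_threshold?: number;
  use_multi_channel?: boolean;
  speaker_labels?: string[];
  actors?: Actor[];
}

// Returns a list of problems; an empty list means the options look consistent.
// These rules are assumptions, not documented server behavior.
function validateDiarizationOptions(opts: DiarizationOptions): string[] {
  const errors: string[] = [];
  const usesDiarizationFields =
    opts.num_speakers !== undefined ||
    opts.diarization_threshold !== undefined ||
    opts.speaker_labels !== undefined ||
    opts.actors !== undefined;
  if (!opts.diarize && usesDiarizationFields) {
    errors.push("diarization fields require diarize: true");
  }
  if (opts.num_speakers !== undefined &&
      (!Number.isInteger(opts.num_speakers) || opts.num_speakers < 1)) {
    errors.push("num_speakers must be a positive integer");
  }
  if (opts.speaker_labels !== undefined && opts.num_speakers !== undefined &&
      opts.speaker_labels.length > opts.num_speakers) {
    errors.push("more speaker_labels than num_speakers");
  }
  const seenIds = new Set<string>();
  for (const actor of opts.actors ?? []) {
    if (seenIds.has(actor.id)) errors.push(`duplicate actor id: ${actor.id}`);
    seenIds.add(actor.id);
  }
  return errors;
}

// Flattens the options into string key/value pairs, as they would travel in a
// multipart/form-data body next to file and model. Arrays are JSON-encoded (assumption).
function toFormFields(opts: DiarizationOptions): [string, string][] {
  const fields: [string, string][] = [["diarize", String(opts.diarize)]];
  if (opts.num_speakers !== undefined) {
    fields.push(["num_speakers", String(opts.num_speakers)]);
  }
  if (opts.diarization_threshold !== undefined) {
    fields.push(["diarization_threshold", String(opts.diarization_threshold)]);
  }
  if (opts.use_multi_channel !== undefined) {
    fields.push(["use_multi_channel", String(opts.use_multi_channel)]);
  }
  if (opts.speaker_labels !== undefined) {
    fields.push(["speaker_labels", JSON.stringify(opts.speaker_labels)]);
  }
  if (opts.actors !== undefined) {
    fields.push(["actors", JSON.stringify(opts.actors)]);
  }
  return fields;
}
```

Checking options on the client before upload avoids paying for a transcription that the server would reject anyway, which matters when each call is an x402 payment.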

Requested response additions:

  {
    "text": "...",
    "segments": [
      {
        "speaker_id": "speaker_0",
        "speaker_name": "Alice",
        "start": 0.0,
        "end": 3.2,
        "text": "..."
      }
    ],
    "timestamps": {
      "word": [
        {
          "word": "...",
          "start": 0.0,
          "end": 0.4,
          "speaker_id": "speaker_0",
          "speaker_name": "Alice"
        }
      ]
    }
  }
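If only word-level speaker_id values come back, the segments array in this shape can be derived by merging consecutive words from the same speaker. A minimal sketch, assuming words arrive in time order; TimedWord, Segment and wordsToSegments are hypothetical names, and both the 1-second pause that forces a new segment and the space-joining of words are arbitrary choices, not upstream behavior.

```typescript
// Hypothetical types mirroring the requested response shape (times in seconds).
interface TimedWord {
  word: string;
  start: number;
  end: number;
  speaker_id: string;
  speaker_name?: string;
}

interface Segment {
  speaker_id: string;
  speaker_name?: string;
  start: number;
  end: number;
  text: string;
}

// Merges consecutive words from the same speaker into one segment.
// A pause longer than maxGapSeconds also starts a new segment (arbitrary default).
function wordsToSegments(words: TimedWord[], maxGapSeconds = 1.0): Segment[] {
  const segments: Segment[] = [];
  for (const w of words) {
    const last = segments.length > 0 ? segments[segments.length - 1] : undefined;
    if (last !== undefined && last.speaker_id === w.speaker_id &&
        w.start - last.end <= maxGapSeconds) {
      // Same speaker, short pause: extend the current segment.
      last.end = w.end;
      last.text += " " + w.word;
    } else {
      // New speaker or long pause: open a new segment.
      segments.push({
        speaker_id: w.speaker_id,
        speaker_name: w.speaker_name,
        start: w.start,
        end: w.end,
        text: w.word,
      });
    }
  }
  return segments;
}
```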

Why this matters: Speaker attribution is important for meeting transcripts, interviews, podcasts, call transcripts, agent workflows, and media/script workflows where privacy and x402 payment are required. Existing diarization providers usually require separate accounts or API keys, which does not fit Venice's private, accountless payment model.

Minimum useful version: Forward Scribe v2 diarization options (diarize, num_speakers, use_multi_channel) and preserve upstream speaker_id in word/segment timestamps.
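A rough sketch of what the minimum version could look like on the server side: pass through only the three named options and keep the upstream speaker_id on each word. The upstream word shape (text, start, end, type, speaker_id) is an assumption loosely based on ElevenLabs' speech-to-text output and should be checked against its documentation; pickForwardedOptions and mapUpstreamWords are hypothetical helper names.

```typescript
// Assumed upstream word shape, loosely modeled on ElevenLabs speech-to-text output.
interface UpstreamWord {
  text: string;
  start: number;
  end: number;
  type: string; // assumed values such as "word", "spacing", "audio_event"
  speaker_id?: string;
}

// Word entry in the requested Venice timestamps.word array.
interface VeniceWord {
  word: string;
  start: number;
  end: number;
  speaker_id?: string;
}

// Only the options named in the minimum version are forwarded upstream.
const FORWARDED_KEYS = ["diarize", "num_speakers", "use_multi_channel"] as const;
type ForwardedKey = (typeof FORWARDED_KEYS)[number];

function pickForwardedOptions(
  params: Record<string, string>,
): Partial<Record<ForwardedKey, string>> {
  const out: Partial<Record<ForwardedKey, string>> = {};
  for (const key of FORWARDED_KEYS) {
    const value = params[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

// Drops non-word tokens and copies the upstream speaker_id through unchanged.
function mapUpstreamWords(words: UpstreamWord[]): VeniceWord[] {
  return words
    .filter((w) => w.type === "word")
    .map((w) => ({ word: w.text, start: w.start, end: w.end, speaker_id: w.speaker_id }));
}
```

Forwarding an explicit allow-list, rather than every incoming field, keeps unrelated request data from reaching the upstream provider.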

Ideal version: Support known actor labels / speaker profiles so developers can map speakers to real names or roles during transcription, while preserving Venice's privacy guarantees.
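One way the ideal version could map anonymous speakers to actors: compare an embedding of each diarized speaker with an embedding of each actor's voice_sample and assign names greedily by cosine similarity, one actor per speaker. This is only a sketch; how the embeddings are produced is assumed and out of scope, the 0.7 similarity cutoff is arbitrary, and assignActors, labelSegments and ActorProfile are hypothetical names. Speakers without a confident match keep only their speaker_id.

```typescript
// Hypothetical actor profile with a precomputed voice embedding.
interface ActorProfile {
  id: string;
  name: string;
  embedding: number[];
}

interface SpeakerSegment {
  speaker_id: string;
  speaker_name?: string;
  start: number;
  end: number;
  text: string;
}

// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy one-to-one assignment: best-scoring pairs first, each speaker and actor used once.
function assignActors(
  speakers: Map<string, number[]>,
  actors: ActorProfile[],
  minSimilarity = 0.7,
): Map<string, string> {
  const candidates: { speakerId: string; actor: ActorProfile; score: number }[] = [];
  speakers.forEach((embedding, speakerId) => {
    for (const actor of actors) {
      candidates.push({ speakerId, actor, score: cosine(embedding, actor.embedding) });
    }
  });
  candidates.sort((x, y) => y.score - x.score);
  const names = new Map<string, string>();
  const usedActors = new Set<string>();
  for (const c of candidates) {
    if (c.score < minSimilarity) break; // remaining pairs are all weaker
    if (names.has(c.speakerId) || usedActors.has(c.actor.id)) continue;
    names.set(c.speakerId, c.actor.name);
    usedActors.add(c.actor.id);
  }
  return names;
}

// Fills speaker_name on segments whose speaker was matched to an actor.
function labelSegments(segments: SpeakerSegment[], names: Map<string, string>): SpeakerSegment[] {
  return segments.map((s) => {
    const name = names.get(s.speaker_id);
    return name === undefined ? s : { ...s, speaker_name: name };
  });
}
```

A greedy match is simple and predictable; an optimal assignment (for example, the Hungarian algorithm) would be a drop-in alternative if many speakers and actors are expected.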

Status: New Submission
Board: Feature Requests
Tags: API
Date: 1 day ago
Author: An Anonymous User
