Kyutai offers a set of streaming Speech-To-Text and Text-To-Speech models that excel in English and French. They are permissively licensed under CC-BY-4.0 and perform very well.
Venice already has Text-To-Speech, which is quite good, but it lacks streaming Speech-To-Text. A privacy-focused Speech-To-Text model would be a great addition.
Links:
Backlog
Feature Requests
Voice
11 months ago

Nicolas Embleton