May 6th, 2026

xAI's most intelligent reasoning model is now generally available on Venice. 1M-token context window, function calling, structured outputs, and multimodal support.
Realtime voice conversations are now live on Venice. Talk to any model with memory sync, chat persistence, waveform visualization, push-to-talk input, and language switching. Now available on web, iOS and Android.
OpenAI's latest-generation model family is now available on Venice. GPT-5.5 delivers improved reasoning, stronger instruction-following, and better multi-turn conversation across the board. GPT-5.5 Pro adds extended reasoning depth and a larger context window for demanding workloads. Both models are available now.
Kuaishou's Kling V3 and O3 video models now generate native 4K output on Venice. Available in text-to-video, image-to-video, and reference-to-video modes, Kling 4K delivers sharper detail, better motion coherence, and cinematic-quality output at four times the resolution of previous generations.
Venice has increased the programmatic burns for new subscriptions: $2 in VVV for Pro, $5 in VVV for Pro+, and $10 in VVV for Max. Every new subscription triggers a buy-and-burn at these updated amounts.
The following models have been added to Venice:
Grok 4.3 — xAI's most intelligent reasoning model with 1M-token context window, function calling, structured outputs, and multimodal support. Available to all users.
GPT-5.5 — OpenAI's latest-generation text model with improved reasoning, instruction-following, and multi-turn conversation. Available to all users.
GPT-5.5 Pro — OpenAI's higher-capability variant of GPT-5.5 with extended reasoning depth and larger context window. Pro users only.
DeepSeek V4 Pro — DeepSeek's full-size V4 reasoning model with extended context and strong performance on coding, math, and multi-step tasks. Available to all users.
DeepSeek V4 Flash — Lighter, faster variant of DeepSeek V4 optimized for speed and lower latency while retaining strong general-purpose performance. Available to all users.
Qwen 3.6 27B — Text model from Alibaba Cloud with 27 billion parameters, offering a balance of capability and efficiency with 128K context window. Available to all users.
GLM 5.1 E2EE — Zhipu AI's GLM 5.1 running with end-to-end encryption in a Trusted Execution Environment. Available to Pro users at no additional credit cost.
Kling V3 4K — Kuaishou text-to-video at native 4K resolution. Available to all users.
Kling V3 4K R2V — Kuaishou reference-to-video at native 4K resolution. Available to all users.
Kling O3 4K — Kuaishou O3-series text-to-video at native 4K resolution. Available to all users.
Kling O3 4K I2V — Kuaishou O3-series image-to-video at native 4K resolution. Available to all users.
Kling O3 4K R2V — Kuaishou O3-series reference-to-video at native 4K resolution. Available to all users.
HappyHorse 1.0 — Alibaba's text-to-video generation model. Available to all users.
HappyHorse 1.0 I2V — Image-to-video generation from a source image. Available to all users.
HappyHorse 1.0 Reference — Video generation guided by a reference image for style and content. Available to all users.
HappyHorse 1.0 Edit — Video editing model for modifying and transforming existing video. Available to all users.
Wan 2.7 Pro Edit — Alibaba DashScope image editing model for prompt-driven edits to existing images. Available to all users.
Model Explorer Redesign — Refreshed layout for the Model Explorer with improved navigation and filtering.
Recommended Model Sort — New "Recommended" sort option in the model selector, prioritizing recently used models.
Model Details Modal — Model details can now be opened directly via URL in a dedicated modal.
Model Explorer Switcher — New entry point in the model switcher to navigate directly to the Model Explorer.
Prompt Enhancement Context — The prompt enhancement wand now incorporates conversation context when rewriting prompts.
Video Auto-Compression — Oversized videos are automatically compressed client-side before upload.
Per-Class PPU Toggles — Pay-per-use confirmation can now be toggled independently for each model class in chat.
Batch Delete Warning — Batch chat delete confirmation now warns that chats will be removed from other devices too.
Select All in Chat Delete — Added "Select All" option to the chat sidebar delete menu.
Image Auto-Downsize on Share — Images larger than 25 MB are automatically downsized before sharing.
Adaptive Thinking Always On — Removed the adaptive thinking toggle. Adaptive thinking is now always enabled.
Burn Type Tooltips — Tooltips now vary by burn type, with "Bought" label shown for discretionary burns.
China Server Location Flag — China flag icon now displayed for CN server locations in model details.
Sidebar Cleanup — Removed Help & Feedback button from the sidebar app menu.
PPU Confirmation Popup — Confirmation popup now shown when a request is routed to a pay-per-use model.
Tool Call Loading Indicator — A loading spinner now appears in agentic chat while waiting for the next tool to execute.
Unified Chat History — All v1 and v2 chat history now appears in a single combined list in the sidebar.
Rate Limit Banner — A banner now appears in the chat input area when you've hit a rate limit.
Time Sent in Info Panels — Text, image, and video info panels now display a "Time Sent" row.
Cost Management Charts — Charts on the cost management dashboard now include today's spending data.
Wide Screen Layout — Improved 2-column grid layout on wide screens for better use of available space.
Today's Spend Card — New summary card on the cost management dashboard showing today's total spend.
Chat Performance — Conversation window now uses lazy rendering for off-screen messages, reducing lag in long conversations.
Insufficient Credits Banner — Low credit warnings now appear as a dismissible banner above the input field instead of blocking interaction.
x402 Wallet View — Added a dedicated wallet view and admin top-up panel on the user page for x402 balances.
Voice Conversation Billing — Audio duration is now tracked per voice conversation session for accurate credit billing.
Video Credit Retry — Video generation credit charges are now automatically retried when the initial charge fails.
Android Voice Mode — Voice mode is now available on Android, with a prompt to update to the latest app version.
Uncensored Model Badges — Video model selectors now display an "Uncensored" badge where applicable.
Wallet Connect on Sign-In — Crypto wallet connection is now available on the sign-in and sign-up screens.
Pay-Per-Use in Chat — Pay-per-use purchase dialog added to the chat screen.
Pay-Per-Use Confirmation — Added a confirmation step before completing pay-per-use purchases.
iOS Native Chat Streaming — Chat responses now stream using native iOS processing.
Android Native Chat Streaming — Chat responses now stream using native Android processing.
Background Chat Sync — Chat responses that streamed while the app was backgrounded sync upon returning to the foreground.
Tablet Image Modal — Image detail modal now uses a tablet-optimized layout.
Tablet Dialogs — Dialogs now adapt to tablet screen sizes.
Tablet Settings Layout — Settings screens support split-screen and tablet-optimized layouts.
Tablet Modal Screens — Modal presentation screens now adapt to tablet screen sizes.
Dynamic Image Sizing — Images now resize dynamically based on device orientation.
Settings Navigation — Fixed navigation behavior and renamed settings screens.
Rate Limit Display — Updated rate limit information in settings.
Image & Video Info Sizing — Fixed sizing on image and video detail screens.
Privacy Warning Layout — Improved button positioning on the privacy warning dialog.
Conversation Replay Fix — Fixed a bug where already-read responses would replay when re-entering a conversation.
Android Chat Reliability — Fixed chat dropping or failing during request timeouts and mid-stream disconnects on Android.
iOS Background Image Generation — Fixed image generation failing when the app is in the background on iOS.
Android Background Image Generation — Image generation now continues running when the app is in the background on Android.
Text File Chat Sharing — Restored the ability to share chat conversations as text files.
Image Loading Indicator — Progress border on the image loader now waits briefly before appearing to avoid flicker on fast loads.
Image Error Display — Image generation errors now appear inline within chat messages.
Pro Upgrade Prompt — Restored the Pro upgrade button in the app header.
Default Playback Speed — Changed the default text-to-speech playback speed to 1.2x.
Auto Mode Image Editing — Auto mode now supports editing images referenced in the chat conversation.
Venice Skills GitHub Repository — Official veniceai/skills repository now live on GitHub with example skills covering the full Venice API surface.
Voice Cloning API — New POST /v1/audio/voices endpoint for MiniMax-based voice cloning.
OpenAI-Compatible File Inputs — Chat completions endpoint now accepts file inputs using the OpenAI-compatible format.
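Assuming the payload mirrors OpenAI's `file` content parts (an assumption; confirm the exact schema against Venice's API docs), a document can be inlined as base64 alongside a text prompt. The filename and bytes below are placeholders:

```python
import base64

def file_content_part(filename: str, raw: bytes, mime: str = "application/pdf") -> dict:
    """Build an OpenAI-style `file` content part with inline base64 data."""
    b64 = base64.b64encode(raw).decode("ascii")
    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": f"data:{mime};base64,{b64}",
        },
    }

# A user message combining a file and a text instruction.
message = {
    "role": "user",
    "content": [
        file_content_part("report.pdf", b"%PDF-1.7 ..."),
        {"type": "text", "text": "Summarize this document."},
    ],
}
```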
Model Overloaded Status Code — Model overloaded errors now return HTTP 429 instead of 503.
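Since overload and rate limiting now share a status code, one retry path can handle both. A minimal backoff sketch (illustrative helper, not a Venice SDK function; `call` stands in for any request that returns a status code and body):

```python
import random
import time

def with_backoff(call, max_attempts: int = 5, base_delay: float = 0.5):
    """Retry `call` while it reports HTTP 429, sleeping with jittered
    exponential backoff between attempts. `call` returns (status, body)."""
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        # 429 covers both rate limits and overloaded models now.
        delay = base_delay * (2 ** attempt) * (1 + random.random())
        time.sleep(delay)
    return status, body
```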
max_tokens Strict Cap on Reasoning Models — On reasoning-capable models, max_tokens is now a strict cap on total completion tokens (visible output + reasoning), restoring Venice's prior behavior across the model fleet. max_completion_tokens is accepted as an equivalent alias and takes precedence if both are sent.
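The documented precedence can be sketched as a tiny helper (illustrative only; the model slug is a placeholder):

```python
def effective_token_cap(payload: dict):
    """Return the cap the server would apply, per the documented
    precedence: max_completion_tokens (alias) wins over max_tokens."""
    if "max_completion_tokens" in payload:
        return payload["max_completion_tokens"]
    return payload.get("max_tokens")

# Both parameters sent: the alias takes precedence, and the cap covers
# reasoning tokens plus visible output combined.
request = {
    "model": "deepseek-v4-pro",  # placeholder slug for illustration
    "messages": [{"role": "user", "content": "Prove it step by step."}],
    "max_tokens": 4096,
    "max_completion_tokens": 8192,
}
cap = effective_token_cap(request)
```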
API File Inputs GA — File input support in the API is now generally available, no longer in preview.
Context Length in /v1/models — New context_length field added to each model object in /v1/models responses.
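The new field makes client-side filtering by context budget straightforward. A sketch assuming the response shape below (model ids and values are placeholders, not confirmed entries):

```python
def models_with_context(models_response: dict, min_tokens: int) -> list:
    """Filter /v1/models entries by the new context_length field."""
    return [
        m["id"]
        for m in models_response["data"]
        if m.get("context_length", 0) >= min_tokens
    ]

# Illustrative response shape; ids and numbers are placeholders.
sample = {"data": [
    {"id": "grok-4.3", "context_length": 1_000_000},
    {"id": "qwen-3.6-27b", "context_length": 131_072},
]}
models_with_context(sample, 500_000)  # ["grok-4.3"]
```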
Free User Rate Limit CTA — Free users now see a call-to-action prompt when they hit rate limits.
Voice Rate Limit Headers — Voice agent responses now report the current rate limit and reset time to connected clients.
Qwen Image Deprecation — The qwen-image model has been deprecated and removed from both the app and the API.
Image Edit Resolution Parameter — New resolution parameter available on the image edit and multi-edit API endpoints.
Voice Mode Quota — The API now returns the caller's remaining voice mode quota in responses.
Disabled API Tier — Added a "Disabled" API consumption tier that blocks API access for the account.
Chatterbox HD on /models — Chatterbox HD voice cloning model is now listed and documented on the /models endpoint.
Per-Model Daily Costs — The Activity API now returns daily cost breakdowns per model.
Hermes Agent Integration — Official Venice integration guide for Hermes Agent, the open-source self-hosted AI agent by Nous Research. Point Hermes at the Venice API for access to 230+ models across text, image, video, audio, and embeddings with persistent memory and autonomous skill creation.
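Any OpenAI-compatible client, Hermes included, only needs Venice's base URL and an API key. A stdlib-only sketch of assembling such a request; the base URL reflects Venice's public API and should be confirmed against the integration guide:

```python
import json
import urllib.request

def build_chat_request(api_key: str, model: str, prompt: str,
                       base: str = "https://api.venice.ai/api/v1") -> urllib.request.Request:
    """Prepare an OpenAI-compatible chat completions request against Venice."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request (e.g. via `urllib.request.urlopen`) is left out here; an agent framework would normally handle transport and streaming itself.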
Programmatic Burn Increase — Venice increased the programmatic VVV burn for new subscriptions: $2 for Pro, $5 for Pro+, and $10 for Max. Every new subscription now triggers a larger automatic token burn.
Emissions Reduction — Venice completed the first of three planned emissions reductions for VVV, reducing the rate of new token issuance from 6M/yr to 5M/yr. Additional reductions planned in June and July.
Kimi K2 Thinking — Retired. Traffic routed to Kimi K2.5 via alias. Existing API requests using kimi-k2-thinking now resolve to kimi-k2-5.
Qwen3 Coder 480B — Deprecated April 30, fully retired May 4. Traffic routed to Qwen3 Coder 480B Turbo. The non-turbo variant is no longer visible in the API or app.
Venice Uncensored 1.1 — Retired. All traffic routed to Venice Uncensored 1.2. API requests using venice-uncensored transparently resolve to 1.2.
HiDream — Deprecation date extended to May 7, 2026 (from May 1). Email sent to affected API users.
NEAR AI GLM 5.0 (E2EE) — Retired. All traffic routed to GLM 5.1 (E2EE).
Improved inpainting progress animation to reflect actual model processing time
Fixed app menu being clipped in landscape mode on iPad Safari
Updated execution time display to show milliseconds
Fixed gallery header action buttons being clipped on narrow viewports
Fixed thinking indicator disappearing during reasoning-only streaming
Removed incomplete trailing bucket from Per Period volume chart
Updated PPU model acknowledgment to trigger once per account instead of per conversation
Fixed model search returning unrelated results via subsequence matches on description and use case
Updated PPU acknowledgment to trigger once per conversation for every PPU modality
Condensed the x402 wallet balance table from 6 columns to 3
Removed the automatic greeting sent when opening a voice websocket connection
Fixed inpaint auto mode behavior after a recent regression
Improved Hunyuan 3D results to render GLB and OBJ mesh outputs directly in the viewer
Fixed rate limiting not being correctly applied to background removal and upscale for free-tier users
Improved error alert positioning and added a retry button for failed messages
Fixed incorrect provider names displayed in the model explorer
Fixed incorrect label displayed for vision models
Improved agentic mode loading indicator with an animated gradient border
Fixed audio crackling caused by inconsistent sample rate
Fixed the Max button rounding values instead of preserving full numerical precision
Fixed auto-enhance preference not being respected during image generation
Updated copy on the Pro upgrade call-to-action
Improved Model Selector layout by pinning the View All Models button to the bottom of the dropdown
Fixed aspect ratio selector appearing during single-image edits with Grok
Fixed moderate post modal closing when the context menu is dismissed
Reordered items in the user dropdown menu
Fixed arrow key navigation in image zoom, which followed an incorrect left/right order
Fixed credit balance not updating immediately after completing a chat request
Improved rendering performance for long conversations
Fixed Spotlight Search not respecting the top safe-area inset on PWA
Restored Lustify v7 model availability after prior deprecation
Fixed missing API keys silently returning empty results instead of an error
Fixed an error occurring when quoting video content in conversations
Improved image search results with lightbox preview, context menu support, and better error handling
Fixed chat message queue issues that could cause messages to be processed incorrectly
Improved context window handling with more accurate token counting, cost display tooltips, and smarter message compaction
Fixed interactions not responding correctly in the Model Explorer
Fixed temperature warning displaying at an incorrect baseline threshold
Fixed inability to send messages containing only an attachment without text
Fixed errors when using Grok 4.1 Fast with characters