Hi, I'm trying out your service because it's the only service at the moment that serves Gemma 4 31b with a decent token rate (throughput).
However, many requests fail with HTTP 429: "The model is currently overloaded. Please try again later." This defeats the purpose of switching to your service: even if I implement retry logic, each retried request takes longer overall, so the result is no better than using a slower and cheaper service.
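To illustrate the latency cost, here is a minimal sketch of retry logic with exponential backoff. It is not tied to any particular client library: `send` is a hypothetical callable standing in for the real API call and is assumed to return an `(status, body)` pair, and the 1-second base delay is an arbitrary choice.

```python
import time
from typing import Callable, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],  # hypothetical stand-in for the real API call
    max_attempts: int = 5,
    base_delay: float = 1.0,  # assumed value, not a documented recommendation
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str, float]:
    """Call send() until it stops returning 429 or attempts run out.

    Returns (status, body, total_seconds_spent_backing_off).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    waited = 0.0
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body, waited
        # Back off only if another attempt will follow.
        if attempt < max_attempts - 1:
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            sleep(delay)
            waited += delay
    # Every attempt was rejected; hand back the last 429.
    return status, body, waited
```

With these assumed settings, two overloaded responses in a row add 3 seconds of waiting before the third attempt even starts, which can cancel out the benefit of a higher token rate.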
I was wondering if there's an ETA for fixing this and improving the reliability of Gemma 4 31b?
Thanks,
Max
Bugs
9 days ago

Max Loh