Please add the things listed.
Being agentic: Qwen2.5-VL acts directly as a visual agent that can reason and dynamically direct tools, making it capable of computer use and phone use.
Understanding long videos and capturing events: Qwen2.5-VL can comprehend videos longer than 1 hour, and it adds the ability to capture events by pinpointing the relevant video segments.
Backlog
Feature Requests
New Model
About 1 year ago

An Anonymous User