
DeepSeek adds AI vision in major move: ‘the whale can now see’

On DeepSeek’s chat interface, a new ‘image recognition mode’ has been added alongside the ‘expert’ and ‘flash’ chat modes

The DeepSeek logo is displayed alongside its AI assistant app on a mobile phone. File photo: Reuters
Vincent Chow
Chinese artificial intelligence start-up DeepSeek has added multimodal capabilities to its flagship chatbot for the first time – meaning that it can process images and video in addition to text – bringing it in line with rivals that already offer the function.

The limited release to select users comes just days after the Hangzhou-based company released its new flagship model V4, which was followed by extensive price cuts.

According to DeepSeek multimodal team leader Chen Xiaokang, who made the announcement on Wednesday on social media, the function was initially offered to select users on DeepSeek’s chatbot website and mobile application for beta testing.

“Come try out the incredible work from our genius multimodal colleagues!” senior researcher Chen Deli wrote on social media shortly after, adding that “the little whale can now see”, a reference to DeepSeek’s whale logo.

On DeepSeek’s chat interface, a new “image recognition mode” has been added alongside the “expert” and “flash” chat modes, which were introduced earlier this month.

As AI continues to rapidly progress, multimodal capabilities are viewed as a necessity to move beyond simple text conversations with users into more complex and economically valuable domains.

While DeepSeek’s breakout moment in January 2025 made it a household name internationally due to its model’s powerful reasoning capabilities and cost-efficiency, the start-up’s lack of a multimodal offering since then has been seen as an Achilles' heel.
