Alibaba Pours $290 Million Into AI Startup Building Real-World Models

Key Points:

  • Alibaba Cloud led a 2 billion yuan ($290 million) funding round for artificial intelligence startup ShengShu.
  • Technology companies are shifting focus from text-heavy language models to video-based world models.
  • ShengShu plans to use the money to build systems that connect digital spaces with physical robots.
  • Alibaba recently invested $50 million in Tripo AI and $60 million in PixVerse to back similar technology.

Alibaba Cloud is backing a new kind of artificial intelligence. The Chinese technology giant wants to move beyond popular text chatbots like ChatGPT. Instead, Alibaba is pouring money into what the industry calls world models. These new computer systems learn from videos and real-life physical scenarios rather than just reading vast amounts of internet text.

To get ahead of this trend, Alibaba Cloud just led a 2 billion yuan ($290 million) Series B funding round for a startup named ShengShu. TAL Education and Baidu Ventures also invested in the round. The cash injection arrives just two months after ShengShu secured 600 million yuan from Qiming Venture Partners and several other backers. ShengShu declined to disclose its valuation.

ShengShu created an artificial intelligence video generator called Vidu. The company released its Vidu Q3 Pro model in January. The technology tracking group Artificial Analysis currently ranks Vidu among the top 10 models for turning text and images into video. ShengShu launched Vidu globally several months before OpenAI made its own video tool, Sora, widely available to the public. Other major Chinese technology companies, such as ByteDance and Kuaishou, also compete heavily in this fast-moving video-generation market.

ShengShu leaders plan to use the new $290 million to build a general world model. They want to create a computer system that bridges the digital world of video games and the physical world of robots and self-driving cars. Company founder Zhu Jun said his team aims to directly connect computer perception with physical action. The approach is meant to help computer systems predict real-world behavior much better than standard text-based language models.

According to ShengShu, a general world model relies on multiple types of data to function properly. The computer system learns from vision, audio, and even physical touch sensors. The startup argues this method captures how the real world actually works. Language models simply guess the next word in a sentence based on written documents. World models, however, try to understand gravity, physical motion, and three-dimensional space.

Alibaba is spreading its bets across several companies in this specific field. Just last month, Alibaba and Baidu Ventures led a $50 million investment round for Tripo AI. This platform uses artificial intelligence to turn flat photographs into digital 3D models instantly. Tripo AI confirmed it is moving away from language model techniques to focus entirely on building its own world model grounded in physical space.

Before that, in September, Alibaba led a $60 million investment into PixVerse. That startup released an AI world model earlier this year that lets users actively direct how a video changes while the computer generates it. Alibaba also builds its own tools internally. The e-commerce giant recently released free, open-source models for generating videos. In February, the company launched a specific model designed to power robots.

ShengShu recently formed strategic partnerships with companies that build embodied artificial intelligence systems, such as humanoid robots. These robots operate in factories, commercial stores, and residential homes.

Industry experts say world models are critical for making these physical machines work safely. Kevin Kelly, a co-founder of the technology magazine Wired, noted last month that artificial intelligence needs three key elements to match human intelligence: reasoning, continuous learning, and a true understanding of the physical world. While current text chatbots handle basic reasoning, developers hope world models will finally teach computers to navigate physical reality.

EDITORIAL TEAM
Al Mahmud Al Mamun leads the TechGolly editorial team. He served as Editor-in-Chief of a world-leading professional research magazine. Rasel Hossain serves as Managing Editor. Our team includes technologists, researchers, and technology writers. We have substantial expertise in Information Technology (IT), Artificial Intelligence (AI), and Embedded Technology.