For years, AI models have been built to play games, typically excelling at a single title with the sole aim of winning. Google DeepMind researchers have taken a different approach with their latest creation: an AI that plays multiple 3D games the way a human does, and that can understand and act on verbal instructions.
Existing computer-controlled characters, or NPCs, can perform similar in-game actions, but they are driven indirectly through scripted commands. DeepMind's SIMA (Scalable Instructable Multiworld Agent) was instead trained on hours of annotated human gameplay video, learning to associate on-screen actions, objects, and interactions with verbal instructions.
By analyzing visual patterns on screen, such as character movements or interactions with objects, the AI can interpret instructions like "moving forward" or "opening a door," tasks that require more than simple key presses or object recognition.
The AI was trained on a range of games, including Valheim and Goat Simulator 3, with the developers' approval for the research. The goal was to test whether an agent trained on one set of games could generalize to new, unseen ones. It showed real potential there, though highly game-specific mechanics can still pose challenges.
Although games use many different in-game terms, most player actions boil down to a handful of fundamental verbs that affect the game world. The model recognizes several dozen of these primitives, such as "building a house," indicating a foundational understanding of gameplay actions.
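The idea of collapsing many game-specific terms into a few shared verbs can be sketched as a simple lookup. This is a toy illustration, not SIMA's actual action space; the verb names and game terms below are invented for the example.

```python
# Hypothetical shared vocabulary of primitive verbs that many games reduce to.
PRIMITIVES = {"move", "turn", "interact", "use_item", "build"}

# Hypothetical mapping from game-specific terms to those primitives.
GAME_TERMS = {
    "sprint": "move",
    "sail": "move",
    "open door": "interact",
    "chop wood": "use_item",
    "raise wall": "build",
}

def to_primitive(term: str) -> str:
    """Reduce a game-specific term to its underlying primitive verb."""
    verb = GAME_TERMS.get(term)
    if verb not in PRIMITIVES:
        raise KeyError(f"no primitive known for {term!r}")
    return verb

print(to_primitive("sail"))       # move
print(to_primitive("raise wall")) # build
```

The point is only that two surface-level terms from different games ("sprint," "sail") can share one underlying verb, which is what lets an agent transfer behavior between titles.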
The researchers aim to create a more natural and responsive game companion compared to current rigid, pre-programmed NPCs. They envision collaborative gameplay with AI players like SIMA, which can adapt and exhibit emergent behaviors similar to human players.
The researchers contrast this with traditional simulator-based agent training, which relies on reinforcement learning driven by predefined reward signals such as wins or scores. SIMA instead uses imitation learning from human behavior, which lets it pursue a wide variety of tasks described in open-ended text and supports more flexible, diverse gameplay.
Because the agent optimizes for similarity to observed successful behavior rather than for a fixed reward, it can be trained to attempt any task represented in its training data. This contrasts with the narrower decision-making of score-driven agents.
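The contrast between the two training philosophies can be shown in miniature: a score-driven agent picks whichever action a fixed reward table favors, while an imitation learner picks whatever humans most often did for a given instruction. Everything here, the instruction, the action names, and the reward values, is hypothetical; real imitation learning fits a neural policy to demonstrations rather than counting them.

```python
from collections import Counter

# Hypothetical log of human actions observed for one verbal instruction.
HUMAN_DEMONSTRATIONS = {
    "chop a tree": ["equip_axe", "equip_axe", "swing", "equip_axe"],
}

# Hypothetical fixed reward table a score-driven agent would optimize.
REWARD_TABLE = {"equip_axe": 0, "swing": 1, "jump": 0}

def score_driven_action(candidates):
    """Pick the action with the highest predefined reward."""
    return max(candidates, key=lambda a: REWARD_TABLE[a])

def imitation_action(instruction):
    """Pick the action humans most often took for this instruction."""
    counts = Counter(HUMAN_DEMONSTRATIONS[instruction])
    return counts.most_common(1)[0][0]

print(score_driven_action(["equip_axe", "swing", "jump"]))  # swing
print(imitation_action("chop a tree"))                      # equip_axe
```

Note the difference: the reward-driven choice ignores the instruction entirely, while the imitation-based choice is conditioned on it, which is what makes open-ended, text-specified tasks possible.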
Similar efforts exploring open-ended collaboration, chatbot interactions, and improvised actions are emerging in AI research beyond game-playing scenarios, showcasing a broader application of these technologies.
While experiments like MarioGPT explore infinite games, the focus on AI companions like SIMA represents a step towards more human-like interactive experiences in gaming.