Introduction
Stefan Lee, assistant professor of computer science at Oregon State University, is making significant strides at the intersection of artificial intelligence and robotics. Having held visiting research positions at Indiana University, Virginia Tech, Georgia Tech, and Meta AI before joining Oregon State in 2019, Lee was drawn to the university's leadership in robotics and AI, as well as the strong collaborative spirit of its faculty. His primary research interest lies in language grounding, which aims to associate words with their real-world meanings and representations.
Lee and his team leverage advancements in natural language processing to build increasingly intelligent embodied systems. Going beyond the capabilities of language-generation applications like ChatGPT, Lee's approach, which combines natural language processing and computer vision in embodied contexts, opens up the potential for AI systems to interact more fluidly with humans in the physical world.
Internationally acclaimed research
Lee was recently honored at the International Conference on Learning Representations, alongside collaborators from Meta AI and Georgia Tech, with one of four Outstanding Paper Awards. Their paper was selected from among the 4,900 submitted to the conference.
The research delves into how "blind" AI navigation agents, equipped solely with egomotion sensing, can learn to navigate unfamiliar environments and construct maplike representations that enable them to take shortcuts, follow walls, predict free space, and detect collisions.
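The agents in the paper are learned neural networks, but the core idea that egomotion alone can support maplike behavior has a classical analogue: path integration, or dead reckoning. The toy sketch below (an illustration, not the paper's method) integrates a sequence of (distance, turn) readings into a position estimate, which is enough to compute a straight-line shortcut back to the start:

```python
import math

def integrate_egomotion(steps):
    """Dead-reckon a pose (x, y, heading) from egomotion readings.

    Each step is (distance_moved, turn_radians) -- the only signals a
    "blind" agent has. Accumulating them yields a position estimate
    that supports maplike behavior such as shortcut-taking.
    """
    x, y, heading = 0.0, 0.0, 0.0
    for distance, turn in steps:
        heading += turn               # update orientation first
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
    return x, y, heading

# Walk two legs of a right triangle: 3 m forward, turn 90 degrees, 4 m forward.
x, y, _ = integrate_egomotion([(3.0, 0.0), (4.0, math.pi / 2)])
shortcut_home = math.hypot(x, y)  # straight-line distance back to start: 5 m
```

In practice, sensor noise makes raw dead reckoning drift quickly; part of what makes the paper's result notable is that the learned agents build useful spatial representations despite having no visual input at all.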
"My focus is the development of agents that can perceive their environment and communicate about this understanding with humans in order to coordinate their actions to achieve mutual goals: in short, agents that can see, talk, and act," Lee said. "Consequently, I work on problems in computer vision, natural language processing, and deep learning in general."
The importance of language grounding
Lee is fundamentally interested in language grounding: associating words with sights, sounds, and actions in order to anchor their meanings in day-to-day experience and in communicable expressions.

[Image: The instructions and path of a robot using computer vision to navigate.]
Grounding is crucial for robots with diverse embodiments, such as legs, wheels, or different types of manipulators. While Lee acknowledges that large language models play a significant role in his research, he points out that these models lack the ability to ground words and concepts in the real world.
"ChatGPT can write you a poem about cats, and it can even identify one in a photo," Lee said. "However, it doesn't know that cats are furry, in that it lacks the tactile sensors to identify what 'furry' even means or what its experiential implications are."
As an example, Lee highlights the complexity of the challenges a robot faces when given the simple command to go to the kitchen and slice an apple.
"If it actually wants to follow that, it has to be able to ground references to 'kitchen' and 'apple' to the stimuli it collects from onboard sensors, like cameras," Lee said. "The robot also has to understand what 'go' and 'slice' mean for the particular embodiment it has. We have hands, so slicing looks like a particular motion for us. For a robot with a different set of manipulators, slicing may require very different motions, even if the outcome we want is the same."
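The two-part grounding problem Lee describes, nouns grounded in perception and verbs grounded in embodiment-specific motor skills, can be sketched in a few lines. Everything below is a hypothetical toy (the skill names, embodiments, and `ground_command` function are invented for illustration, not taken from Lee's systems):

```python
# Toy illustration: the same grounded verb, "slice", maps to different
# motor routines depending on the robot's embodiment.
SKILLS = {
    "two_finger_gripper": {"slice": ["grasp_knife", "saw_back_and_forth"],
                           "go":    ["plan_path", "drive_wheels"]},
    "suction_arm":        {"slice": ["pin_object", "press_blade_down"],
                           "go":    ["plan_path", "drive_wheels"]},
}

def ground_command(command, embodiment, detections):
    """Map each word to either a detected object or a motor routine.

    `detections` stands in for grounded perception -- e.g. objects the
    robot's onboard cameras have recognized in the scene.
    """
    plan = []
    for word in command.split():
        if word in detections:              # noun grounded in sensor data
            plan.append(("goal", word))
        elif word in SKILLS[embodiment]:    # verb grounded in motor skills
            plan.extend(("action", step) for step in SKILLS[embodiment][word])
    return plan

plan = ground_command("go kitchen slice apple", "two_finger_gripper",
                      detections={"kitchen", "apple"})
```

Swapping the embodiment from `"two_finger_gripper"` to `"suction_arm"` changes which motions "slice" expands into while the goal objects stay the same, which is the point of Lee's example: the word's intended outcome is shared, but its grounding in action is body-specific.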
Lee added that the ability to draw conclusions from perceptual data will continue to be a focus for AI researchers.
The future of AI and language grounding
The current surge of interest in AI has been driven by recent advancements in the field's ability to deal with sound, text, and imagery. Consumer AI applications have become profitable, spurring further excitement for the technology. To what degree large language models like ChatGPT will end up augmenting or replacing creative or intellectual work remains an open question. Looking beyond ChatGPT, Lee sees significant opportunities in language grounding as a means to expand interactions with embodied agents.
"One of the reasons I'm excited about language grounding is the issue of access," he said. "Most of us are not programmers, and even fewer are mechanical engineers and roboticists. It would be great if you could talk to a robot to get it to perform actions, which would require the robot to be able to reason about grounding appropriately."
As robots and embodied agents become increasingly integrated into our daily lives, our ability to communicate with and control them easily using natural language will be essential to ensure accessibility. This is particularly true for people with disabilities, who stand to benefit most from these technologies. By advancing research in language grounding, Lee and his colleagues are working to create a better future for human-AI cooperation.
If you're interested in connecting with the AI and Robotics Program for hiring and collaborative projects, please contact AI-OSU@oregonstate.edu.