Researchers propose using the game Overcooked to benchmark collaborative AI systems

January 16, 2021

Deep reinforcement learning systems are among the most capable in AI, particularly in the robotics domain. However, in the real world, these systems encounter a number of situations and behaviors to which they weren’t exposed during development.

In a step toward systems that can collaborate with humans to help them accomplish their goals, researchers at Microsoft, the University of California, Berkeley, and the University of Nottingham developed a methodology for applying a unit testing paradigm to human-AI collaboration, which they demonstrate in a simplified version of the game Overcooked. Players in Overcooked control a number of chefs in kitchens filled with obstacles and hazards, preparing meals to order under a time limit.

The team asserts that Overcooked, while not necessarily designed with robustness benchmarking in mind, can successfully test potential edge cases, both in the states a system should be able to handle and in the partners it should be able to play with. For example, in Overcooked, systems must contend with scenarios such as plates accidentally left on counters or a partner who stays put for a while because they’re thinking or away from their keyboard.

Above: Screen captures from the researchers’ test environment.

The researchers investigated a number of techniques for improving system robustness, including training a system with a diverse population of other collaborative systems. Over the course of experiments in Overcooked, they observed whether several test systems could recognize when to get out of the way (like when a partner was carrying an ingredient) and when to pick up and deliver orders after a partner has been idling for a while.
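To make the idea of behavior-level tests concrete, here is a minimal Python sketch of how such checks might be written. It is not the researchers' code: the `Scenario` fields, the action names, the idle threshold of 10 steps, and both toy policies are hypothetical, chosen only to mirror the two behaviors described above (getting out of the way, and taking over deliveries when a partner idles).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Scenario:
    """A hand-written situation the agent should handle (hypothetical schema)."""
    name: str
    partner_idle_steps: int      # how long the partner has stood still
    partner_carrying_item: bool  # partner is holding an ingredient
    agent_blocks_partner: bool   # agent stands in the partner's path


# Assumption for this toy: a policy maps a scenario to one coarse action name.
Policy = Callable[[Scenario], str]


@dataclass(frozen=True)
class RobustnessTest:
    """Pairs a scenario with the action we expect a robust agent to take."""
    scenario: Scenario
    expected_action: str


def pass_rate(policy: Policy, tests: List[RobustnessTest]) -> float:
    """Return the fraction of tests on which the policy picks the expected action."""
    passed = sum(policy(t.scenario) == t.expected_action for t in tests)
    return passed / len(tests)


def naive_policy(scenario: Scenario) -> str:
    """Ignores the partner entirely and keeps working on its own task."""
    return "continue_own_task"


def considerate_policy(scenario: Scenario) -> str:
    """Yields to a partner carrying an item and covers for an idle partner."""
    if scenario.agent_blocks_partner and scenario.partner_carrying_item:
        return "move_aside"
    if scenario.partner_idle_steps >= 10:  # hypothetical idle threshold
        return "deliver_order"
    return "continue_own_task"


TESTS = [
    RobustnessTest(
        Scenario("partner carries ingredient, agent in the way", 0, True, True),
        "move_aside",
    ),
    RobustnessTest(
        Scenario("partner idle for a while", 20, False, False),
        "deliver_order",
    ),
    RobustnessTest(
        Scenario("nothing unusual", 0, False, False),
        "continue_own_task",
    ),
]

if __name__ == "__main__":
    for name, policy in [("naive", naive_policy), ("considerate", considerate_policy)]:
        print(f"{name}: {pass_rate(policy, TESTS):.0%} of tests passed")
```

In this toy setup, a policy is scored by the share of scenarios it handles as expected rather than by the reward it earns in ordinary play, so it can reveal failures that a reward figure alone would not.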

According to the researchers, current deep reinforcement learning agents aren’t very robust, at least not as measured by Overcooked. None of the systems they tested scored above 65% in the video game, which the researchers say suggests that Overcooked can serve as a useful human-AI collaboration metric in the future.

“We emphasize that our primary finding is that our [Overcooked] test suite provides information that may not be available by simply considering validation reward, and our conclusions for specific techniques are more preliminary,” the researchers wrote in a paper describing their work. “A natural extension of our work is to expand the use of unit tests to other domains besides human-AI collaboration … An alternative direction for future work is to explore meta learning, in order to train the agent to adapt online to the specific human partner it is playing with. This could lead to significant gains, especially on agent robustness with memory.”

VentureBeat

