Google open-sources AI that searches tables to answer natural language questions

April 30, 2020

Google today open-sourced a machine learning model that can point to answers to natural language questions (for example, “Which wrestler had the most number of reigns?”) in spreadsheets and databases. The model’s creators claim it can find answers even when they are spread across several cells or require aggregating multiple cells.

Much of the world’s information, such as global financial statistics and sports results, is stored in the form of tables, Google Research’s Thomas Müller points out in a blog post. But these tables often lack an intuitive way to sift through them, a problem Google’s AI model aims to fix.

To process questions like “Average time as champion for top 2 wrestlers?,” the model jointly encodes the question and the table content, row by row. It builds on BERT, a Transformer-based architecture that is both bidirectional (each token can draw on context to both its left and its right) and pre-trained without supervision (meaning it can ingest data that is neither classified nor labeled). The researchers extended it with additional numerical representations, called embeddings, that encode the table structure.

A key addition was the set of embeddings used to encode the structured input, according to Müller. The model uses learned embeddings for the column index, the row index, and a special rank index that indicates to the model the order of elements in numerical columns.
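To make the three indexes concrete, here is a minimal Python sketch of how a table might be flattened row by row and each cell tagged with a column, row, and rank index. It is an illustration under stated assumptions, not Google's released code: the function names, the sample table, the choice of 0 for header rows and for cells without a rank, and the handling of ties are all hypothetical. In the model described above, indexes like these would select learned embeddings; the sketch only computes the indexes.

```python
from typing import List, Set, Tuple

# (cell text, column index, row index, rank index); names are illustrative only.
Cell = Tuple[str, int, int, int]


def rank_ids(values: List[float]) -> List[int]:
    """Return 1-based ascending ranks; equal values share a rank (an assumption)."""
    ordered = sorted(set(values))
    position = {value: i + 1 for i, value in enumerate(ordered)}
    return [position[value] for value in values]


def index_table(header: List[str], rows: List[List[str]],
                numeric_columns: Set[int]) -> List[Cell]:
    """Flatten a table row by row and attach column, row, and rank indexes.

    Assumptions of this sketch: columns and rows are numbered from 1,
    header cells get row index 0, and rank index 0 means "no rank".
    """
    ranks = {c: rank_ids([float(row[c]) for row in rows]) for c in numeric_columns}
    cells: List[Cell] = [(name, c + 1, 0, 0) for c, name in enumerate(header)]
    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            rank = ranks[c][r] if c in ranks else 0
            cells.append((text, c + 1, r + 1, rank))
    return cells


# Made-up sample data: wrestler names and reign counts are placeholders.
sample = index_table(
    header=["Wrestler", "Reigns"],
    rows=[["A", "3"], ["B", "1"], ["C", "2"]],
    numeric_columns={1},
)
for cell in sample:
    print(cell)
```

In this sample, the cell holding “3” ends up with rank index 3 because it is the largest value in its column, which is the kind of ordering signal a question about “the most number of reigns” depends on.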

Above: A table and questions with the expected answers. Answers can be selected (#1, #4) or computed (#2, #3).

Image Credit: Google


For each table cell, the model generates a score indicating the probability that the cell will be part of the answer. In addition, it predicts an aggregation operation (e.g., “AVERAGE,” “SUM,” or “COUNT”), if any, that must be applied to the selected cells to produce the final answer.
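The step from per-cell scores and a predicted operation to a final answer can be sketched in a few lines of Python. This is a simplified illustration, not the released model's decoding logic: the function name, the 0.5 selection threshold, the "NONE" label for plain cell selection, and the sample probabilities are assumptions made for the example.

```python
from typing import List, Optional, Union

Answer = Union[List[str], int, float, None]


def answer_from_predictions(cell_values: List[str], cell_probs: List[float],
                            operation: str, threshold: float = 0.5) -> Answer:
    """Select cells whose probability exceeds the threshold, then aggregate.

    The threshold and the "NONE" label are assumptions of this sketch.
    """
    selected = [v for v, p in zip(cell_values, cell_probs) if p > threshold]
    if operation == "NONE":
        return selected          # the answer is the selected cells themselves
    if operation == "COUNT":
        return len(selected)
    numbers = [float(v) for v in selected]
    if operation == "SUM":
        return sum(numbers)
    if operation == "AVERAGE":
        # No selected cells means there is nothing to average.
        return sum(numbers) / len(numbers) if numbers else None
    raise ValueError(f"unknown operation: {operation}")


# Made-up scores for three cells of a numeric column.
values = ["3", "1", "2"]
probs = [0.9, 0.1, 0.8]
print(answer_from_predictions(values, probs, "AVERAGE"))  # 2.5
print(answer_from_predictions(values, probs, "COUNT"))    # 2
print(answer_from_predictions(values, probs, "NONE"))     # ['3', '2']
```

With these sample scores, the first and third cells are selected, so “AVERAGE” yields (3 + 2) / 2 = 2.5, while “NONE” simply returns the selected cells.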

To pre-train the model, the researchers extracted 6.2 million table-text pairs from English Wikipedia to serve as a training data set. During pre-training, the model learned to restore words that had been removed from both tables and text, and it did so with relatively high accuracy: 71.4% of items were restored correctly for tables unseen during training.
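The restore-the-removed-words objective and its accuracy figure can be illustrated with a small Python sketch. It only shows how tokens might be hidden and how a restoration accuracy could be computed; the function names, the "[MASK]" placeholder, the masking rate, and the sample tokens are assumptions, and no actual model is involved.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"  # placeholder token; the exact symbol is an assumption here


def mask_tokens(tokens: List[str], rate: float,
                rng: random.Random) -> Tuple[List[str], Dict[int, str]]:
    """Hide each token independently with probability `rate`.

    Returns the masked sequence and a map from position to original token.
    """
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for i, token in enumerate(tokens):
        if rng.random() < rate:
            masked[i] = MASK
            targets[i] = token
    return masked, targets


def restoration_accuracy(predictions: Dict[int, str],
                         targets: Dict[int, str]) -> float:
    """Fraction of hidden positions whose predicted token matches the original."""
    if not targets:
        return 0.0
    correct = sum(1 for i, token in targets.items() if predictions.get(i) == token)
    return correct / len(targets)


# Made-up tokens from a flattened question and table; the rate is illustrative.
tokens = ["which", "wrestler", "had", "most", "reigns", "A", "3", "B", "1"]
masked, targets = mask_tokens(tokens, rate=0.3, rng=random.Random(0))
print(masked)
print(targets)
```

A reported figure such as 71.4% corresponds to this kind of ratio, computed over hidden items in tables the model did not see during training.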

After pre-training, Müller and team fine-tuned the model via weak supervision, using limited sources to provide signals for labeling the training data. They report that the best model outperformed the previous state of the art on the Sequential Question Answering (SQA) dataset, a Microsoft-created benchmark for exploring the task of answering questions on tables, by 12 points. It also bested the previous top model on Stanford’s WikiTableQuestions, which contains questions and tables sourced from Wikipedia.

“The weak supervision scenario is beneficial because it allows for non-experts to provide the data needed to train the model and takes less time than strong supervision,” said Müller.

VentureBeat
