Researchers affiliated with Uber AI and OpenAI have proposed a new approach to neural architecture search (NAS), a technique that involves evaluating hundreds or thousands of AI models to identify the top performers. In a preprint paper, they claim their technique, called Synthetic Petri Dish, accelerates the most computationally intensive NAS steps while predicting model performance with higher accuracy than previous methods.
NAS teases out top model architectures for tasks by testing candidate models’ overall performance, dispensing with manual fine-tuning. But it requires large amounts of computation and data, the implication being that the best architectures train near the bounds of available resources. Synthetic Petri Dish takes an idea from biology to address this dilemma: It builds small models from candidate architectures and evaluates them on generated data samples, so that the small models’ relative performance stands in for the overall performance of their full-size counterparts.
“The overall motivation behind ‘in vitro’ (test-tube) experiments in biology is to investigate in a simpler and controlled environment the key factors that explain a phenomenon of interest in a messier and more complex system,” the researchers explained. “This paper explores whether the computational efficiency of NAS can be improved by creating a new kind of surrogate, one that can benefit from miniaturized training and still generalize beyond the observed distribution of ground-truth evaluations … [W]e can use machine learning to learn data such that training an [architecture] on the learned data results in performance indicative of the [architecture’s] ground-truth performance.”
Synthetic Petri Dish needs only a few performance evaluations of architectures and, once trained, enables “extremely rapid” testing of new architectures. The initial evaluations are used to train a Petri dish model, while an off-the-shelf NAS method generates a set of new architectures. The trained Petri dish model then predicts the relative performance of the new architectures and selects a subset of them for ground-truth performance evaluation.
The process repeats until the NAS method identifies the best architecture.
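The loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the researchers' implementation: an "architecture" is reduced to a single number, `ground_truth_eval` is a made-up stand-in for expensive full training, `propose` is a hypothetical mutation-based generator standing in for the off-the-shelf NAS method, and `ToySurrogate` is a nearest-neighbour predictor rather than a Petri dish model trained on learned synthetic data. Only the shape of the loop and the 100-candidate, top-20 split come from the article.

```python
import random


def ground_truth_eval(arch):
    """Hypothetical expensive evaluation (stands in for full training). Lower is better."""
    return (arch - 0.7) ** 2


class ToySurrogate:
    """Placeholder for the Petri dish model: predicts the loss of the nearest evaluated arch."""

    def __init__(self):
        self.history = []

    def fit(self, evaluated):
        # The real method trains a small model on learned synthetic data;
        # this sketch simply memorises the (arch, loss) pairs seen so far.
        self.history = list(evaluated)

    def predict(self, arch):
        nearest = min(self.history, key=lambda pair: abs(pair[0] - arch))
        return nearest[1]


def propose(parents, n, rng):
    """Hypothetical NAS generator: mutate good parents, clipped to [0, 1]."""
    return [min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0, 0.1))) for _ in range(n)]


def search(rounds=5, n_candidates=100, top_k=20, seed=0):
    rng = random.Random(seed)
    # A few initial ground-truth evaluations bootstrap the surrogate.
    evaluated = [(a, ground_truth_eval(a)) for a in (rng.random() for _ in range(5))]
    surrogate = ToySurrogate()
    for _ in range(rounds):
        surrogate.fit(evaluated)
        parents = [a for a, _ in sorted(evaluated, key=lambda p: p[1])[:5]]
        candidates = propose(parents, n_candidates, rng)
        # Rank candidates by cheap predicted loss; only the top_k get the expensive evaluation.
        ranked = sorted(candidates, key=surrogate.predict)
        evaluated += [(a, ground_truth_eval(a)) for a in ranked[:top_k]]
    best = min(evaluated, key=lambda p: p[1])
    return best, len(evaluated)
```

Calling `search()` returns the best `(architecture, loss)` pair found and the number of ground-truth evaluations spent. The point of the design is that the expensive evaluation count grows by `top_k` per round, not by `n_candidates`.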
In experiments run on a PC with 20 Nvidia 1080 Ti graphics cards (for ground-truth training and evaluation) and a MacBook (for inference), the researchers sought to determine how Synthetic Petri Dish performs on the Penn Treebank (PTB) data set, a popular language modeling and NAS benchmark. Beginning from a ground-truth model containing 27 million parameters (variables), Synthetic Petri Dish generated 100 new architectures and evaluated the top 20 architectures.
The researchers say that at the end of the search, their technique found a model “competitive” in its performance with one found through conventional NAS while reducing the complexity of the seed model from 27 million parameters to 140 parameters. They also report that Synthetic Petri Dish required only a tenth of the original NAS’s compute and exceeded the performance of the original NAS when both were given equivalent compute.
“By approaching architecture search in this way as a kind of question-answering problem on how certain motifs or factors impact final results, we gain the intriguing advantage that the prediction model is no longer a black box. Instead, it actually contains within it a critical piece of the larger world that it seeks to predict,” the coauthors wrote. “[B]ecause the tiny model contains a piece of the real network (and hence enables testing various hypothesis as to its capabilities), the predictions are built on highly relevant priors that lend more accuracy to their results than blank-slate black box mode.”