Members of the Google Brain team and Google AI this week open-sourced EfficientDet, an AI tool that achieves state-of-the-art object detection while using less compute. Creators of the system say it also runs faster on CPUs or GPUs than other popular object detection models like YOLO or AmoebaNet.
When tasked with semantic segmentation, another task related to object detection, EfficientDet also achieves exceptional performance. Semantic segmentation experiments were conducted with the PASCAL Visual Object Classes (VOC) challenge data set.
EfficientDet builds on EfficientNet, a family of advanced image classification models made available last year for Coral boards. Google engineers Mingxing Tan, Ruoming Pang, and Quoc Le detailed EfficientDet in a paper first published last fall, but revised and updated it on Sunday to include code.
“Aiming at optimizing both accuracy and efficiency, we would like to develop a family of models that can meet a wide spectrum of resource constraints,” the paper, which examines neural network architecture design for object detection, reads.
The authors say existing methods of scaling object detection often sacrifice accuracy or are resource intensive. EfficientDet offers a cheaper, less resource-hungry way to deploy object detection on the edge or in the cloud through a method that “uniformly scales the resolution, depth, and width for all backbone, feature network, and box/class prediction networks at the same time.”
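To make the idea of uniform scaling concrete, here is a minimal Python sketch in which a single coefficient, called `phi` here, grows the input resolution, feature network width and depth, and prediction head depth together. The `DetectorConfig` type, the `compound_scale` function, and every base value and growth rate in it are illustrative assumptions, not figures taken from the paper or from the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorConfig:
    """Hypothetical container for the dimensions that get scaled together."""
    input_resolution: int  # side length of the square input image, in pixels
    feature_width: int     # channels in the feature network
    feature_depth: int     # repeated layers in the feature network
    head_depth: int        # layers in the box/class prediction networks


def compound_scale(phi: int) -> DetectorConfig:
    """Derive every dimension from one scaling coefficient.

    All constants below are placeholder assumptions chosen for illustration.
    """
    if phi < 0:
        raise ValueError("phi must be non-negative")
    return DetectorConfig(
        input_resolution=512 + 128 * phi,     # resolution grows linearly
        feature_width=int(64 * 1.35 ** phi),  # width grows geometrically
        feature_depth=3 + phi,                # depth grows linearly
        head_depth=3 + phi // 3,              # heads deepen more slowly
    )


if __name__ == "__main__":
    for phi in range(4):
        print(phi, compound_scale(phi))
```

Because one knob drives every dimension, picking a model for a given resource budget comes down to choosing a single integer rather than tuning each part of the detector separately.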
“The large model sizes and expensive computation costs deter their deployment in many real-world applications such as robotics and self-driving cars where model size and latency are highly constrained,” the paper reads. “Given these real-world resource constraints, model efficiency becomes increasingly important for object detection.”
Optimizations for EfficientDet take inspiration from Tan and Le’s original work on EfficientNet, and the paper proposes joint compound scaling for backbone and feature networks. In EfficientDet, a bidirectional feature pyramid network (BiFPN) acts as the feature network, and an ImageNet-pretrained EfficientNet acts as the backbone network.
EfficientDet optimizes cross-scale connections in part by removing nodes that have only one input edge, which yields a simpler bidirectional network. It also relies on the one-stage detector paradigm, an approach to object detection known for efficiency and simplicity.
“We propose to add an additional weight for each input during feature fusion, and let the network to learn the importance of each input feature,” the paper reads.
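The weighted fusion the authors describe can be sketched in a few lines of plain Python. In this minimal example, each input feature (flattened to a list of floats for simplicity) gets one scalar weight, and the output is the weighted average of the inputs. The function name `weighted_fusion`, the clamping of negative weights to zero, and the small `eps` term are assumptions made for illustration. In a real network the weights would be learned during training, which this sketch does not attempt.

```python
from typing import List, Sequence


def weighted_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse same-sized feature vectors using one scalar weight per input.

    Illustrative sketch only: negative weights are clamped to zero and the
    remainder are normalized, so each input contributes according to its
    relative importance.
    """
    if not features:
        raise ValueError("need at least one input feature")
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per input feature")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("all input features must have the same length")

    positive = [max(w, 0.0) for w in weights]  # keep weights non-negative
    total = sum(positive) + eps                # eps guards against division by zero
    return [
        sum(w * f[i] for w, f in zip(positive, features)) / total
        for i in range(size)
    ]


if __name__ == "__main__":
    coarse = [1.0, 2.0]
    fine = [3.0, 4.0]
    print(weighted_fusion([coarse, fine], weights=[0.8, 0.2]))
```

With equal weights the result is close to a plain average. As one weight grows relative to the others, the fused output leans toward that input, which is the sense in which a network could learn how much each feature matters.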
This is the latest object detection news from Google, whose Google Cloud Vision system for object detection recently removed male and female label options for its publicly available API.