Google releases SimCLR, an AI framework that can classify images with limited labeled data

April 9, 2020

A team of Google researchers recently detailed a framework called SimCLR, which improves on previous approaches to self-supervised learning, a family of techniques that convert an unsupervised learning problem (i.e., one in which AI models train on unlabeled data) into a supervised one by creating labels from unlabeled data sets. In a preprint paper and accompanying blog post, they say that SimCLR achieved a new record for image classification with a limited amount of annotated data and that it’s simple enough to be incorporated into existing supervised learning pipelines.

That could spell good news for enterprises applying computer vision to domains with limited labeled data.

SimCLR learns basic image representations on an unlabeled corpus and can be fine-tuned with a small set of labeled images for a classification task. The representations are learned through a method called contrastive learning, where the model simultaneously maximizes agreement between differently transformed views of the same image and minimizes agreement between transformed views of different images.
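To make the contrastive objective concrete, here is a minimal pure-Python sketch of a loss in that spirit. It assumes cosine similarity as the measure of "agreement" and a temperature-scaled softmax cross-entropy over all other views in the batch; the function names and the default temperature of 0.5 are illustrative choices, not taken from Google's released code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(views_a, views_b, temperature=0.5):
    """Average contrastive loss over a batch of paired views.

    views_a[i] and views_b[i] are embeddings of two transformed views of
    image i. Every other embedding in the batch acts as a negative.
    The temperature value is an assumption made for illustration.
    """
    embeddings = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        pos_logit = cosine_similarity(embeddings[i], embeddings[positive]) / temperature
        # Softmax denominator: every view except the anchor itself.
        denominator = sum(
            math.exp(cosine_similarity(embeddings[i], embeddings[j]) / temperature)
            for j in range(2 * n)
            if j != i
        )
        total += -pos_logit + math.log(denominator)
    return total / (2 * n)
```

The loss shrinks when the two views of each image land close together and grows when a view sits closer to some other image's view than to its own partner.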

Above: An illustration of the SimCLR architecture.

Image Credit: Google

SimCLR first randomly draws examples from the original data set, transforming each sample twice by cropping, color-distorting, and blurring them to create two sets of corresponding views. It then computes the image representation using a machine learning model, after which it generates a projection of the image representation using a module that maximizes SimCLR’s ability to identify different transformations of the same image. Finally, following the pretraining stage, SimCLR’s output can be used as the representation of an image or tailored with labeled images to achieve good performance for specific tasks.
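The view-generation step described above can be sketched roughly as follows. This is a toy illustration only: it assumes small grayscale images stored as nested lists of floats in [0, 1], uses a brightness jitter as a stand-in for color distortion and a 3x3 box blur as a stand-in for the blur, and all function names and parameter values are hypothetical. The encoder and projection module that would consume these views are omitted.

```python
import random


def random_crop(image, size, rng):
    """Cut a random size x size window out of a 2D grayscale image."""
    top = rng.randrange(len(image) - size + 1)
    left = rng.randrange(len(image[0]) - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]


def jitter_brightness(image, strength, rng):
    """Scale all pixels by one random factor (stand-in for color distortion)."""
    factor = 1.0 + rng.uniform(-strength, strength)
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in image]


def box_blur(image):
    """3x3 mean blur with clamped edges (stand-in for the blur step)."""
    h, w = len(image), len(image[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbors = [
                image[min(h - 1, max(0, y + dy))][min(w - 1, max(0, x + dx))]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(sum(neighbors) / 9.0)
        blurred.append(row)
    return blurred


def augment(image, rng, crop_size=4):
    """Apply crop, brightness jitter, and (half the time) blur."""
    view = random_crop(image, crop_size, rng)
    view = jitter_brightness(view, 0.4, rng)
    if rng.random() < 0.5:
        view = box_blur(view)
    return view


def make_view_pairs(dataset, batch_size, rng):
    """Draw a random batch and transform each image twice, independently."""
    batch = rng.sample(dataset, batch_size)
    views_a = [augment(image, rng) for image in batch]
    views_b = [augment(image, rng) for image in batch]
    return views_a, views_b
```

Because each image is augmented twice with independent randomness, `views_a[i]` and `views_b[i]` form the positive pair for image `i`, while views of the other images in the batch serve as negatives.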

Google says that in experiments SimCLR achieved 85.8% top-5 accuracy on the ImageNet data set when fine-tuned on only 1% of the labels, compared with 77.9% for the previous best approach.
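For readers unfamiliar with the metric, top-5 accuracy counts a prediction as correct when the true label appears among the model's five highest-scoring classes. A minimal sketch, with a hypothetical function name and toy scores rather than any real model output:

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of examples whose true label is among the k highest scores.

    scores[i][c] is the model's score for class c on example i;
    labels[i] is the index of the true class for example i.
    """
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

With thousands of classes, as in ImageNet, top-5 is a more forgiving measure than top-1, which requires the single highest-scoring class to be the true label.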

“[Our results show that] pretraining on large unlabeled image data sets has the potential to improve performance on computer vision tasks,” wrote research scientist Ting Chen and Geoffrey Hinton, a Google Research VP, engineering fellow, and Turing Award winner, in a blog post. “Despite its simplicity, SimCLR greatly advances the state of the art in self-supervised and semi-supervised learning.”

Both the code and pretrained models of SimCLR are available on GitHub.

VentureBeat
