Collecting Data for Custom Object Detection
The use of deep learning in computer vision has grown over the last decade. In the past couple of years, computer vision applications such as face detection and vehicle detection have become mainstream, in part because pre-trained models are readily available.
Convinced by the success of deep learning on these applications, businesses have now started to solve their own problems using deep learning.
But what if the available pre-trained models are not suitable for your application?
A pre-trained model may be able to detect eggs, but it will not differentiate between good and bad eggs because it has never been taught to do so.
So what do you do? Get lots of images of good and bad eggs and train a custom detection model.
A common challenge in creating a good custom computer vision model is obtaining training data. Deep learning models need a large amount of data to train, as we can see with benchmark models such as Mask R-CNN, YOLO, and MobileNet, which were trained on existing large datasets such as COCO and ImageNet.
How do you get data for training a custom detection model?
In this post, we will look at five ways of collecting data to train a custom model for your problem.
1. Publicly available open labelled datasets
If you are lucky, you might find the labelled dataset you need online. Here is a list of free image datasets for computer vision that you can choose from.
ImageNet: ImageNet dataset consists of around 14 million images in total for 21,841 different categories of objects (data as of 12th Feb 2020). Some of the popular categories of objects in ImageNet are Animal (fish, bird, mammal, invertebrate), Plant (tree, flower, vegetable) and Activity (sport).
Common Objects in Context (COCO): COCO is a large-scale object detection, segmentation, and captioning dataset. It contains around 330,000 images out of which 200,000 are labelled for 80 different object categories.
Google’s Open Images: Open Images is a dataset of around 9M images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships. It contains a total of 16M bounding boxes for 600 object classes on 1.9M images, making it the largest existing dataset with object location annotations.
MNIST handwritten datasets: This dataset has a total of 70,000 images of handwritten digits and is a subset of a larger set available from NIST. The digits have been size-normalized and centred in a fixed-size image.
Cityscapes Dataset: This dataset focuses on semantic understanding of urban street scenes. It contains 5,000 images with fine annotations and around 20,000 images with coarse annotations, covering 30 different classes.
NOTE: These are just a few that I found, and there are many other datasets you can find online. Also, make sure you check the license of each dataset before using it.
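Large public datasets usually cover many more categories than you need, so a common first step is to keep only the images that contain your object. The sketch below assumes annotations in the COCO JSON layout (top-level "images", "annotations" and "categories" lists, with boxes stored as [x, y, width, height]). The file names and boxes are made up for illustration, and boxes_for_category is a hypothetical helper, not part of any COCO tool.

```python
import json

# A tiny, made-up annotation file in the COCO layout.
# Real COCO files have the same top-level keys but many more entries.
coco_json = """
{
  "images": [
    {"id": 1, "file_name": "kitchen_01.jpg", "width": 640, "height": 480},
    {"id": 2, "file_name": "street_07.jpg", "width": 640, "height": 480}
  ],
  "categories": [
    {"id": 1, "name": "person"},
    {"id": 47, "name": "cup"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 47, "bbox": [120, 200, 60, 80]},
    {"id": 11, "image_id": 1, "category_id": 1, "bbox": [300, 50, 150, 400]},
    {"id": 12, "image_id": 2, "category_id": 1, "bbox": [10, 20, 100, 300]}
  ]
}
"""


def boxes_for_category(coco, category_name):
    """Return {file_name: [bbox, ...]} for every image containing the category."""
    # Look up the numeric id(s) that belong to the requested category name.
    cat_ids = {c["id"] for c in coco["categories"] if c["name"] == category_name}
    # Map image ids to file names so annotations can be grouped per file.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}

    result = {}
    for ann in coco["annotations"]:
        if ann["category_id"] in cat_ids:
            result.setdefault(file_names[ann["image_id"]], []).append(ann["bbox"])
    return result


coco = json.loads(coco_json)
cup_boxes = boxes_for_category(coco, "cup")
print(cup_boxes)
```

With a real annotation file you would replace json.loads(coco_json) with json.load on the downloaded file and then copy only the listed images into your training folder.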
2. Scraping the Web
Another option is to run an image search on the web and hand-pick images to download manually. Since a large volume of data is required, this method is not efficient.
NOTE: The images on the web may be subject to copyright. Always remember to check the copyright of the images before using them.
An alternative is to use a program that scrapes the web and downloads the images you want. One such program is Download All Images, a Google Chrome extension that lets you download a batch of images at once. In this blog post, Arun Ponnusamy explains how you can use Download All Images to download images of people wearing a helmet.
NOTE: The usage rights of images may not permit bulk downloading. Always check the copyright of each image before using it.
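If you prefer to write the scraper yourself, the core of it is small: find the image links on a page, then download them. The following is a minimal sketch using only the Python standard library. ImageLinkParser, find_image_urls and download_images are hypothetical names, the HTML and URLs are made up, and the download step is defined but not called because it needs network access. The same copyright caveats apply to anything you download this way.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve


class ImageLinkParser(HTMLParser):
    """Collects the absolute URL of every <img src=...> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative links against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def find_image_urls(html, base_url):
    """Return the image URLs found in an HTML string, in page order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls


def download_images(urls, out_dir):
    """Save each URL into out_dir. Needs network access, so it is not run here."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(urls):
        # Fall back to .jpg when the URL has no file extension.
        ext = os.path.splitext(url)[1] or ".jpg"
        urlretrieve(url, os.path.join(out_dir, f"image_{i:04d}{ext}"))


# Made-up page content standing in for a downloaded search result page.
sample_html = (
    '<html><body>'
    '<img src="/img/helmet1.jpg">'
    '<img src="https://example.com/a/helmet2.png">'
    '<p>no image here</p>'
    '</body></html>'
)
urls = find_image_urls(sample_html, "https://example.com/gallery/")
print(urls)
```

To use it on a real page, fetch the HTML first (for example with urllib.request.urlopen), pass it to find_image_urls, and then call download_images(urls, "helmet_images").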
3. Taking photographs
If you cannot find images of the object you want, you can collect them by taking photographs. You can do this manually, taking each photograph yourself, or through crowdsourcing, that is, hiring other people to take photographs for you. Another way to collect real-world images is to install a programmed camera in your environment.
4. Data Augmentation
We know that deep learning models require a large amount of data. When you only have a small dataset, it may not be enough to train a good model. In such cases, you can use data augmentation to generate more training data.
Geometric transformations such as flipping, cropping, rotation and translation are some commonly used data augmentation techniques. Applying image data augmentation not only expands your dataset by creating variation, but can also help reduce overfitting.
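The geometric transformations mentioned above are simple array operations. The sketch below shows them on a tiny image stored as a nested list of pixel values, so it runs without any dependencies. In practice you would more likely use NumPy, Pillow, OpenCV or a dedicated augmentation library. The function names are illustrative, not taken from any library.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def crop(img, top, left, height, width):
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def translate(img, dx, dy, fill=0):
    """Shift the image by dx pixels right and dy pixels down, padding with fill."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            # Pixels shifted outside the frame are dropped.
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


# A 2 x 3 "image" whose pixel values make the effect of each transform visible.
image = [
    [1, 2, 3],
    [4, 5, 6],
]

print(flip_horizontal(image))       # [[3, 2, 1], [6, 5, 4]]
print(rotate_90_clockwise(image))   # [[4, 1], [5, 2], [6, 3]]
print(crop(image, 0, 1, 2, 2))      # [[2, 3], [5, 6]]
print(translate(image, 1, 0))       # [[0, 1, 2], [0, 4, 5]]
```

For object detection, remember that the bounding box labels have to be transformed together with the image. A flipped image with unflipped boxes gives the model wrong labels.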
5. Data Generation
Sometimes, real data may not be available. In such cases, synthetic data can be generated to train your custom detection model. Because it is cheap to produce, synthetic data is increasingly used in machine learning.
Generative Adversarial Networks (GANs) are one of many techniques used for synthetic data generation. A GAN is a generative modelling technique in which artificial instances are created from a dataset in such a way that the characteristics of the original set are retained.
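Training a GAN needs a deep learning framework and is beyond a short snippet. To give a feel for synthetic data generation, the sketch below swaps in a much simpler technique: pasting an object patch onto a background at a random position. It is not a GAN, but it shows one appeal of synthetic data for detection, which is that the bounding box label comes for free. Images are nested lists of pixel values, and paste_object is a hypothetical helper written for this example.

```python
import random


def paste_object(background, patch, rng):
    """Paste patch onto a copy of background at a random position.

    Returns the new image and its bounding box as (left, top, width, height).
    """
    bg_h, bg_w = len(background), len(background[0])
    p_h, p_w = len(patch), len(patch[0])

    # Choose a top-left corner so that the patch fits fully inside the image.
    top = rng.randint(0, bg_h - p_h)
    left = rng.randint(0, bg_w - p_w)

    image = [row[:] for row in background]  # copy, so the background is reusable
    for y in range(p_h):
        for x in range(p_w):
            image[top + y][left + x] = patch[y][x]

    return image, (left, top, p_w, p_h)


rng = random.Random(0)                      # fixed seed for repeatable output
background = [[0] * 8 for _ in range(6)]    # 6 x 8 empty scene
patch = [[9, 9, 9], [9, 9, 9]]              # 2 x 3 "object"

# Generate five labelled synthetic images.
dataset = [paste_object(background, patch, rng) for _ in range(5)]
for image, bbox in dataset:
    print(bbox)
```

Real pipelines built on this idea use photographic backgrounds and cut-out objects, and vary scale, rotation and lighting so that the model does not learn to spot pasting artefacts.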
Collecting a training dataset is the first step towards training your own custom detector model. In this post, we looked at some of the techniques used for collecting image data: searching through publicly available labelled datasets, scraping the web, taking photographs manually or with a programmed camera, using data augmentation, and generating synthetic data.
In the next post, we will look at the next step in training your custom detector, that is, labelling your dataset. So stay tuned.
What techniques do you use to collect image datasets? Leave your thoughts as comments below.
Looking for a pre-trained face detection model? Click here to download.
Check out this post for more details on creating a robust object detection model.