- Sabina Pokhrel
Image Data Labelling and Annotation - Everything you need to know

Data labelling is an essential step in a supervised machine learning task. "Garbage in, garbage out" is a phrase commonly used in the machine learning community, meaning that the quality of the training data determines the quality of the model. The same is true for the annotations used to label that data. If you show a child a tomato and say it's a potato, the next time the child sees a tomato, it is very likely to classify it as a potato. A machine learning model learns in a similar way, by looking at examples, so the result of the model depends on the labels we feed in during its training phase.
Data labelling is a task that requires a lot of manual work. If you can find a good, labelled open dataset for your project, LUCK IS ON YOUR SIDE! But mostly, this is not the case; it is very likely that you will have to go through the process of data annotation yourself.
In this post, we will look at the types of annotation for images, commonly used annotation formats, and some tools that you can use for image data labelling.
Image Annotation Types
Before jumping into image annotations, it is useful to know about the different annotation types that exist so that you can pick the right type for your use-case.
Here are a few different types of annotations:
Bounding boxes: Bounding boxes are the most commonly used type of annotation in computer vision. Bounding boxes are rectangular boxes used to define the location of the target object. They can be determined by the x and y coordinates of the upper-left corner and the x and y coordinates of the lower-right corner of the rectangle. Bounding boxes are generally used in object detection and localisation tasks.

Bounding boxes are usually represented by either two co-ordinates (x1, y1) and (x2, y2) or by one co-ordinate (x1, y1) and width (w) and height (h) of the bounding box. (See image below)
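To make the two representations concrete, here is a minimal Python sketch (the function names are just for illustration) that converts between them:

def corners_to_xywh(x1, y1, x2, y2):
    # Top-left corner stays the same; width and height are derived
    return x1, y1, x2 - x1, y2 - y1

def xywh_to_corners(x1, y1, w, h):
    # Recover the bottom-right corner from width and height
    return x1, y1, x1 + w, y1 + h

# Both calls describe the same bounding box
print(corners_to_xywh(90, 54, 190, 70))  # (90, 54, 100, 16)
print(xywh_to_corners(90, 54, 100, 16))  # (90, 54, 190, 70)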

Polygonal Segmentation: Objects are not always rectangular in shape. With this idea, polygonal segmentation is another type of data annotation where complex polygons are used instead of rectangles to define the shape and location of the object in a much more precise way.
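In practice, a polygon annotation is just an ordered list of (x, y) vertices. As an illustration (the vertex values here are made up), the sketch below computes the enclosed area with the shoelace formula, which is one common way the area of a polygon region is derived:

def polygon_area(vertices):
    # Shoelace formula over an ordered list of (x, y) vertices
    area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

triangle = [(0, 0), (10, 0), (10, 10)]
print(polygon_area(triangle))  # 50.0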

Semantic Segmentation: Semantic segmentation is a pixel-wise annotation, where every pixel in the image is assigned to a class. These classes could be pedestrian, car, bus, road, sidewalk, etc., and each pixel carries semantic meaning.
Semantic segmentation is primarily used in cases where environmental context is very important. For example, it is used in self-driving cars and robotics, where the models need to understand the environment they are operating in.
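Concretely, a semantic segmentation label is usually stored as a mask of the same height and width as the image, with each pixel holding a class ID. A minimal NumPy sketch (the class IDs here are made up for illustration):

import numpy as np

# Illustrative class IDs: 0 = road, 1 = sidewalk, 2 = car
mask = np.zeros((4, 6), dtype=np.uint8)  # one class ID per pixel
mask[:2, :] = 1   # top two rows labelled as sidewalk
mask[2, 1:3] = 2  # a small car region
print(np.unique(mask))  # classes present in the mask: [0 1 2]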

3D cuboids: 3D cuboids are similar to bounding boxes with additional depth information about the object. Thus, with 3D cuboids you can get a 3D representation of the object, allowing systems to distinguish features like volume and position in a 3D space.
A use-case of 3D cuboids is in self-driving cars, where the depth information can be used to measure the distance of objects from the car.
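One common encoding, used for example in the KITTI autonomous-driving labels, stores the 3D centre of the box, its dimensions and a heading angle. A minimal sketch (the field names are illustrative):

from dataclasses import dataclass

@dataclass
class Cuboid3D:
    # Centre of the box in metres, relative to the ego vehicle
    x: float
    y: float
    z: float
    # Box dimensions in metres: width, height, length
    w: float
    h: float
    l: float
    # Heading angle around the vertical axis, in radians
    yaw: float

car = Cuboid3D(x=2.0, y=0.0, z=15.0, w=1.8, h=1.5, l=4.2, yaw=0.1)
# Distance of the object from the car, using the depth information
print(round((car.x**2 + car.y**2 + car.z**2) ** 0.5, 2))  # 15.13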

Key-Point and Landmark: Key-point and landmark annotation is used to detect small objects and shape variations by creating dots across the image. This type of annotation is useful for detecting facial features, facial expressions, emotions, human body parts and poses.
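Each key-point is typically stored as an (x, y) position plus a visibility flag; COCO, for instance, encodes keypoints as [x, y, v] triplets where v is 0 (not labelled), 1 (labelled but not visible) or 2 (labelled and visible). A tiny sketch with made-up facial landmarks:

# Hypothetical facial landmarks as (x, y, visibility) triplets
landmarks = {
    "left_eye": (110, 95, 2),
    "right_eye": (150, 96, 2),
    "nose": (130, 120, 2),
    "mouth_left": (115, 145, 1),  # labelled but occluded
}
visible = [name for name, (_, _, v) in landmarks.items() if v == 2]
print(visible)  # ['left_eye', 'right_eye', 'nose']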

Lines and Splines: As the name suggests, this type of annotation is created by using lines and splines. It is commonly used in autonomous vehicles for lane detection and recognition.

Image Annotation Formats
There is no single standard format when it comes to image annotation. Below are a few commonly used annotation formats:
COCO: COCO has five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning. The annotations are stored using JSON.
For object detection, COCO follows the following format:
annotation{
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": RLE or [polygon],
    "area": float,
    "bbox": [x, y, width, height],
    "iscrowd": 0 or 1,
}
categories[{
    "id": int,
    "name": str,
    "supercategory": str,
}]
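As a sketch of how such a file is consumed, the snippet below (the file name is hypothetical, and the top-level keys follow the standard COCO layout) reads a COCO-style JSON file and prints each bounding box with its category name:

import json

with open("annotations.json") as f:  # hypothetical file path
    coco = json.load(f)

# Map category IDs to human-readable names
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}

for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
    print(ann["image_id"], id_to_name[ann["category_id"]], (x, y, w, h))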
Pascal VOC: Pascal VOC stores annotations in an XML file. Below is an example of a Pascal VOC annotation file for object detection.
<annotation>
    <folder>Train</folder>
    <filename>01.png</filename>
    <path>/path/Train/01.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>224</width>
        <height>224</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>36</name>
        <pose>Frontal</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <occluded>0</occluded>
        <bndbox>
            <xmin>90</xmin>
            <xmax>190</xmax>
            <ymin>54</ymin>
            <ymax>70</ymax>
        </bndbox>
    </object>
</annotation>
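A minimal sketch reading an annotation file like the one above with Python's standard-library XML parser (the file name is assumed):

import xml.etree.ElementTree as ET

root = ET.parse("01.xml").getroot()  # hypothetical annotation file
for obj in root.findall("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
    xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
    print(name, (xmin, ymin, xmax, ymax))  # e.g. 36 (90, 54, 190, 70)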
YOLO: In the YOLO labelling format, a .txt file with the same name is created for each image file in the same directory. Each .txt file contains the annotations for the corresponding image file, that is, the object class and the bounding box's centre coordinates, width and height, all normalised by the image dimensions to values between 0 and 1:
<object-class> <x_center> <y_center> <width> <height>
For each object, a new line is created.
Below is an example of annotation in YOLO format where the image contains two different objects.
0 0.45 0.55 0.29 0.67
1 0.66 0.83 0.28 0.25
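Because the values are relative, converting a YOLO annotation back to pixel coordinates only requires the image size. A minimal sketch using the first line of the example above, assuming a 224x224 image:

def yolo_to_corners(xc, yc, w, h, img_w, img_h):
    # Scale the normalised centre and size back to pixels,
    # then derive the top-left and bottom-right corners
    bw, bh = w * img_w, h * img_h
    x1, y1 = xc * img_w - bw / 2, yc * img_h - bh / 2
    return x1, y1, x1 + bw, y1 + bh

corners = yolo_to_corners(0.45, 0.55, 0.29, 0.67, 224, 224)
print(tuple(round(v, 1) for v in corners))  # (68.3, 48.2, 133.3, 198.2)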
Image Annotation Tools
Here is a list of tools that you can use for annotating images:
1. MakeSense.AI
2. LabelImg
3. LabelMe
4. Scalabel
5. RectLabel
Summary
In this post, we covered what data annotation/labelling is and why it is important for machine learning. We looked at six different types of image annotation: bounding boxes, polygonal segmentation, semantic segmentation, 3D cuboids, key-point and landmark, and lines and splines, as well as three different annotation formats: COCO, Pascal VOC and YOLO. We also listed a few image annotation tools that are available.
In the next post, we will cover how to annotate image data in detail. Stay tuned!
What image annotation type do you commonly use? Which format do you use for annotating your image? Leave your thoughts as comments below.