With the insights gained from EDA, ML scientists are often faced with decisions about how to address the shortcomings of their datasets. Collecting and annotating more data is expensive, but a biased or incomplete dataset will only produce a similarly flawed trained model. That’s where data enrichment comes in. Data enrichment refers to a wide set of processes that enhance, augment, and refine data points, but in this post, we’ll focus solely on data augmentation.

Data augmentation is a cheap and simple way to expand your dataset, add variance to it, and make your model better at handling unobserved input. Augmenting your data involves applying simple transformations to your existing dataset: adding noise, translating the image, and varying the scale of each image all increase the size and variability of your training dataset. Though it doesn’t add as much variability as collecting new data, data augmentation is a cheap and fast way to reduce overfitting and improve the generalization of a trained model. Here, we’ll detail the common types, techniques, and tools used in data augmentation for computer-vision-based machine learning.

Types of Augmentation

Data augmentation can be broadly classified into two “types” depending on where in the ML pipeline it occurs. 

Offline Augmentation

Here, we perform data augmentation on the training dataset before we train the ML model. Offline augmentation, sometimes referred to as pre-processing augmentation, is easy to understand, visualize, and control because the artificial data is created beforehand. However, it also significantly increases storage needs. For example, if we simply rotate all the images by a predefined angle once and keep the originals, we have doubled the dataset size. Thus, offline augmentation is generally preferred for relatively small datasets.
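
To make the storage cost concrete, here is a minimal sketch of offline augmentation in NumPy. The dataset shape and the choice of a single 90-degree rotation are assumptions made for illustration; any fixed transformation would have the same effect on size.

```python
import numpy as np

def augment_offline(images: np.ndarray) -> np.ndarray:
    """Return the original images plus one rotated copy of each."""
    # Rotate every image by 90 degrees in its (height, width) plane.
    rotated = np.rot90(images, k=1, axes=(1, 2))
    # Keep originals and rotated copies together: this is what gets stored.
    return np.concatenate([images, rotated], axis=0)

# Hypothetical dataset: 100 square RGB images of 32x32 pixels.
dataset = np.zeros((100, 32, 32, 3), dtype=np.uint8)
augmented = augment_offline(dataset)

print(dataset.shape[0], "images before,", augmented.shape[0], "after")
print(dataset.nbytes, "bytes before,", augmented.nbytes, "after")
```

Every extra transformation you store adds another full copy of the dataset, which is why this approach gets expensive quickly.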

Online Augmentation

Here, instead of storing the augmented data in the training dataset, the transformations are applied to mini-batches as they are fed to the ML model during training. Online augmentation, sometimes referred to as real-time augmentation, does not require saving augmented data to disk. Because it needs no additional storage, it is well suited to larger training datasets.
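
A rough sketch of the online approach is a generator that augments each mini-batch just before it is handed to the model. The function name `augmented_batches`, the random horizontal flip, and the 0.5 probability are illustrative assumptions, not part of any particular framework.

```python
import numpy as np

def augmented_batches(images, labels, batch_size, rng):
    """Yield shuffled mini-batches, flipping each image left-right with probability 0.5."""
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        # Indexing with an array returns a copy, so the stored dataset is untouched.
        batch = images[idx]
        flip = rng.random(len(idx)) < 0.5
        batch[flip] = batch[flip, :, ::-1]  # reverse the width axis
        yield batch, labels[idx]

# Hypothetical data: 10 RGB images of 8x8 pixels, labelled 0..9.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 8, 8, 3), dtype=np.uint8)
labels = np.arange(10)
original = images.copy()

batches = list(augmented_batches(images, labels, batch_size=4, rng=rng))
print(len(batches), "batches; dataset unchanged:", np.array_equal(images, original))
```

Nothing is written to disk, and each epoch can see a different random variant of every image.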

Common Techniques

Most popular augmentation techniques aim at applying small transformations to existing image data so the resulting images closely resemble naturally occurring variations. For example, if you are looking to augment car images, simply flipping the original image horizontally will mimic a naturally occurring variation of the original data. Since most ML models just learn to map specific input features to the desired output labels, any transformation to the input features is perceived as a new datapoint. Here, we’ll introduce a few commonly used image augmentation techniques:


Rotation

Rotation augmentations are done by rotating the images around a point (typically the image’s center) by a specified angle. Rotation doesn’t always preserve the original image dimensions, and can introduce blank space, as seen below. There are a variety of methods to handle the latter, from reflecting the image to simply replicating edge pixel values to fill the space.
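
The blank-space problem is easy to reproduce. The sketch below rotates an image about its center on a fixed-size canvas using nearest-neighbour sampling; the `rotate` function and its `fill` parameter are our own illustrative names, and production libraries typically use smoother interpolation.

```python
import numpy as np

def rotate(image, angle_deg, fill="constant"):
    """Rotate about the image center, keeping the original canvas size.

    fill="constant" leaves uncovered pixels at zero (blank space);
    fill="edge" replicates the nearest edge pixel instead.
    Positive angles rotate counterclockwise as displayed.
    """
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)

    # Inverse mapping: for each output pixel, find the source pixel it comes from.
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - cy, xs - cx
    src_x = np.rint(cos_t * dx - sin_t * dy + cx).astype(int)
    src_y = np.rint(sin_t * dx + cos_t * dy + cy).astype(int)

    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    # Clipping makes out-of-range lookups land on the nearest edge pixel.
    sampled = image[np.clip(src_y, 0, h - 1), np.clip(src_x, 0, w - 1)]
    if fill == "edge":
        return sampled
    out = np.zeros_like(image)
    out[inside] = sampled[inside]
    return out

ones = np.ones((5, 5), dtype=np.uint8)
print(rotate(ones, 45))               # zeros appear in the corners
print(rotate(ones, 45, fill="edge"))  # corners filled from the edges
```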


Flip

Flip augmentations are done by flipping the images either vertically or horizontally. Below, the original image (left) is flipped horizontally (center), and then flipped again vertically (right).
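
In array terms, a flip is just a reversed axis. The tiny sketch below uses a made-up 3x4 array in place of an image and applies the same sequence: a horizontal flip, then a vertical flip of the result.

```python
import numpy as np

# Stand-in for a grayscale image: rows are the height axis, columns the width axis.
image = np.arange(12).reshape(3, 4)

flipped_h = image[:, ::-1]        # horizontal flip: mirror left to right
flipped_hv = flipped_h[::-1, :]   # then vertical flip: mirror top to bottom

print(image)
print(flipped_h)
print(flipped_hv)
```

Flipping along both axes is equivalent to a 180-degree rotation, so the two flips together do not produce anything a rotation augmentation couldn’t.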


Noise

Noise augmentations add small amounts of Gaussian or salt-and-pepper noise to the images. Adding noise distorts the high-frequency features in an image, so the model cannot rely on a single high-frequency pattern; it has to learn several and generalizes better as a result.
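
Both kinds of noise take only a few lines of NumPy. This is a sketch assuming 8-bit images; the function names, `sigma`, and `amount` are illustrative parameters we chose.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` (in pixel units)."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

def add_salt_and_pepper(image, amount, rng):
    """Set roughly a fraction `amount` of pixels to pure black or pure white."""
    noisy = image.copy()
    r = rng.random(image.shape[:2])
    noisy[r < amount / 2] = 0          # pepper
    noisy[r > 1 - amount / 2] = 255    # salt
    return noisy

rng = np.random.default_rng(0)
gray = np.full((64, 64), 128, dtype=np.uint8)  # hypothetical flat gray image
gauss = add_gaussian_noise(gray, sigma=10, rng=rng)
sp = add_salt_and_pepper(gray, amount=0.1, rng=rng)
```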


Shear

Shear augmentations fix one edge of the image and displace the rest of the image parallel to that edge, by an amount set by a shear angle. Unlike rotation, shear “stretches” the image, turning rectangles into parallelograms.
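
Here is one way to sketch a horizontal shear: keep the bottom row fixed and slide every other row sideways in proportion to its height above that row. The `shear_x` name, the zero fill, and the whole-pixel shifts are simplifying assumptions.

```python
import numpy as np

def shear_x(image, angle_deg):
    """Horizontal shear that keeps the bottom edge fixed; uncovered pixels are zero."""
    h, w = image.shape[:2]
    k = np.tan(np.deg2rad(angle_deg))
    out = np.zeros_like(image)
    for y in range(h):
        # Rows further from the bottom edge are displaced further.
        shift = int(round(k * (h - 1 - y)))
        if abs(shift) >= w:
            continue  # row pushed completely out of frame
        if shift >= 0:
            out[y, shift:] = image[y, :w - shift]
        else:
            out[y, :w + shift] = image[y, -shift:]
    return out

image = np.arange(1, 25).reshape(4, 6)
print(shear_x(image, 45))
```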

Color Space

Color space augmentations are done by isolating the color channels comprising an image and manipulating the pixel values. This is how we change well-known image characteristics like brightness, saturation, and contrast.
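
As a minimal sketch, brightness, contrast, and saturation can each be written as simple arithmetic on the pixel values of an 8-bit RGB image. The function names are our own, and the unweighted channel mean used as the gray level is a simplifying assumption; libraries usually use a weighted luminance.

```python
import numpy as np

def _to_uint8(values):
    """Round and clip floating-point pixel values back to the 8-bit range."""
    return np.clip(np.rint(values), 0, 255).astype(np.uint8)

def adjust_brightness(image, delta):
    """Add a constant offset to every pixel."""
    return _to_uint8(image.astype(np.float64) + delta)

def adjust_contrast(image, factor):
    """Scale each pixel's distance from the global mean intensity."""
    mean = image.mean()
    return _to_uint8((image.astype(np.float64) - mean) * factor + mean)

def adjust_saturation(image, factor):
    """Blend each pixel between its gray level (factor 0) and its original color (factor 1)."""
    gray = image.astype(np.float64).mean(axis=2, keepdims=True)
    return _to_uint8(gray + (image - gray) * factor)

# Hypothetical 2x2 RGB image with a single flat color.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[..., 0], image[..., 1], image[..., 2] = 100, 150, 50

brighter = adjust_brightness(image, 50)
washed_out = adjust_saturation(image, 0.0)
```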


Crop

Crop augmentations generally select a part of the image at random and resize it to the original image size. Whether the cropped image contains a part of the object of interest or just the background, the model sees it as a new data point, which teaches it to distinguish between relevant and irrelevant objects.
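
A random crop followed by a resize can be sketched as follows. The `random_crop_resize` name is illustrative, and the nearest-neighbour resize is chosen for brevity rather than image quality.

```python
import numpy as np

def random_crop_resize(image, crop_h, crop_w, rng):
    """Pick a random crop_h x crop_w window and stretch it back to the original size."""
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - crop_h + 1))
    left = int(rng.integers(0, w - crop_w + 1))
    crop = image[top:top + crop_h, left:left + crop_w]
    # Nearest-neighbour resize: map each output row/column to a row/column of the crop.
    rows = np.arange(h) * crop_h // h
    cols = np.arange(w) * crop_w // w
    return crop[rows][:, cols]

rng = np.random.default_rng(0)
image = np.arange(64).reshape(8, 8)
cropped = random_crop_resize(image, 4, 4, rng)
print(cropped.shape)
```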


Translation

Translation augmentation shifts an image by a certain number of pixels (right, left, up, or down). Translation augmentations are very useful for avoiding positional biases, such as a model that only recognizes objects near the center of the frame.
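
A translation is a copy of one rectangular region of the image into another. In this sketch, `translate` is an illustrative name, positive `dx` moves content right, positive `dy` moves it down, and the vacated pixels are filled with zeros.

```python
import numpy as np

def translate(image, dx, dy):
    """Shift the image by (dx, dy) pixels, filling vacated pixels with zeros."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    if abs(dx) >= w or abs(dy) >= h:
        return out  # content shifted completely out of frame
    dst_rows = slice(max(dy, 0), h + min(dy, 0))
    dst_cols = slice(max(dx, 0), w + min(dx, 0))
    src_rows = slice(max(-dy, 0), h + min(-dy, 0))
    src_cols = slice(max(-dx, 0), w + min(-dx, 0))
    out[dst_rows, dst_cols] = image[src_rows, src_cols]
    return out

image = np.arange(1, 13).reshape(3, 4)
print(translate(image, dx=1, dy=0))
```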


Zoom

Zoom augmentation simply scales an image up or down. Unlike random cropping, zoom augmentation is applied to a predefined area of the image, which forces the ML model to refine its assumptions about what lies beyond the image boundary.
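
One possible reading of this is a zoom about the image center on a fixed canvas. The sketch below assumes that reading: the `zoom` name, the center anchor, the nearest-neighbour sampling, and the zero fill are all our own choices.

```python
import numpy as np

def zoom(image, factor):
    """Scale the image about its center; factor > 1 zooms in, factor < 1 zooms out."""
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Inverse mapping: each output pixel looks up the source pixel it came from.
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.rint((ys - cy) / factor + cy).astype(int)
    src_x = np.rint((xs - cx) / factor + cx).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(image)
    out[inside] = image[src_y[inside], src_x[inside]]
    return out

ones = np.ones((8, 8), dtype=np.uint8)
print(zoom(ones, 0.5))  # zooming out leaves a blank border around the shrunken image
```

Zooming in discards the outer part of the image, while zooming out leaves a border that must be filled somehow, much like the blank space introduced by rotation.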

Generative Adversarial Networks

Augmentation using GANs is a fairly new and advanced area of data augmentation. Here, two networks (a generator and a discriminator) are trained against each other. The generator tries to create “fake” images similar to the images in the training set, and the discriminator tries to distinguish between fake and real data. Over time, the generator learns to create images that are difficult to distinguish from real images.

Mixing Images

Mixing images is another advanced image augmentation technique; it involves blending parts of two images into one. For example, we could replace the background behind the object of interest in one image with that of another image. Many techniques used to achieve this, including matting and neural style transfer, have improved significantly in recent years.
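
The background-replacement example comes down to compositing two images with a mask. In the sketch below the mask is handed to us ready-made, which is the hard part in practice (that is where matting comes in); `replace_background` and the toy images are hypothetical.

```python
import numpy as np

def replace_background(foreground_img, mask, new_background):
    """Keep pixels where mask is 1, take them from new_background where mask is 0.

    Fractional mask values blend the two images, as a soft matte would.
    """
    alpha = mask.astype(np.float64)[..., None]  # shape (H, W, 1), broadcast over channels
    mixed = alpha * foreground_img + (1.0 - alpha) * new_background
    return np.rint(mixed).astype(np.uint8)

# Toy example: a bright "object" in the middle of one image, pasted onto a darker scene.
fg = np.full((4, 4, 3), 200, dtype=np.uint8)
bg = np.full((4, 4, 3), 50, dtype=np.uint8)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1  # assumed object location

mixed = replace_background(fg, mask, bg)
print(mixed[..., 0])
```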

Population Based Augmentation

Population Based Augmentation (PBA) has recently been gaining popularity as a low-cost alternative to GANs and neural style transfer techniques. PBA treats finding the right augmentation policy as an optimization problem: instead of a single fixed augmentation policy, it generates a non-stationary policy schedule, meaning the augmentations applied change over the course of training.
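
To illustrate only the difference between a fixed policy and a schedule (not the search that PBA uses to find one), here is a small sketch. The data layout, the operation names, and every number in it are made up for illustration.

```python
# A policy maps an operation name to (probability, magnitude).
fixed_policy = {"rotate": (0.5, 10)}

# A schedule lists (first_epoch, policy) pairs in increasing epoch order.
# All values below are hypothetical, not taken from any published schedule.
schedule = [
    (0, {"rotate": (0.0, 0)}),
    (10, {"rotate": (0.3, 10)}),
    (30, {"rotate": (0.6, 25)}),
]

def policy_at(epoch, schedule):
    """Return the policy in force at the given epoch."""
    current = schedule[0][1]
    for first_epoch, policy in schedule:
        if epoch >= first_epoch:
            current = policy
    return current

for epoch in (0, 15, 40):
    print(epoch, policy_at(epoch, schedule))
```

With the fixed policy, every epoch sees the same augmentation settings; with the schedule, the settings depend on how far training has progressed.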

These are just a few of the most popular augmentation techniques, but the combinations of these and others are endless, and there’s no shortage of tools to aid in the process.

Image Augmentation in Python

Python has many libraries, like Keras and OpenCV, that support the data augmentation needs of ML scientists. Imgaug is one of the most widely used libraries for image augmentation; it provides a wide range of augmentation techniques for different annotation types, including key points, landmarks, bounding boxes, and segmentation maps. It is an extremely powerful package that contains over 60 different image augmentors and augmentation techniques, including advanced transformations like affine transforms, perspective transforms, and hue or saturation changes. Imgaug supports online augmentation, so it can directly augment data in mini-batches with just a few lines of code, and it has robust documentation with working examples and workarounds. Below is an example of an image augmented 10 times with crop, flip, rotation, translation, blur, and contrast enhancer augmentors applied at random using one line of code from the imgaug library.
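
If you want to see what “augmentors applied at random” amounts to without installing anything beyond NumPy, here is a rough sketch. To be clear, it does not use imgaug or reproduce its API: the function names (`flip`, `rotate90`, `shift`, `contrast`, `augment_many`) and the 0.5 probability are our own illustrative choices, rotations are limited to multiples of 90 degrees, and the translation wraps around instead of leaving blank pixels.

```python
import numpy as np

def flip(img, rng):
    return img[:, ::-1]  # horizontal flip

def rotate90(img, rng):
    return np.rot90(img, k=int(rng.integers(1, 4)))  # 90, 180 or 270 degrees

def shift(img, rng):
    return np.roll(img, int(rng.integers(-3, 4)), axis=1)  # wrap-around translation

def contrast(img, rng):
    gain = rng.uniform(0.75, 1.5)
    mean = img.mean()
    return np.clip(np.rint((img - mean) * gain + mean), 0, 255).astype(np.uint8)

PIPELINE = [flip, rotate90, shift, contrast]

def augment_many(image, n, rng, p=0.5):
    """Return n variants of `image`, applying each operation with probability p."""
    variants = []
    for _ in range(n):
        out = image
        for op in PIPELINE:
            if rng.random() < p:
                out = op(out, rng)
        variants.append(np.ascontiguousarray(out))
    return variants

rng = np.random.default_rng(42)
image = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)  # hypothetical square image
variants = augment_many(image, n=10, rng=rng)
print(len(variants), variants[0].shape)
```

A dedicated library adds what this sketch leaves out, such as arbitrary-angle rotation, interpolation, and keeping bounding boxes or segmentation maps in sync with the transformed image.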

Data augmentation is a powerful way to expand and improve your training dataset. It is a simple, low-cost way to make your training data more robust and your model more effective in the field. Once you’ve augmented your dataset to remove irrelevant biases and account for a wider array of inputs, there’s only one more step before actually training your model: feature engineering.