For every new application of computer vision-based machine learning, a newly annotated training dataset is needed to train a model; as computer vision solutions proliferate across industries, the need for annotated data only grows. Annotating a dataset from scratch is a time-intensive process, but it’s absolutely necessary for a successful model. That’s where clever techniques like pre-annotation come in.

What is Pre-Annotation?

Pre-annotation refers to the process of using an existing model to generate annotations for a dataset before an annotator manually completes them. This human-in-the-loop approach automates as much of the annotation work as possible and results in huge time savings for annotators. Though the main use case for pre-annotations is simply to increase annotation speed before model training, pre-annotation can also be used toward the end of training to quickly measure and improve a model’s accuracy on new or more challenging data.
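In code, the workflow is straightforward: run whatever model is available over the unlabeled images, store its predictions as draft annotations, and queue those drafts for human review. The minimal sketch below assumes a hypothetical predict() function and simple JSON draft files; the exact formats and thresholds will vary with your tooling.

```python
import json
from pathlib import Path

def generate_drafts(image_dir, predict, out_dir, min_score=0.5):
    """Run an existing model over unlabeled images and save its
    predictions as draft annotations for annotators to review."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        # Hypothetical model call returning [{"box": [...], "label": str, "score": float}, ...]
        detections = predict(image_path)
        draft = {
            "image": image_path.name,
            "source": "model",  # flag the annotations as machine-generated
            "annotations": [d for d in detections if d["score"] >= min_score],
        }
        with open(out_dir / f"{image_path.stem}.json", "w") as f:
            json.dump(draft, f, indent=2)
```

The annotator then opens each draft, accepts or corrects the machine-generated annotations, and adds anything the model missed.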

Pre-Annotation for Speed

The first major use case for pre-annotations – and by far the most popular – is simply to speed up the annotation process when creating training data from scratch. The accuracy of the pre-annotations is limited only by the model used to generate them, and by definition they are incomplete for the intended application. For example, in the case of a dataset created to identify cars by brand, it’s easy to imagine using a pre-trained model that detects only cars to generate the pre-annotations; in this case, all the annotator has to do to complete each annotation is record the brand of the detected car in the class name or associated metadata. Pre-annotations can be as complex as a fully segmented image that simply needs a text string describing the scene, or as simple as roughly placed bounding boxes around potential objects in need of adjustment and the proper class assignment. In each case, the operating principle of letting a computer do as much of the legwork as possible remains the same.
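As a concrete illustration of the cars-by-brand example, the sketch below uses a COCO-pretrained detector from torchvision to place rough boxes around cars and leaves an empty "brand" field for the annotator to fill in. The class index for "car" and the score threshold are assumptions to verify against your model and data.

```python
import json
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# COCO-pretrained detector; in torchvision's COCO label map, index 3 is "car" (assumption to verify).
CAR_CLASS_ID = 3
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def preannotate_cars(image_path, score_threshold=0.6):
    """Return rough car boxes; the annotator only has to fill in the brand."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    boxes = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if label.item() == CAR_CLASS_ID and score.item() >= score_threshold:
            boxes.append({
                "bbox": [round(v, 1) for v in box.tolist()],  # [x1, y1, x2, y2]
                "class": "car",
                "brand": "",  # to be completed by the annotator
                "score": round(score.item(), 3),
            })
    return boxes

# Example: print draft annotations for one image
print(json.dumps(preannotate_cars("street.jpg"), indent=2))
```

The drafts can then be imported into whatever annotation tool the team uses, so annotators spend their time on the brand label rather than on drawing boxes.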

Pre-Annotation for Accuracy

A second use case for pre-annotation is geared toward model accuracy, not just annotation speed. As a model’s performance improves, a generic model is no longer useful for pre-annotation; instead, ML scientists can run their partially trained models on new and more challenging data, inspect and edit the predictions as though they were pre-annotations, and compare the initial predictions with the manually edited ground truth annotations. Simple calculations like per-class precision and recall provide actionable insights into the model’s performance on the new dataset and help ML scientists decide how to proceed, whether that means adding the new ground truth data to the training dataset or accepting the current performance and deploying the model. In either case, using pre-annotation to inspect, improve, and rigorously confirm model performance helps ML scientists maintain visibility and control through the end of the development process.
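A lightweight way to get those per-class numbers is to match the model’s original predictions against the annotator-corrected boxes by IoU and count true positives, false positives, and false negatives per class. The sketch below assumes axis-aligned [x1, y1, x2, y2] boxes for a single image and a single IoU threshold; in practice you would aggregate counts across the dataset, and full evaluation suites (e.g., COCO mAP) are more thorough.

```python
from collections import defaultdict

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def per_class_precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """predictions / ground_truth: lists of {"bbox": [...], "class": str} for one image."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    matched = set()
    for p in predictions:
        # Greedily match each prediction to the best unmatched ground truth box of the same class.
        best_iou, best_idx = 0.0, None
        for i, g in enumerate(ground_truth):
            if i in matched or g["class"] != p["class"]:
                continue
            score = iou(p["bbox"], g["bbox"])
            if score > best_iou:
                best_iou, best_idx = score, i
        if best_idx is not None and best_iou >= iou_threshold:
            tp[p["class"]] += 1
            matched.add(best_idx)
        else:
            fp[p["class"]] += 1
    for i, g in enumerate(ground_truth):
        if i not in matched:
            fn[g["class"]] += 1
    results = {}
    for cls in set(tp) | set(fp) | set(fn):
        precision = tp[cls] / (tp[cls] + fp[cls]) if (tp[cls] + fp[cls]) else 0.0
        recall = tp[cls] / (tp[cls] + fn[cls]) if (tp[cls] + fn[cls]) else 0.0
        results[cls] = {"precision": precision, "recall": recall}
    return results
```

Classes with low precision point to over-prediction, while classes with low recall point to missed objects; both are useful signals when deciding whether the new data belongs in the training set.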

Any tool that helps automate the annotation process and limit manual annotator work will pay dividends throughout the dataset preparation process. Pre-annotation, though simple to understand, is one of those tools whose clever applications have huge potential to expedite and improve the annotation process. Platforms like Innotescus, which give users control over how and when to import pre-annotations, help facilitate these efficiency gains so users can create, adjust, and analyze their annotations faster, and get to training and deployment with greater control over the process from end to end.