The basis of our understanding of visual perception dates back over 60 years to an accidental discovery made by Torsten Wiesel and David Hubel. The two doctors were attempting to stimulate a single cell in a cat’s brain with different visual stimuli – small dots projected onto a screen – and hadn’t produced a significant response until somebody accidentally knocked the edge of a slide into view! The resulting neuron activation sparked the discovery that individual cells in the visual cortex respond to lines and contours in specific orientations, not small points of light as the two had originally thought. From these responses to simple lines and contours, the brain is then able to recognize more complex shapes and features, and process the visual world in all its complexity.

Those same fundamentals are applied in computer vision! Of course, computers don’t have neurons corresponding to different lines and other visual properties, so instead they use various techniques to isolate low-level features before processing and interpreting higher-level features and the image itself. These techniques range from simply searching for a certain color, to applying a Laplacian of Gaussian (LoG) filter for edge detection, to using kernels, shown below, to identify specific shapes and patterns.

Kernels identifying specific shapes

The image above, from NYU, shows feature visualizations (left) alongside the image patches that most strongly activate those features in a trained neural network.
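To make the kernel idea concrete, here is a minimal sketch of convolving an image with a hand-designed kernel. It uses NumPy and SciPy on a synthetic test image rather than a real photograph, and a Sobel kernel stands in for the filters mentioned above:

```python
import numpy as np
from scipy.ndimage import convolve

# Synthetic test image: a bright square on a dark background.
image = np.zeros((64, 64))
image[16:48, 16:48] = 1.0

# Hand-designed Sobel kernel: responds strongly where intensity changes
# horizontally, i.e. along vertical edges.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

# Convolving the image with the kernel produces large responses at the square's
# left and right edges and near-zero responses in flat regions.
edges = convolve(image, sobel_x)
print(edges[32, 14:19])  # nonzero only around the left edge at column 16
```

The same mechanism underlies more elaborate filters such as the LoG filter; only the kernel values change.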

All of these low-level features are combined and weighted to form higher-level features and ultimately localize and identify objects in an image. Though the process of selecting which features best predict an object’s class is baked into some techniques like deep learning, selecting these features is often an explicit task for ML scientists, known as feature engineering.

What is Feature Engineering?

A feature, sometimes referred to as a descriptor, is any attribute or property of an observed event. For example, in computer vision, color, edges and texture are some of the features that can describe an object of interest in a scene. Machine learning algorithms essentially learn these properties, which are passed as inputs during training, in order to predict or classify. The art of understanding which underlying features in the raw data best describe the event, extracting them, and transforming them into the required input format is known as feature engineering.


Explicit feature engineering in traditional machine learning requires domain knowledge and a good understanding of the problem statement. ML scientists manually extract and analyze a wide range of features in a series of tedious, iterative, and error-prone tasks. Feature engineering entails brainstorming possible features, down-selecting features using a combination of domain knowledge and dimensionality reduction techniques, testing model performance using selected feature vectors, and iterating over these steps until the desired model accuracy is achieved.
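As a rough sketch of that loop, using scikit-learn on a stand-in dataset (the feature counts and the choice of model are arbitrary assumptions for illustration), one might score progressively smaller feature subsets and keep the best-performing one:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy dataset standing in for a table of hand-crafted features.
X, y = load_digits(return_X_y=True)

# Try progressively smaller feature subsets and keep the one that scores best.
for k in (64, 32, 16, 8):
    model = make_pipeline(
        SelectKBest(f_classif, k=k),        # down-select the k most informative features
        LogisticRegression(max_iter=2000),  # simple model to evaluate the subset
    )
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{k:>2} features -> mean CV accuracy {score:.3f}")
```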

Since ML algorithms learn about desired events as a function of input features, feature engineering directly affects the performance of the ML model. However, some machine learning techniques avoid explicit feature engineering altogether: the recent rise in popularity of deep learning algorithms is partly due to that convenience. Deep learning algorithms essentially extract descriptive features through convolutions in their hidden layers without requiring manual intervention.
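For contrast, here is a minimal PyTorch sketch (the layer sizes are arbitrary) showing that in a convolutional network the filter weights play the same role as the hand-designed kernels above, but are learned from data during training rather than engineered by hand:

```python
import torch
import torch.nn as nn

# A minimal convolutional block: the 3x3 filter weights below are initialized
# randomly and adjusted by gradient descent during training, rather than being
# designed by an ML scientist.
features = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

image = torch.randn(1, 1, 64, 64)  # batch of one grayscale image
maps = features(image)             # eight learned feature maps
print(maps.shape)                  # torch.Size([1, 8, 32, 32])
```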

What is Good Feature Engineering?

An ML model is only as good as the features it learns. A good feature is descriptive, separable and repeatable. Let’s consider the task of detecting stop signs as an example. If your only feature describing the stop sign is its shape, you will quickly run into a problem: a stop sign is an octagon, but plenty of other signs and objects have similar polygonal outlines. What if we take color as a feature instead? Again, lots of other objects are red. As you can see, good feature engineering generally involves extracting multiple high-quality features that together are capable of distinguishing the object of interest from everything else.
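A minimal sketch of that idea, using NumPy and scikit-image on a hypothetical cropped candidate patch (the specific cues are illustrative, not a recommended stop sign detector), combines a color cue with a shape cue into a single feature vector:

```python
import numpy as np
from skimage import color, feature

# Hypothetical RGB patch standing in for a cropped road-sign candidate.
patch = np.random.rand(64, 64, 3)

# Color cue: fraction of pixels where red dominates the other channels.
red_fraction = np.mean(patch[..., 0] > patch[..., 1:].max(axis=-1))

# Shape cue: density of edge pixels found by the Canny detector.
edges = feature.canny(color.rgb2gray(patch))
edge_density = edges.mean()

# Either cue alone is ambiguous; together they separate stop signs far better.
feature_vector = np.array([red_fraction, edge_density])
print(feature_vector)
```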


Why is Feature Engineering Important?

ML Model Accuracy: Good features lead to easier model training and faster convergence, less computation cost and greater generalizability, ultimately leading to higher accuracy.

ML Model Flexibility: ML models generally have quite a few tunable parameters, both in the learning functions and in the model itself. Good features take the pressure off parameter tuning and allow for higher performance even when the ML model architecture is suboptimal.

ML Model Complexity: Well engineered features drastically reduce the need for complex ML models. With fewer highly distinguishable features, ML scientists are able to use simpler architectures and reap the rewards in decreased processing, debugging and deployment costs.

ML Model Redundancy: Good features extract the key information required for the task from raw data. Though multiple features are generally combined to train a robust ML model, a handful of accurate descriptors can achieve similar results, significantly reducing information redundancy.

Feature Engineering in Computer Vision

Feature engineering in computer vision is an extensively researched topic. High-performing image features are generally salient points on the image, ideally invariant to noise, varying illumination, and image transformations like rotation, translation, and scaling. Different CV applications like object detection or image matching require different image features that adequately describe the structure and distinguishable properties of the objects of interest in an image. While there are numerous descriptive features that have found a permanent place in every ML scientist’s computer vision toolbox, we will focus our list on a few of the most commonly used image features across a wide range of applications: color, texture, interest point, shape, frequency domain, statistical, and motion descriptors.


Color Descriptors: Color is one of the most important and basic features used to describe objects in an image. Color descriptors are largely invariant to image scaling, translation, and rotation. The most commonly used color descriptors include color histograms, color coherence vectors (CCV), and color moments. Color features can be extracted globally (from the whole image) or locally (patterns or distinct structures in local image patches).
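As an illustration, a global color histogram descriptor can be computed in a few lines (NumPy only; the image here is random stand-in data rather than a real photograph):

```python
import numpy as np

# Hypothetical RGB image; in practice this would come from an image loader.
image = np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8)

# Global color histogram: 8 bins per channel, normalized by pixel count so the
# descriptor does not depend on image size (and therefore not on scaling).
hist = [np.histogram(image[..., c], bins=8, range=(0, 256))[0] for c in range(3)]
color_descriptor = np.concatenate(hist) / image[..., 0].size
print(color_descriptor.shape)  # (24,)
```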

Texture Descriptors: Texture features are generally used to provide information about the spatial arrangement of pixel intensity levels in an image. Texture is an important feature for applications like image stitching and image retrieval. Furthermore, texture features can provide information about image properties like smoothness, coarseness, and regularity. There are many structural, statistical, and spectral methods to extract texture features from an image, the most widely used being the gray-level co-occurrence matrix (GLCM).
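A minimal GLCM sketch using scikit-image (random stand-in data; in scikit-image versions before 0.19 the functions are spelled greycomatrix/greycoprops):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

# Hypothetical 8-bit grayscale patch.
patch = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

# Gray-level co-occurrence matrix: how often pixel pairs at a given offset
# (here 1 pixel to the right) share particular intensity combinations.
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                    symmetric=True, normed=True)

# Summary statistics of the matrix act as texture descriptors.
contrast = graycoprops(glcm, "contrast")[0, 0]
homogeneity = graycoprops(glcm, "homogeneity")[0, 0]
print(contrast, homogeneity)
```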

Interest Point Descriptors: Interest points in an image are anchor or key points that describe a sudden change in an image region. Some of the widely used interest point descriptors include Haar-like features, scale invariant feature transform (SIFT), speeded up robust features (SURF), and maximally stable extremal regions (MSER). Interest point features identify dominant visual cues like corner points, blobs, peaks, ridges and extreme intensity changes. Interest point descriptors are generally invariant to scale, rotation and illumination changes.


Example of SURF interest point features
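A minimal interest point sketch using OpenCV. It uses SIFT rather than SURF, since SURF is patented and only ships in OpenCV’s non-free contrib build; the image here is random stand-in data, so few or no keypoints may be found:

```python
import cv2
import numpy as np

# Hypothetical grayscale image; in practice load one with
# cv2.imread(path, cv2.IMREAD_GRAYSCALE).
image = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)

# SIFT detects scale- and rotation-invariant keypoints and computes a
# 128-dimensional descriptor for each one.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
print(len(keypoints), None if descriptors is None else descriptors.shape)
```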

Shape Descriptors: Shape descriptors are less developed than color or texture features because of the inherent complexity of representing shapes, especially for natural and deformable objects. Shape descriptors like edges and histograms of oriented gradients (HOG) are, however, essential to applications like image matching and template matching. Shape features can be loosely grouped into region-based and contour-based attributes. High-performing shape features are invariant to translation, rotation, and affine transformations, and robust to noise.
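A minimal HOG sketch using scikit-image (random stand-in data; the cell and block sizes follow the classic 128x64 pedestrian-detection setup):

```python
import numpy as np
from skimage.feature import hog

# Hypothetical grayscale image of a candidate object.
image = np.random.rand(128, 64)

# Histogram of oriented gradients: local edge directions are pooled into cells,
# then normalized over blocks, giving a compact description of the outline.
descriptor = hog(image, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2))
print(descriptor.shape)  # (3780,) for a 128x64 image with these settings
```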

Frequency Domain Descriptors: Another popular set of features comes from the frequency domain. To extract these features, we transform the image from the spatial domain to the frequency domain using operators like the Fourier transform or the Laplace transform. In the spatial domain, we extract features from the image as it is, and pixel values change with changes in the scene. In the frequency domain, features instead represent the rate at which pixel values change across the spatial domain, and these rates of change are characteristic of the geometry and spatial distribution of content in the image.
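A minimal frequency domain sketch using NumPy’s FFT (random stand-in data):

```python
import numpy as np

# Hypothetical grayscale image.
image = np.random.rand(128, 128)

# 2-D Fourier transform moves the image from the spatial to the frequency domain.
spectrum = np.fft.fftshift(np.fft.fft2(image))
magnitude = np.abs(spectrum)

# After fftshift, the center of the spectrum holds low frequencies (smooth,
# large-scale structure) and the corners hold high frequencies (fine detail).
low = magnitude[60:68, 60:68].mean()
high = magnitude[:8, :8].mean()
print(low, high)
```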

Statistical Descriptors: Statistical properties like mean, median and standard deviation seem simple, but are powerful features describing the objects of interest in an image. However, while using statistical descriptors, we should keep in mind that these features are highly susceptible to slight changes or noise in the scene. Statistical descriptors are generally used to supplement more robust features.
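A minimal statistical descriptor sketch using NumPy (random stand-in data; per-channel mean, median, and standard deviation):

```python
import numpy as np

# Hypothetical RGB image.
image = np.random.rand(128, 128, 3)

# Per-channel mean, median, and standard deviation as a tiny statistical descriptor.
stats = np.concatenate([
    image.mean(axis=(0, 1)),
    np.median(image, axis=(0, 1)),
    image.std(axis=(0, 1)),
])
print(stats.shape)  # (9,)
```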

Motion Descriptors: In a range of real-world applications like autonomous driving or assisted surgery, motion plays a very important role in identifying anomalies or changes between scenes. Motion descriptors are also used to model the background (static image regions) by measuring temporal changes, to track objects of interest between frames, and at times to estimate the depth of the scene.
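A minimal motion sketch using OpenCV’s dense optical flow (two synthetic frames, with the second shifted to simulate rightward motion; real pipelines would read consecutive frames from a video with cv2.VideoCapture):

```python
import cv2
import numpy as np

# Two hypothetical consecutive grayscale frames.
prev_frame = np.random.randint(0, 256, size=(120, 160), dtype=np.uint8)
next_frame = np.roll(prev_frame, shift=3, axis=1)  # simulate motion to the right

# Dense optical flow estimates a per-pixel (dx, dy) displacement between frames.
flow = cv2.calcOpticalFlowFarneback(prev_frame, next_frame, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
print(flow.shape, flow[..., 0].mean())  # (120, 160, 2), mean horizontal motion
```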

Feature engineering is a test of how well machine learning scientists can integrate their understanding of a particular domain with their understanding of ML techniques. There is an almost endless number of features that can be extracted from or created using a single dataset; even within computer vision, there are countless possible features to use, and many more ways to categorize them than presented above – features can be grouped into global and local, statistical and semantic, low-level and high-level, and more. Regardless of how the features are defined and organized, the most important part of this stage is that the ML scientist has a firm grasp on the features, and how each relates to the solution within the context of both the model architecture and the domain itself. Once a thorough analysis confirms that the right features have been selected, ML scientists can move on to the next steps in the machine learning process – training, validation, and deployment.
