Anomaly detection is an extremely challenging task in computer vision. It is difficult because the system must learn known patterns and, from those patterns, infer unknown patterns that are entirely different from anything it was trained on. The figure below shows an example of traffic anomaly detection (Mandal, et al., 2020).
Anomaly detection can be divided into two categories: image-level and pixel-level. Image-level detection determines whether the whole image is normal or not, whereas in pixel-level detection the model must also locate the anomaly within the image. The two types rely on different approaches, which are discussed in turn below.
Image Level Anomaly Detection:
Image-level anomaly detection can be further classified into four categories:
- Density estimation.
- One-class classification.
- Image reconstruction.
- Self-supervised classification.
Density Estimation:
This method fits a probability distribution model to the normal images and uses it as a reference. A new image is then scored against this reference: if it falls in a low-probability region of the model, it is judged to contain an anomaly, otherwise it is treated as normal.
Estimating a probability density requires a large amount of training data, and images are high-dimensional, which makes the problem quite complex. To overcome this, deep generative models are used instead of conventional density methods such as the Gaussian model (Bishop, 2006) or nearest neighbor (Khan, et al., 2009). However, deep generative models such as the variational autoencoder (Kingma, et al., 2014) or flow models (Kingma, et al., 2018) are not yet robust enough for practical anomaly detection problems.
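The density-estimation idea can be sketched with the conventional Gaussian model mentioned above. This is a minimal illustration, not a deep generative model: it assumes each image has already been reduced to a small feature vector, it assumes the dimensions are independent (a diagonal Gaussian), and the feature values and the threshold rule are made up for the example.

```python
import math

def fit_diagonal_gaussian(features):
    """Estimate a per-dimension mean and variance from normal feature vectors."""
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    variances = [
        # A small floor keeps the variance strictly positive.
        sum((f[d] - means[d]) ** 2 for f in features) / n + 1e-6
        for d in range(dim)
    ]
    return means, variances

def negative_log_likelihood(x, means, variances):
    """Higher values mean the sample is less likely under the normal model."""
    return sum(
        0.5 * math.log(2 * math.pi * v) + (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )

# Hypothetical 2-D feature vectors extracted from normal images.
normal_features = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.0], [0.8, 2.2]]
means, variances = fit_diagonal_gaussian(normal_features)

# Assumption: the threshold is the worst score observed on the normal data.
threshold = max(negative_log_likelihood(f, means, variances) for f in normal_features)

def is_anomalous(x):
    """Flag a feature vector whose score exceeds the threshold."""
    return negative_log_likelihood(x, means, variances) > threshold

print(is_anomalous([1.0, 2.0]), is_anomalous([5.0, -3.0]))
```

In practice the threshold would normally be tuned on held-out data rather than taken from the training set.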
One Class Classification:
In this approach, a decision boundary is constructed in feature space to separate normal images from abnormal ones. Classical methods include one-class support vector machines (Schölkopf, et al., 2001). These methods do not require a large amount of training data because they do not have to estimate the full probability distribution. Recently, researchers have been combining convolutional neural networks with support vector machines to develop hybrid models.
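The notion of a decision boundary around the normal data can be illustrated with a much simpler stand-in than a one-class support vector machine: a hypersphere whose centre is the mean of the normal features and whose radius covers a chosen fraction of them. The sketch below is that simplified boundary only, with hypothetical feature vectors and a hypothetical `quantile` parameter. A real one-class SVM learns a more flexible boundary, and implementations such as scikit-learn's `OneClassSVM` exist for that purpose.

```python
import math

def fit_hypersphere(features, quantile=0.95):
    """Fit a centre and radius that enclose roughly `quantile` of the normal samples."""
    n = len(features)
    dim = len(features[0])
    center = [sum(f[d] for f in features) / n for d in range(dim)]
    distances = sorted(math.dist(f, center) for f in features)
    # Pick the distance at the requested quantile, clamped to the last sample.
    index = min(n - 1, int(quantile * n))
    return center, distances[index]

def is_normal(x, center, radius):
    """A sample is normal if it lies inside or on the boundary."""
    return math.dist(x, center) <= radius

# Hypothetical feature vectors extracted from normal images.
normal_features = [
    [0.0, 0.0], [0.1, -0.1], [-0.1, 0.1],
    [0.2, 0.0], [0.0, -0.2], [-0.2, 0.1],
]
center, radius = fit_hypersphere(normal_features)

print(is_normal([0.05, 0.0], center, radius))  # close to the normal cluster
print(is_normal([3.0, 3.0], center, radius))   # far outside the boundary
```

Note that only normal samples are needed to fit the boundary, which is the property that distinguishes one-class classification from ordinary two-class classification.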
Image Reconstruction:
The image reconstruction approach maps the image to a latent space, a low-dimensional vector representation of its features, and then reconstructs the image from it. The intuition is that the reconstruction error is small when a normal image is reconstructed and large when an anomaly is present in the image. Autoencoders have been used extensively for image reconstruction. An autoencoder has a narrow middle layer that compresses the features of the given image. Japkowicz, et al. (1995) were among the first to use autoencoders for anomaly detection in images. The working principle of their model was that information which is redundant in a normal image is not necessarily redundant in an abnormal image. Generative adversarial networks have also been used for reconstruction-based anomaly detection. The adversarial network is first trained on normal images, and the difference between a test image and the normal image the network generates for it is then used to determine whether an anomaly is present. The main drawback is that the model needs to perform an iterative search, which makes it inefficient for real-world applications.
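The reconstruction intuition can be sketched with a linear stand-in for an autoencoder: a single-unit bottleneck that projects an image onto the main direction of variation of the normal data and maps it back. This is not a trained neural network. The tiny flattened 2x2 "images", the power-iteration fit, and all function names are assumptions made for illustration.

```python
import math

def mean_vector(rows):
    """Per-pixel mean of a list of flattened images."""
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(len(rows[0]))]

def top_direction(rows, mean, iterations=100):
    """Power iteration for the leading principal direction of the centred data."""
    centred = [[x - m for x, m in zip(r, mean)] for r in rows]
    w = [1.0] * len(mean)
    for _ in range(iterations):
        # Apply the (unnormalised) covariance matrix to w: X^T (X w).
        proj = [sum(c * wi for c, wi in zip(row, w)) for row in centred]
        w = [sum(p * row[d] for p, row in zip(proj, centred)) for d in range(len(mean))]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w

def reconstruction_error(x, mean, w):
    """Encode to one latent value, decode, and return the squared error."""
    z = sum((xi - m) * wi for xi, m, wi in zip(x, mean, w))   # encoder (bottleneck)
    recon = [m + z * wi for m, wi in zip(mean, w)]            # decoder
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))

# Hypothetical normal images: uniform 2x2 patches of varying brightness, flattened.
normal_images = [[0.2] * 4, [0.4] * 4, [0.6] * 4, [0.8] * 4]
mean = mean_vector(normal_images)
w = top_direction(normal_images, mean)

normal_error = reconstruction_error([0.3] * 4, mean, w)
anomaly_error = reconstruction_error([0.9, 0.1, 0.9, 0.1], mean, w)
print(normal_error, anomaly_error)
```

The unseen uniform image is reconstructed almost perfectly because it lies along the learned direction, while the checkerboard pattern cannot pass through the bottleneck and produces a large error.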
Self-Supervised Classification:
Self-supervised learning is used because the model can learn important features from the given images on its own, both low-level and high-level. Golan, et al. (2018) developed a neural network called RotateNet whose task was to determine whether a given image had been rotated. The model was able to learn both low-level and high-level features because, to distinguish normal from abnormal images in this way, it had to learn the position, direction, and shape of the objects in an image. RotateNet's main limitation is symmetrical objects, whose rotated versions can look identical to the original.
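The rotation task can be illustrated by showing how its training labels come for free. The sketch below assumes a four-way labelling (0, 90, 180, and 270 degrees), which the text above does not specify, and it only builds the labelled data rather than training a network. It also shows why symmetrical objects are a problem: their rotated copies are indistinguishable.

```python
def rotate90(image):
    """Rotate a 2-D list-of-lists image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def rotation_dataset(image):
    """Return (rotated_image, label) pairs for 0, 90, 180 and 270 degrees."""
    samples = []
    current = [list(row) for row in image]
    for label in range(4):
        samples.append((current, label))
        current = rotate90(current)
    return samples

def distinct_rotations(image):
    """Count how many of the four rotated copies actually differ."""
    return len({tuple(map(tuple, img)) for img, _ in rotation_dataset(image)})

# Hypothetical 2x2 images: one with a clear orientation, one fully symmetric.
asymmetric = [[1, 0], [0, 0]]
symmetric = [[1, 1], [1, 1]]

print(distinct_rotations(asymmetric))  # every rotation is a different image
print(distinct_rotations(symmetric))   # all rotations are the same image
```

When all four copies are identical, no classifier can recover the rotation label, so the task carries no learning signal for that image.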
Pixel Level Anomaly Detection:
Pixel-level anomaly detection can be further divided into two main categories: image reconstruction and feature modeling.
Image Reconstruction:
The assumption behind image reconstruction at the pixel level is that a model trained only on normal images cannot reconstruct abnormal images accurately, so abnormal regions produce a large reconstruction error. The evaluation uses the pixel-wise difference between the reconstructed image and the input image, with the L1 or L2 distance used to detect abnormalities at the pixel level. Baur, et al. (2018) combined a variational autoencoder with a generative adversarial network to find abnormal lesions in images taken by a magnetic resonance imaging machine.
Source: Baur, et al., 2018
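The pixel-level evaluation described above can be sketched as follows. The reconstruction is taken as given here (in practice it would come from a trained model), and the image values and the threshold are hypothetical.

```python
def error_maps(image, reconstruction):
    """Return the per-pixel L1 (absolute) and L2 (squared) error maps."""
    l1 = [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(image, reconstruction)]
    l2 = [[(a - b) ** 2 for a, b in zip(ra, rb)] for ra, rb in zip(image, reconstruction)]
    return l1, l2

def anomaly_mask(error_map, threshold):
    """Mark pixels whose reconstruction error exceeds the threshold."""
    return [[1 if e > threshold else 0 for e in row] for row in error_map]

# Hypothetical 3x3 input with one bright defect in the centre.
image = [
    [0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.1],
]
# Hypothetical model output: the defect is "repaired" to look normal.
reconstruction = [[0.1, 0.1, 0.1] for _ in range(3)]

l1_map, l2_map = error_maps(image, reconstruction)
mask = anomaly_mask(l1_map, threshold=0.5)
for row in mask:
    print(row)
```

The mask gives the location of the anomaly directly, which is what separates pixel-level detection from an image-level score.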
Feature Modeling:
In this approach, anomalies are detected in feature space rather than in image space. The feature space of normal images can be built in two ways: by handcrafted feature extraction (Xianghua, et al., 2007) or by convolutional neural networks (Napoletano, et al.). Once the features have been extracted, various machine learning algorithms can be applied to model the feature distribution of the given images. The feature distributions of the normal images and of the test image are computed, and if they differ by more than a specified threshold the test image is labeled as abnormal. To locate an anomaly in the image, the test image is divided into multiple smaller sub-images and the feature-distribution method is applied to each of them. This raises two problems. First, the computational demand increases because the number of images has grown. Second, the model may miss anomalies in the sub-images that it would have detected in the whole image.
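The sub-image procedure can be sketched with a simple handcrafted feature. Everything specific here is an assumption for illustration: the feature vector (mean and standard deviation of intensity), the use of a single averaged reference vector in place of a full feature distribution, the patch size, and the threshold.

```python
import math

def split_into_patches(image, size):
    """Return ((row, col), patch) pairs for non-overlapping size x size sub-images."""
    patches = []
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            patch = [row[c:c + size] for row in image[r:r + size]]
            patches.append(((r, c), patch))
    return patches

def handcrafted_features(patch):
    """Hypothetical feature vector: mean and standard deviation of intensity."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    return [mean, std]

def locate_anomalies(test_image, normal_image, size, threshold):
    """Flag patches whose features lie far from the average normal patch features."""
    normal_feats = [handcrafted_features(p) for _, p in split_into_patches(normal_image, size)]
    reference = [sum(f[d] for f in normal_feats) / len(normal_feats) for d in range(2)]
    flagged = []
    for position, patch in split_into_patches(test_image, size):
        if math.dist(handcrafted_features(patch), reference) > threshold:
            flagged.append(position)
    return flagged

# Hypothetical 4x4 images: a uniform normal image and a test image with one defect.
normal_image = [[0.2] * 4 for _ in range(4)]
test_image = [row[:] for row in normal_image]
test_image[3][3] = 1.0

flagged = locate_anomalies(test_image, normal_image, size=2, threshold=0.1)
print(flagged)  # top-left corner of each flagged sub-image
```

The cost noted above is visible in the structure: features are computed once per sub-image, so the work grows with the number of patches.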
Supervised vs unsupervised learning:
For anomaly detection, unsupervised learning is preferred because it makes the system more robust in real-world applications. Supervised learning requires images with anomalies, which are difficult to collect and are often too few to train a model, because anomalies can vary in shape, size, and color.