Fast Anomaly Detection And Segmentation With FRE

This work presents a fast and principled approach to the challenge of visual anomaly detection and segmentation. Our method operates under the assumption of having access solely to anomaly-free training data while aiming to identify anomalies of an arbitrary nature on test data. We present a generalized approach that utilizes a shallow linear autoencoder to perform out-of-distribution detection on the intermediate features generated by a pre-trained deep neural network. More specifically, we compute the feature reconstruction error (FRE) and establish it as a principled measure of uncertainty. We rigorously connect our technique to the theory of linear auto-associative networks to provide a solid theoretical foundation and to offer multiple practical implementation strategies.

Furthermore, extending the FRE concept to convolutional layers, we derive FRE maps that provide precise pixel-level spatial localization of the anomalies within images, effectively achieving segmentation. Extensive experimentation demonstrates that our method outperforms the current state of the art. It excels in terms of speed, robustness, and remarkable insensitivity to parameterization. We make our code available at: https://intellabs.github.io/dfm
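
As a sketch of how an FRE map can be produced from convolutional features, the following uses PCA in place of the shallow linear autoencoder; the feature shapes, component count, and function names are illustrative assumptions rather than the paper's exact configuration:

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_subspace(train_feats, n_components=8):
    """Fit the linear subspace on anomaly-free conv features of shape
    (N, C, H, W), treating each spatial location's channel vector as a
    sample. PCA stands in for the shallow linear autoencoder here."""
    C = train_feats.shape[1]
    X = train_feats.transpose(0, 2, 3, 1).reshape(-1, C)
    return PCA(n_components=n_components).fit(X)

def fre_map(pca, feats):
    """Per-pixel FRE map for one test feature tensor of shape (C, H, W):
    reduce and reconstruct every channel vector, then use the squared L2
    reconstruction error as the spatial anomaly score."""
    C, H, W = feats.shape
    X = feats.reshape(C, H * W).T
    X_hat = pca.inverse_transform(pca.transform(X))
    return np.sum((X - X_hat) ** 2, axis=1).reshape(H, W)
```

Upsampling the resulting (H, W) map back to the input image resolution then yields the pixel-level localization described above.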

For more details, please see the paper [BMVC 2023].

Out-of-Distribution and Continual Novelty Detection

This work presents a fast, principled approach for detecting anomalous and out-of-distribution (OOD) samples in deep neural networks. We propose the application of linear statistical dimensionality reduction techniques on the semantic features produced by a DNN, in order to capture the low-dimensional subspace truly spanned by said features. We show that the feature reconstruction error (FRE), which is the L2-norm of the difference between the original feature in the high-dimensional space and the pre-image of its low-dimensional reduced embedding, is highly effective for OOD and anomaly detection. To generalize to intermediate features produced at any given layer, we extend the methodology by applying nonlinear kernel-based methods. Experiments using standard image datasets and DNN architectures demonstrate that our method meets or exceeds best-in-class quality performance, but at a fraction of the computational and memory cost required by the state of the art. It can be trained and run very efficiently, even on a traditional CPU.
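
A minimal sketch of the FRE computation, assuming scikit-learn's PCA for the linear case and KernelPCA for the nonlinear kernel-based extension; the component count and RBF kernel choice are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

def fit_fre_model(train_feats, n_components=16, kernel=None):
    """Fit the dimensionality reduction on in-distribution features only
    (shape (N, D)). kernel=None gives the linear variant; a kernel such
    as "rbf" sketches the nonlinear extension for intermediate layers."""
    if kernel is None:
        return PCA(n_components=n_components).fit(train_feats)
    return KernelPCA(n_components=n_components, kernel=kernel,
                     fit_inverse_transform=True).fit(train_feats)

def fre(model, feats):
    """FRE: L2 norm between each feature and the pre-image of its
    low-dimensional embedding; larger values indicate OOD samples."""
    preimage = model.inverse_transform(model.transform(feats))
    return np.linalg.norm(feats - preimage, axis=1)
```

Thresholding the returned scores (e.g. at a quantile of the training-set scores) then separates in-distribution from OOD samples.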

Furthermore, we build upon the previously discussed technique to tackle novelty detection for continual learning. We propose incDFM (incremental Deep Feature Modeling), a self-supervised continual novelty detector, and evaluate its performance in the challenging class-incremental classification setting.

For more details on this research, please see the papers [ICIP_2022, ECCV 2022].

Uncertainty Estimation and Out-of-Distribution Detection

This work presents an approach for estimating uncertainty in deep neural networks. Our approach consists of modeling the reduced (training) features with parametric probability distributions, then calculating the likelihoods of reduced test features w.r.t. the learned distributions. This uncertainty estimate can be used to discriminate in-distribution samples from OOD samples. The benefits of our approach are demonstrated on OOD detection, as well as on the detection of malicious samples from adversarial and data-poisoning attacks. Additionally, "flagged" samples can be used for active learning and AI bias mitigation.
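
The density-modeling step can be sketched as follows, assuming PCA for the feature reduction and a single multivariate Gaussian as the parametric distribution; both choices, and the component count, are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.stats import multivariate_normal

def fit_density(train_feats, n_components=8):
    """Reduce in-distribution training features (N, D) with PCA, then fit
    a parametric (here: multivariate Gaussian) distribution in the
    reduced space."""
    pca = PCA(n_components=n_components).fit(train_feats)
    Z = pca.transform(train_feats)
    dist = multivariate_normal(mean=Z.mean(axis=0),
                               cov=np.cov(Z, rowvar=False))
    return pca, dist

def uncertainty(pca, dist, feats):
    """Negative log-likelihood of reduced test features under the learned
    distribution; higher values flag likely OOD / malicious samples."""
    return -dist.logpdf(pca.transform(feats))
```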

For more details on this research, please see the papers [arxiv, NeurIPS BDL 2019, NeurIPS BDL 2019].

Multimodal Activity Recognition for Smart Manufacturing

This work sketches initial progress in data analysis for smart manufacturing, a topic that has been receiving considerable attention recently, given the explosion of data collection in factories and other industrial setups. By instrumenting workplaces with multimodal sensing (vision, audio, RFID, IMU, etc.) and following task flows, we collect and analyze in real time critical data on personnel and machines in order to provide deep understanding and actionable insights into manufacturing operations. We generate advanced manufacturing analytics that can be used to develop performance support tools to eliminate or significantly mitigate human variability and its adverse effects on high-volume manufacturing. Our system can help identify bottlenecks and support kaizen, track key performance indicators, detect anomalies, trigger preventive maintenance, develop training material, automate documentation, and provide remote access and troubleshooting tools.

We developed a complete proof-of-concept for two manufacturing operations inside an Intel semiconductor packaging factory. We use the Faster RCNN framework to perform object detection and C3D to extract spatio-temporal features from RGB-D video data. We subsequently perform fine-grained activity recognition with object-based and end-to-end recognition pipelines. From these, advanced analytics are derived and made available to key stakeholders (e.g. time of completion for a given action, compliance with spec, aggregation of statistics for a shift/factory to build distributions, etc.). We also show the feasibility of depth-only data analysis, which is critical for expanding the concept of “smart spaces” to settings that are designed to be non-intrusive (e.g. to preserve trade secrets) or that require privacy protection (e.g. HIPAA regulations).

Adaptive Optimization of Compression and Video Analysis

It is estimated that more than 75% of current bandwidth consumption resides in images and videos, fueled by the exponential growth in visual data creation (e.g. 500 hours of video are uploaded to YouTube every minute). Another significant trend is that visual data will be increasingly consumed by machines rather than humans, with computer vision algorithms generating advanced analytics in lieu of tedious visual inspection (e.g. video surveillance).

This work considers the joint optimization of video encoding and analysis for machine vision usages. Such usages require different handling than the ones designed for human consumption, where the focus has traditionally been on subjective image/video quality. First, we develop (front-end) analytics-aware compression schemes for the (back-end) target computer vision application. With picture quality requirements no longer being driven by human perceptual quality, this creates an opportunity for novel and more efficient video coding schemes when machines/algorithms are the end consumer. We also design compression-aware analytics schemes that take advantage of the internal information included in encoded video data (e.g. motion vectors) to facilitate the back-end computer vision task. Since encoders seek to identify correlations and saliencies in time and space, this rich information can help the analytics avoid redundant calculations on the back end.

We demonstrate the benefits of our approach on a pedestrian detection use case with the Duke MTMC dataset. In one instance of analytics-aware compression, we use spatial saliency guided by visual analytics to improve compression efficiency with region-of-interest compression (i.e. dynamic bit allocation, with high-attention regions coded at higher quality). The region of interest can be constructed from context, bitstream output, and feedback from the back-end application (i.e. past detections). We show better compression efficiency (2x to 3x bitrate reduction) without degradation in quality for the visual analytics.
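
A minimal sketch of the dynamic bit-allocation idea, assuming detections arrive as pixel-coordinate boxes and the encoder accepts a per-macroblock QP map; the box format, block size, and QP values are illustrative, not the deployed scheme:

```python
import numpy as np

def roi_qp_map(dets, frame_hw, mb=16, qp_base=32, qp_roi=24):
    """Build a per-macroblock QP map: blocks overlapping a detection (or
    predicted region of interest) get a lower QP, i.e. higher quality,
    while the rest of the frame is coded coarsely."""
    H, W = frame_hw
    rows, cols = (H + mb - 1) // mb, (W + mb - 1) // mb
    qp = np.full((rows, cols), qp_base, dtype=int)
    for (x0, y0, x1, y1) in dets:            # (left, top, right, bottom)
        r0, r1 = y0 // mb, min(rows, -(-y1 // mb))   # ceil division
        c0, c1 = x0 // mb, min(cols, -(-x1 // mb))
        qp[r0:r1, c0:c1] = qp_roi
    return qp
```

In the feedback variant described above, the boxes would come from past back-end detections rather than a front-end saliency pass.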

Low Power Pose/Gesture Recognition

In this project, we designed and implemented ultra-low power, always-on hand pose recognition algorithms on Intel platforms, with inputs coming from a mobile (standard) monocular camera. We implemented an original algorithm for static hand pose recognition, optimized and transformed it into a hardware-friendly fixed-point model, generated a set of reference test input/output at intermediate stages of the computing pipeline, and ported the functional model to a heterogeneous platform (neural engine accelerator, DSP, ISP, etc.).
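
The float-to-fixed-point conversion step can be sketched as follows; the Q-format bit widths here are illustrative assumptions, not the values used on the actual platform:

```python
import numpy as np

def to_fixed(x, frac_bits=8, word_bits=16):
    """Quantize float values to signed fixed point (Qm.n): scale by
    2**frac_bits, round, and saturate to the signed word range. This is
    the basic operation behind a hardware-friendly fixed-point model."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def to_float(q, frac_bits=8):
    """Dequantize back to float, e.g. to compare against the reference
    floating-point pipeline at intermediate stages."""
    return q / (1 << frac_bits)
```

Generating reference input/output pairs at each pipeline stage, as described above, amounts to running both the float and fixed-point versions and bounding the round-trip error.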

A Directional Interpolator For Image/Video Upscaling

This work considers the problem of real-time, high-quality image interpolation, especially for large scaling factors (e.g. greater than 3x). Such use cases have become more prevalent given the continued increase in resolution and screen size observed in display technology (e.g. 1080p, cinema-wide, 4K). The number of instances where the presentation of low resolution video on a high resolution display requires interpolation by at least a factor of 3x has increased, and has become more demanding in terms of visual quality (i.e. 56" displays and larger allow shorter viewing distances).

Here, we focus on preserving edges to improve the quality of interpolated images for real-time applications. Our main contribution is the definition of an efficient, pixel-adaptive directional interpolator with high quality performance and low computational complexity. Introducing nonlinearity in the scaling model (through directional interpolation) eliminates or strongly mitigates the visual jaggy artifacts typically observed with linear scaling methods. While the idea of edge-directed interpolation has been explored before (e.g. NEDI), the originality of our approach lies in the principled design of a scalable, hardware-friendly solution driven by quality, performance, and practical implementability considerations. The adaptive scaler is applied to a large test set, and its quality and run-time performance are compared against competitive image upscaling techniques. Sample upscaled images are available here.
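
The core nonlinearity can be illustrated with a minimal 2x sketch for grayscale images, in which each new diagonal sample is averaged along the locally smoother diagonal; the full scaler supports arbitrary factors and finer orientation handling, so this is only a toy version of the idea:

```python
import numpy as np

def edge_directed_2x(img):
    """2x upscaling sketch: original pixels are kept on the even grid,
    and each new center sample is interpolated along whichever diagonal
    has the smaller intensity difference, i.e. along the edge rather
    than across it. This per-pixel choice is what suppresses jaggies."""
    H, W = img.shape
    out = np.zeros((2 * H - 1, 2 * W - 1), dtype=float)
    out[::2, ::2] = img
    d1 = np.abs(img[:-1, :-1] - img[1:, 1:])   # "\" diagonal difference
    d2 = np.abs(img[:-1, 1:] - img[1:, :-1])   # "/" diagonal difference
    avg1 = (img[:-1, :-1] + img[1:, 1:]) / 2
    avg2 = (img[:-1, 1:] + img[1:, :-1]) / 2
    out[1::2, 1::2] = np.where(d1 <= d2, avg1, avg2)
    # remaining row/column samples: plain averaging, for brevity
    out[0::2, 1::2] = (img[:, :-1] + img[:, 1:]) / 2
    out[1::2, 0::2] = (img[:-1, :] + img[1:, :]) / 2
    return out
```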

Locally Optimal Filters for Dynamic Curve Estimation

This work considers the task of closed curve filtering. Estimation theory is applied to solve the problem of tracking deformable moving objects in an image sequence. Segmentation-based visual tracking strategies provide the closed curve measurements to filter. We discuss the derivation of a local, linear description for planar curve variation and curve uncertainty. It consists of a family of non-intersecting trajectories transverse to a given curve. Along one of the single-dimensional transverse trajectories, linear curve operations are feasible. Using this linear operation, simple locally optimal filtering procedures are derived. In particular, it is shown that an optimal first-order filtering strategy can be rigorously obtained. Extending the work further, we derive sub-optimal second-order curve filtering strategies. The second-order models naturally account for the curve velocities, which results in better curve estimates when dealing with highly elastic objects. In contrast to the first-order model, the second-order curve dynamics are nonlinear, which motivates a linear discrete approximation prior to deriving an extended Kalman filtering approach.

Once the curve filtering equations are derived, they are placed within the greater context of observer design for the estimation of a curve's position and deformations as it evolves in the plane. Application to online visual tracking is emphasized through experimentation with recorded imagery and objective comparison to other tracking methods.
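
Along each single-dimensional transverse trajectory the problem reduces to a scalar one, which can be sketched as an independent first-order Kalman step per coordinate; the static state model and the noise variances below are illustrative assumptions, not the paper's derivation:

```python
import numpy as np

def curve_kf_step(x, P, z, q=1e-3, r=1e-2):
    """One first-order filter step applied independently to each
    transverse coordinate of the curve. x holds the filtered offsets
    along the transverse trajectories, P their variances, and z the
    segmentation-based measurement of the same offsets."""
    # predict: identity dynamics along each transverse trajectory,
    # with process noise q accounting for curve deformation
    P = P + q
    # correct: scalar Kalman gain per coordinate
    K = P / (P + r)
    x = x + K * (z - x)
    P = (1.0 - K) * P
    return x, P
```

Running this step at every frame, with z extracted from the current segmentation, is the locally optimal first-order strategy in its simplest form; the second-order variant would augment the state with transverse velocities.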

For more details on the research, see [CDC 2009, CDC 2010] papers.

A Probabilistic Contour Observer for Online Visual Tracking

The work here considers the unconstrained segmentation and tracking problem. Online contour-based tracking is considered from an estimation perspective. We propose a recursive dynamic filtering solution to the tracking problem. The overall object motion is decomposed into a pose sub-state, which represents the ensemble movement, and a shape sub-state, which represents the local deformations. The shape component of the filter is described implicitly by a probability field, with prediction and correction mechanisms expressed accordingly. In particular, we present a second-order model that incorporates dynamics for capturing rapid or large deformations in shape. The filtering procedure decouples the pose and shape estimation. Experiments conducted with objective measures of quality demonstrate improved tracking.

Principal contributions include: (1) the formulation of the tracking problem as an observer design problem on the group and shape; (2) the incorporation of a dynamical model for the probabilistic shape space; (3) the definition of a novel correction method suited to the probabilistic shape space description; and (4) the quantitative validation of the system's performance.

For more details on the research, see [SIAM, ACC, ICIP] papers and videos.

Optimal Estimation Applied to Visual Contour Tracking

This work derives an optimal estimator for the purpose of online visual contour tracking. Starting from Bayesian segmentation as the measurement strategy, we use a bottom-up approach to design the estimator. We examine how the hypothesis of additive imaging noise affects the classification probabilities, infer the proper update law to be applied under such hypothesis, and derive the resulting optimal filtering scheme. In particular, it is shown that additive imaging noise leads to multiplicative segmentation uncertainty from which a geometric averaging update model is established. Given known noise statistics, the optimal correction gain and associated filtering equations are derived.
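
The geometric-averaging correction can be written schematically as follows; this LaTeX fragment is a sketch of the update's form, with $K$ the correction gain, and is not the paper's full derivation:

```latex
% Multiplicative segmentation uncertainty leads to a geometric-averaging
% correction of the pointwise classification probability: with prediction
% p_{k|k-1}(s), measurement likelihood q_k(s), and gain K \in [0,1],
p_{k|k}(s) \;\propto\; p_{k|k-1}(s)^{\,1-K}\; q_k(s)^{\,K}
% followed by normalization. K = 0 keeps the prediction, K = 1 trusts
% the measurement; the optimal K follows from the known noise statistics.
```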

Benefits of this approach include the simplification of a filtering problem on the infinite-dimensional space of closed curves into a series of point-wise filtering tasks. The optimal gain derivation is formally tied to quantitative uncertainty levels of the image data and, therefore, does not require manual gain tuning. The optimal estimator is applied to noise-corrupted imagery and its performance compared against a fixed-gain filtering strategy and other visual tracking techniques.

For more details on the research, see [ACC] paper.

Multivariate Analysis of Imaging Mass Spectrometry Data

Imaging mass spectrometry can be used to reveal spatial distributions of multiple molecular species in a 2D biological sample. Because of the large amount of data produced by this technology, it is difficult and time-consuming to manually extract meaningful results from imaging mass spectrometry experimentation. We have developed and implemented an original approach to easily and consistently process mass spectrometry imaging data with the goal of automatically identifying interesting regions of molecule expression. Based on multivariate analysis techniques such as principal component analysis, the system allows researchers to conveniently define and visualize spatial regions based on spectral similarity. Features of our system are demonstrated on mouse cerebellum data.
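
The multivariate step can be sketched as follows, assuming the data is stored as a cube of shape (height, width, m/z channels); the component count is an illustrative assumption:

```python
import numpy as np
from sklearn.decomposition import PCA

def component_images(cube, n_components=3):
    """Flatten an imaging-mass-spectrometry cube (H, W, M) into per-pixel
    spectra, project onto the leading principal components, and reshape
    the scores into one image per component. Spatially coherent regions
    in these images correspond to regions of similar spectral signature."""
    H, W, M = cube.shape
    X = cube.reshape(H * W, M)
    scores = PCA(n_components=n_components).fit_transform(X)
    return scores.T.reshape(n_components, H, W)
```

Thresholding or clustering the component images then yields the spatial regions of similar molecule expression described above.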

This tool generates clear mass footprints and high-quality images of biological samples that match the histological results.

For more details on the research, see [BIBE] paper.

Noise Estimation and Adaptive Filtering for Visual Tracking

This work proposes a procedure to characterize segmentation-based visual tracking performance with respect to imaging noise. It identifies how imaging noise affects the target segmentation as measured through local shape metrics (Sobolev and Laplace error metrics). Such a procedure would be an important calibration step prior to implementing a visual tracking filter for a given need. We utilize the Bhattacharyya coefficient between the target and background intensity distributions to estimate the segmentation error. An empirical study is conducted to establish a correspondence between the Bhattacharyya coefficient and the segmentation error. The correspondence is used to adaptively filter temporally correlated segmentations.
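
The contrast measure itself is straightforward to compute; the following sketch assumes the target and background intensity distributions are given as histograms:

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two intensity histograms
    (normalized internally). A value near 1 means the target and
    background distributions overlap heavily (low contrast, large
    expected segmentation error); near 0 means they are well separated."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```

The empirical study described above maps this coefficient to an expected segmentation error, which in turn sets the filter gain.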

The principal contributions of the work include: a methodology for utilizing a proven contrast parameter to derive expected segmentation errors that are geometrically relevant, an empirical procedure for identifying the optimal filter gain given the measured contrast, and the use of the optimal gain for shape filtering.

For more details on the research, see [ICIP] paper.

Concrete Column Detection for Construction Applications

The detection of structural elements from images/videos is useful for facilitating many building construction and maintenance applications, including project progress monitoring, construction productivity analysis, and (infrastructure) automated visual inspection. Research in this area is still at an early stage. Following up on previous work by the authors, we present an improved concrete column detection method. Compared with the initial method, the new technique significantly reduces sensitivity to parameter selection, mainly in three aspects. First, edges are located using color information. Second, the orientation information of edge points is considered when constructing column boundaries. Third, an artificial neural network for concrete material classification is developed to replace concrete sample matching. The method is tested online using live videos, and results are compared with the previous method to demonstrate the improvements.

For more details on the research, see [ISARC] paper.

Automated Panoramic Reconstruction

Due to non-disclosure agreements, only the reconstructed panoramic scenes are displayed here. They are obtained by stitching multiple close-view pictures of the scene captured with a rotating camera.