Pattern Analysis and Machine Intelligence

IEEE Computer Society
The IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) is published monthly. Its Editorial Board strives to publish papers that present important research results within PAMI's scope. These include statistical and structural pattern recognition; image analysis; computational models of vision; computer vision systems; enhancement, restoration, segmentation, feature extraction, shape and texture analysis; applications of pattern analysis in medicine, industry, government, and the arts and sciences; artificial intelligence, knowledge representation, logical and probabilistic inference, learning, speech recognition, character and text recognition, syntactic and semantic processing, understanding natural language, expert systems, and specialized architectures for such processing.
Updated: 1 year 6 weeks ago

PrePrint: Recognizing Gestures by Learning Local Motion Signatures of HOG Descriptors

Wed, 04/04/2012 - 16:35
We introduce a new gesture recognition framework based on learning local motion signatures (LMSs) of HOG descriptors introduced by [1]. Our main contribution is a new probabilistic learning-classification scheme based on reliable tracking of local features. After generating these LMSs for one individual by tracking Histograms of Oriented Gradients (HOG) [2] descriptors, we learn a code-book of video-words (i.e., clusters of LMSs) using the k-means algorithm on a training database of gesture videos. The video-words are then compacted into a code-book of code-words by the Maximization of Mutual Information (MMI) algorithm. In the final step, we compare the LMSs generated for a new gesture with the learned code-book via the k-nearest neighbors (k-NN) algorithm and a novel voting strategy. A further contribution is the handling of the N-to-N mapping between code-words and gesture labels within the proposed voting strategy. Experiments have been carried out on two public gesture databases: KTH [3] and IXMAS [4]. Results show that the proposed method outperforms recent state-of-the-art methods.
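The final classification step can be illustrated with a minimal sketch. It assumes each code-word carries a weight for every gesture label it was associated with during training, so one code-word can vote for several labels (the N-to-N mapping). The names `nearest_code_words`, `classify_gesture`, and `label_weights`, the toy data, and the additive voting rule are all hypothetical; the abstract does not specify the actual voting strategy.

```python
import math
from collections import defaultdict

def nearest_code_words(lms, code_book, k):
    """Return indices of the k code-words closest to `lms` (Euclidean distance)."""
    dists = [(math.dist(lms, cw), i) for i, cw in enumerate(code_book)]
    return [i for _, i in sorted(dists)[:k]]

def classify_gesture(lms_list, code_book, label_weights, k=1):
    """Accumulate label votes over all LMSs of one gesture.

    label_weights[i] maps gesture labels to an assumed weight of code-word i
    for that label, so a single code-word may vote for several labels.
    """
    votes = defaultdict(float)
    for lms in lms_list:
        for i in nearest_code_words(lms, code_book, k):
            for label, weight in label_weights[i].items():
                votes[label] += weight
    return max(votes, key=votes.get)

# Toy 2D "signatures"; real LMSs would be higher-dimensional.
code_book = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
label_weights = [
    {"wave": 0.8, "clap": 0.2},
    {"wave": 0.5, "clap": 0.5},
    {"clap": 1.0},
]
new_gesture = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]
predicted = classify_gesture(new_gesture, code_book, label_weights, k=1)
print(predicted)
```

With these toy weights, two signatures fall nearest to the first code-word and one to the second, so the accumulated votes favor "wave".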

PrePrint: A Minimal Solution for the Extrinsic Calibration of a Camera and a Laser-Rangefinder

Wed, 04/04/2012 - 16:35
This article presents a new algorithm for the extrinsic calibration of a perspective camera and an invisible 2D laser-rangefinder (LRF). The calibration is achieved by freely moving a checkerboard pattern in order to obtain plane poses in camera coordinates and depth readings in the LRF reference frame. The problem of estimating the rigid displacement between the two sensors is formulated as that of registering a set of planes and lines in 3D space. It is proved for the first time that the alignment of 3 plane-line correspondences has at most 8 solutions, which can be determined by solving a standard P3P problem and a linear system of equations. This leads to a minimal closed-form solution for the extrinsic calibration that can be used as a hypothesis generator in a RANSAC paradigm. Our calibration approach is validated through simulation and real experiments, which show its superiority over the current state-of-the-art method, which requires a minimum of 5 input planes.
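The plane-line registration constraint can be sketched as follows: under a candidate rigid displacement (R, t), every LRF point measured on the checkerboard should land on the corresponding plane expressed in camera coordinates. The sketch below only scores a given hypothesis, as the consensus step of RANSAC would; it does not implement the minimal solver itself. The function names, the plane parameterization n·x = d with unit normal, and the tolerance are assumptions made for illustration.

```python
def transform(R, t, p):
    """Map an LRF point p into camera coordinates: R p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def plane_residual(n, d, R, t, p):
    """Signed distance of the transformed point from the plane n.x = d (|n| = 1)."""
    x = transform(R, t, p)
    return sum(n[i] * x[i] for i in range(3)) - d

def count_inliers(planes, scans, R, t, tol=0.01):
    """Count LRF points lying within `tol` of their checkerboard plane under (R, t)."""
    return sum(
        1
        for (n, d), points in zip(planes, scans)
        for p in points
        if abs(plane_residual(n, d, R, t, p)) < tol
    )

# Toy setup: the LRF sits 1 unit behind the camera along z, with no rotation.
R_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_true = [0.0, 0.0, 1.0]
planes = [([0.0, 0.0, 1.0], 1.0)]                 # plane z = 1 in camera coordinates
scans = [[[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]]      # LRF points on that plane
print(count_inliers(planes, scans, R_true, t_true))
print(count_inliers(planes, scans, R_true, [0.0, 0.0, 0.0]))
```

In a RANSAC loop, each hypothesis produced from 3 plane-line correspondences would be scored this way, and the hypothesis with the most inliers kept.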

PrePrint: Embedding Retrieval of Articulated Geometry Models

Wed, 04/04/2012 - 16:35
Due to the popularity of computer games and animation, research on 3D articulated geometry model retrieval has attracted considerable attention in recent years. However, most existing works extract high-dimensional features to represent models, which suffer from practical limitations. First, misalignment in high-dimensional features may produce unreliable Euclidean distances and affect retrieval accuracy. Second, the curse of dimensionality degrades efficiency. We propose an embedding retrieval framework to improve the practicability of these methods. It is based on a manifold learning technique, the Diffusion Map (DM). We project all pairwise distances onto a low-dimensional space. This improves retrieval accuracy because inter-cluster distances are exaggerated. Then we adapt the Density-Weighted Nyström extension and propose a novel step to locally align the Nyström embedding to the eigensolver embedding so as to reduce extension error and preserve retrieval accuracy. Finally, we propose a heuristic to handle disconnected manifolds by augmenting the kernel matrix with multiple similarity measures and shortcut edges, and further discuss the choice of DM parameters. We have incorporated two existing matching algorithms for testing. Our experimental results show improvement in precision at high recalls and in speed. Our work provides a robust retrieval framework for the matching of multimedia data that lie on manifolds.

PrePrint: Empirical Mode Decomposition Analysis for Visual Stylometry

Wed, 04/04/2012 - 16:35
In this paper we show how the tools of empirical mode decomposition (EMD) analysis can be applied to the problem of "visual stylometry," generally defined as the development of quantitative tools for the measurement and comparison of individual style in the visual arts. In particular, we introduce a new form of EMD analysis for images and show that it is possible to use its output as the basis for the construction of effective support vector machine-based stylometric classifiers. We present the methodology and then test it on two sets of digital captures of drawings: a set of authentic and well-known imitations of works attributed to the great Flemish artist Pieter Bruegel the Elder (1525–1569) and a set of works attributed to Dutch master Rembrandt van Rijn (1606–1669) and his pupils. Our positive results indicate that EMD-based methods may hold promise generally as a technique for visual stylometry.

PrePrint: A Blur-robust Descriptor with Applications to Face Recognition

Wed, 04/04/2012 - 16:35
Understanding the effect of blur is an important problem in unconstrained visual analysis. We address this problem in the context of image-based recognition through a fusion of image-formation models and differential geometric tools. First, we discuss the space spanned by blurred versions of an image and then, under certain assumptions, provide a differential geometric analysis of that space. More specifically, we create a subspace resulting from the convolution of an image with a complete set of orthonormal basis functions of a pre-specified maximum size (that can represent an arbitrary blur kernel within that size), and show that the corresponding subspaces created from a clean image and its blurred versions are equal under the ideal case of zero noise and some assumptions on the properties of blur kernels. We then study the practical utility of this subspace representation for the problem of direct recognition of blurred faces by viewing the subspaces as points on the Grassmann manifold, and present methods to perform recognition for cases where the blur is homogeneous as well as spatially varying. We empirically analyze the effect of noise, as well as the presence of other facial variations between the gallery and probe images, and provide comparisons with existing approaches on standard datasets.

PrePrint: A Probabilistic Approach to Pattern Matching in the Continuous Domain

Wed, 04/04/2012 - 16:35
The goal of this paper is to solve the following basic problem: given discrete noisy samples from a continuous signal, compute the probability distribution of its distance from a fixed template. As opposed to the typical restoration problem, which considers a single optimal signal, the computation of the entire probability distribution necessitates integrating over the entire signal space. To achieve this, we apply path integration techniques. The problem is studied in one and two dimensions, and an accurate solution as well as an efficient approximation scheme are provided.

PrePrint: Face Recognition using Sparse Approximated Nearest Points between Image Sets

Wed, 04/04/2012 - 16:35
We propose an efficient and robust solution for image set classification. A joint representation of an image set is proposed which includes the image samples of the set and their affine hull model. The model accounts for unseen appearances in the form of affine combinations of sample images. To calculate the between-set distance, we introduce the Sparse Approximated Nearest Point (SANP). SANPs are the nearest points of two image sets such that each point can be sparsely approximated by the image samples of its respective set. This novel sparse formulation enforces sparsity on the sample coefficients and jointly optimizes the nearest points as well as their sparse approximations. Unlike standard sparse coding, the data to be sparsely approximated are not fixed. A convex formulation is proposed to find the optimal SANPs between two sets, and the accelerated proximal gradient method is adapted to efficiently solve this optimization. We also derive the kernel extension of the SANP and propose an algorithm for dynamically tuning the RBF kernel parameter while matching each pair of image sets. Comprehensive experiments on the UCSD/Honda, CMU MoBo and YouTube Celebrities face datasets show that our method consistently outperforms the state of the art.

PrePrint: Learning Optimal Embedded Cascades

Wed, 04/04/2012 - 16:35
The problem of automatic and optimal design of embedded object detector cascades is considered. Two main challenges are identified: optimization of the cascade configuration, and optimization of individual cascade stages, so as to achieve the best trade-off between classification accuracy and speed under a detection rate constraint. Two novel boosting algorithms are proposed to address these problems. The first, RCBoost, formulates boosting as a constrained optimization problem, which is solved with a barrier penalty method. The constraint is the target detection rate, which is met at all iterations of the boosting process. This enables the design of embedded cascades of known configuration without extensive cross-validation or heuristics. The second, ECBoost, searches over cascade configurations to achieve the optimal trade-off between classification risk and speed. The two algorithms are combined into an overall boosting procedure, RCECBoost, which optimizes both the cascade configuration and its stages under a detection rate constraint, in a fully automated manner. Extensive experiments in face, car, pedestrian, and panda detection show that the resulting detectors achieve an accuracy vs. speed trade-off superior to those of previous methods.
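The accuracy vs. speed trade-off of a cascade comes from early rejection, which can be sketched as follows. The sketch assumes an embedded cascade in which each stage extends the running score of the previous stages with additional weak learners and rejects the input as soon as the score drops below the stage threshold. It shows evaluation only, not RCBoost or ECBoost training; `run_embedded_cascade`, the toy weak learners, and the thresholds are hypothetical.

```python
def run_embedded_cascade(x, stages):
    """Evaluate stages in order and reject as soon as the running score is too low.

    stages: list of (weak_learners, threshold) pairs; each weak learner maps x to a float.
    Returns (accepted, number_of_weak_learners_evaluated).
    """
    score = 0.0
    evaluated = 0
    for weak_learners, threshold in stages:
        for h in weak_learners:
            score += h(x)          # embedded: the score carries over between stages
            evaluated += 1
        if score < threshold:
            return False, evaluated
    return True, evaluated

def stump(index, cut):
    """Toy weak learner: +1 if feature `index` exceeds `cut`, else -1."""
    return lambda x: 1.0 if x[index] > cut else -1.0

stages = [
    ([stump(0, 0.5)], 0.0),                  # cheap first stage: one weak learner
    ([stump(1, 0.5), stump(2, 0.5)], 1.0),   # second stage adds two more
]
print(run_embedded_cascade((0.1, 0.9, 0.9), stages))
print(run_embedded_cascade((0.9, 0.9, 0.9), stages))
```

Most negative inputs exit after the first, cheapest stage, which is why the cascade configuration (how many weak learners per stage) affects speed as much as accuracy.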

PrePrint: Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree

Wed, 04/04/2012 - 16:35
Unexpected stimuli are a challenge to any machine learning algorithm. Here we identify distinct types of unexpected events, which arise when general-level and specific-level classifiers give conflicting predictions. We define a formal framework for the representation and processing of incongruent events: starting from the notion of a label hierarchy, we show how a partial order on labels can be deduced from such hierarchies. For each event, we compute its probability in different ways, based on adjacent levels in the label hierarchy. An incongruent event is an event where the probability computed based on some more specific level is much smaller than the probability computed based on some more general level, leading to conflicting predictions. Algorithms are derived to detect incongruent events from different types of hierarchies, different applications and a variety of data types. We present promising results for the detection of novel visual and audio objects, and new patterns of motion in video. We also discuss the detection of out-of-vocabulary words in speech recognition, and the detection of incongruent events in a multimodal audio-visual scenario.
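The definition of an incongruent event can be made concrete with a minimal sketch. It assumes a two-level hierarchy (one general label with several specific child labels), combines the specific-level classifiers by taking their maximum, and interprets "much smaller" as a fixed ratio. The function name `is_incongruent`, the `max` combination rule, and both thresholds are assumptions chosen for illustration, not the paper's formulation.

```python
def is_incongruent(p_general, p_specific_children, min_general=0.5, ratio=0.2):
    """Flag an event that the general classifier accepts but no specific classifier explains.

    p_general: probability from the general-level classifier (e.g. "dog").
    p_specific_children: dict mapping specific labels (e.g. breeds) to probabilities.
    min_general, ratio: hypothetical thresholds for "accepted" and "much smaller".
    """
    p_specific = max(p_specific_children.values())
    return p_general >= min_general and p_specific < ratio * p_general

# A dog of an unseen breed: general level confident, specific levels not.
print(is_incongruent(0.9, {"beagle": 0.05, "poodle": 0.10}))
# A known breed: both levels agree.
print(is_incongruent(0.9, {"beagle": 0.85, "poodle": 0.05}))
# Not a dog at all: rejected at the general level, so not incongruent.
print(is_incongruent(0.1, {"beagle": 0.02, "poodle": 0.01}))
```

The third case shows the distinction from plain novelty detection: an input rejected at every level is simply unrecognized, whereas an incongruent event requires disagreement between levels.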

IEEE Transactions on Pattern Analysis and Machine Intelligence - May 2012 (Vol. 34, No. 5)

Wed, 04/04/2012 - 16:35
IEEE Transactions on Pattern Analysis and Machine Intelligence