RAP - Visual Perception of Humans & Video Surveillance

The main target applications are passive human-robot interaction (e.g., navigation in cluttered human environments), active human-robot interaction (e.g., proximal interaction through postures and gestures), and surveillance. Both private spaces and large-scale, human-populated environments are considered. Functions span human detection, multi-person and posture tracking, human identification, and the interpretation of human motion. Importantly, they entail data fusion either at the sensor level (of low-level cues extracted from the raw signals) or at higher levels (by combining the outputs of lower-level algorithms).

Detection and tracking have mainly been addressed via probabilistic supervised classification and Monte Carlo filtering methods, respectively. We have performed people detection from heterogeneous features via single-class methods such as boosting (A.A. Mekonnen’s thesis). Multi-class methods such as random forests have been used to detect body limbs for posture recognition (L. Marti’s thesis). As for tracking, an original fusion of vision and RFID within a reversible-jump Markov chain Monte Carlo (MCMC) particle filter enabled robust real-time navigation of a robot in a cluttered human environment [Mekonnen-et-al_CVIU2013] (A.A. Mekonnen’s and T. Germa’s theses). We could track and re-identify people across a network of cameras with non-overlapping fields of view by coupling local tracking and identification systems (one per camera, each based on a mixed-state particle filter) with an MCMC-based supervisor in charge of associating “tracklets” [Meden-et-al_BMVC2012] (B. Meden’s thesis). Recent work has addressed the reconstruction and tracking of posture from single and multiple Kinect sensors by stochastic filtering or smoothing (L. Marti’s and J.T. Masse’s theses).
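
The vision/RFID tracker cited above fuses both cues inside a reversible-jump MCMC particle filter that also handles a varying number of people. As a much simpler illustration of the underlying idea only, the sketch below is a single-target sequential importance resampling (SIR) particle filter in which each particle's weight is the product of a vision likelihood and an RFID likelihood, assuming the two cues are conditionally independent given the person's position. The 2-D state, the Gaussian likelihoods, the range-only RFID model, all noise values, and every function name (`pf_step`, `simulate`, ...) are hypothetical choices made for this sketch; they are not the models used in [Mekonnen-et-al_CVIU2013].

```python
import math
import random


def gaussian(d, sigma):
    """Unnormalised Gaussian likelihood of a residual d."""
    return math.exp(-0.5 * (d / sigma) ** 2)


def vision_likelihood(p, detection, sigma=0.3):
    # Hypothetical vision cue: a 2-D (x, y) person detection, in metres.
    return gaussian(math.dist(p, detection), sigma)


def rfid_likelihood(p, robot, rfid_range, sigma=0.5):
    # Hypothetical RFID cue: a coarse range from the robot's reader to the tag.
    return gaussian(math.dist(p, robot) - rfid_range, sigma)


def pf_step(particles, detection, robot, rfid_range, motion_sigma=0.1, rng=random):
    """One predict / weight / resample cycle of a single-target SIR filter."""
    # 1. Predict: random-walk motion model.
    moved = [(x + rng.gauss(0, motion_sigma), y + rng.gauss(0, motion_sigma))
             for x, y in particles]
    # 2. Weight: product of cue likelihoods (assumes conditional independence).
    #    When vision is unavailable (detection is None), only RFID is used.
    weights = []
    for p in moved:
        w = rfid_likelihood(p, robot, rfid_range)
        if detection is not None:
            w *= vision_likelihood(p, detection)
        weights.append(w)
    total = sum(weights)
    if total == 0.0:  # all likelihoods underflowed: fall back to uniform weights
        weights, total = [1.0] * len(moved), float(len(moved))
    weights = [w / total for w in weights]
    # 3. Estimate: weighted mean of the particle cloud.
    estimate = (sum(w * p[0] for w, p in zip(weights, moved)),
                sum(w * p[1] for w, p in zip(weights, moved)))
    # 4. Resample (multinomial) to counter weight degeneracy.
    return rng.choices(moved, weights=weights, k=len(moved)), estimate


def simulate(steps=20, n=500, seed=0):
    """Track a static person at (2, 1) m from a robot at the origin."""
    rng = random.Random(seed)
    person, robot = (2.0, 1.0), (0.0, 0.0)
    # Initial belief: uniform over a hypothetical 4 m x 4 m area.
    particles = [(rng.uniform(0, 4), rng.uniform(-1, 3)) for _ in range(n)]
    estimate = None
    for t in range(steps):
        # Simulated measurements; vision drops out every 5th frame (occlusion).
        detection = None if t % 5 == 4 else (person[0] + rng.gauss(0, 0.1),
                                             person[1] + rng.gauss(0, 0.1))
        rfid_range = math.dist(person, robot) + rng.gauss(0, 0.2)
        particles, estimate = pf_step(particles, detection, robot, rfid_range, rng=rng)
    return particles, estimate


particles, estimate = simulate()
print(f"estimated position: ({estimate[0]:.2f}, {estimate[1]:.2f})")
```

Because the RFID term keeps weighting the particles when the vision detection is missing, the track survives the simulated occlusions. Multiplying per-cue likelihoods is one common way to fuse cues at the sensor level; it is not the only option, and it relies on the independence assumption stated above.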

Visual Perception of Humans and Video Surveillance
(Top) Vision- and RFID-based tracking and coordinated motion of a tour-guide robot through crowds.
(Bottom) Visual tracking and re-identification of people across non-overlapping cameras.