Visual Human Detection: ICU

The GenoM module ICU (for "I see You") is launched during interaction at long (> 3 m) and medium (1–3 m) H/R distances. LAAS has investigated vision-based modalities dedicated to the detection and tracking of persons, as well as an eigenfaces-based recognition modality to (re-)identify persons during any proximal H/R interaction session. The functions provided by the module ICU fall into three broad categories: (1) functions related to human body limb detection, (2) functions related to user face recognition, and (3) functions related to user tracking.

1- Human body limbs detection

In order to interact with the user of Jido / Rackham, ICU needs to detect and extract cues related to the human body limbs. Three main detectors are used.

(1) Face detector
For detecting faces, we apply the well-known window-scanning technique introduced by Viola et al., which covers a range of ±45° out-of-plane rotation. This detector is based on a boosted cascade of Haar-like features.
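As an illustration of why exhaustive window scanning is tractable, the key trick behind Haar-like features is the integral image, which lets any rectangular pixel sum be computed in four lookups. The sketch below is a minimal, self-contained example (image size and values are illustrative; it is not the module's actual code):

```c
#include <stdio.h>

#define W 4
#define H 4

/* ii[y][x] = sum of img over rows < y and cols < x (zero-padded border). */
void integral_image(int img[H][W], long ii[H + 1][W + 1]) {
    for (int y = 0; y <= H; y++)
        for (int x = 0; x <= W; x++) {
            if (y == 0 || x == 0) { ii[y][x] = 0; continue; }
            ii[y][x] = img[y - 1][x - 1] + ii[y - 1][x] + ii[y][x - 1]
                       - ii[y - 1][x - 1];
        }
}

/* Sum of pixels in the rectangle [x0,x1) x [y0,y1) using four lookups.
   A Haar-like feature is a signed combination of such rectangle sums. */
long rect_sum(long ii[H + 1][W + 1], int x0, int y0, int x1, int y1) {
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0];
}
```

Because every rectangle sum is O(1) after one pass over the image, each weak classifier in the boosted cascade can be evaluated at every candidate window position and scale cheaply.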

(2) Skin colored blobs detector
Detecting skin-coloured pixels provides a reliable method for detecting human faces and hands. Classically, the ratio of two histograms, modeling respectively the skin and background colour distributions, makes it possible to apply Bayes' rule to each image pixel in order to obtain its skin-colour probability.
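This per-pixel rule can be sketched as follows (a minimal example with made-up histogram values; in practice the likelihoods come from colour histograms learned over skin and background samples):

```c
/* P(skin | c) = P(c | skin) P(skin)
                 / (P(c | skin) P(skin) + P(c | bg) P(bg)),
   where c is the pixel's colour, P(c | skin) and P(c | bg) are read
   from the skin and background histograms, and P(bg) = 1 - P(skin). */
double skin_probability(double p_c_skin, double p_c_bg, double prior_skin) {
    double num = p_c_skin * prior_skin;
    double den = num + p_c_bg * (1.0 - prior_skin);
    return den > 0.0 ? num / den : 0.0;
}
```

Thresholding this probability over the whole image yields a binary skin map, from which connected skin-coloured blobs (face and hand candidates) are extracted.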

(3) Motion blobs detector
The motion blob detector is based on the difference between successive pairs of frames. It coarsely outlines a mobile ROI that can be integrated into our tracking loop.
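A toy frame-differencing sketch of this idea (frame sizes, values and threshold are illustrative, not the module's actual parameters): a pixel is marked as moving when the absolute inter-frame difference exceeds a threshold, and the bounding box of the moving pixels gives a coarse ROI.

```c
#define FW 6
#define FH 4

/* Returns 1 if motion was found; the ROI [x0,x1] x [y0,y1] (inclusive)
   then bounds all pixels whose inter-frame difference exceeds thresh. */
int motion_roi(unsigned char prev[FH][FW], unsigned char curr[FH][FW],
               int thresh, int *x0, int *y0, int *x1, int *y1) {
    int found = 0;
    *x0 = FW; *y0 = FH; *x1 = -1; *y1 = -1;
    for (int y = 0; y < FH; y++)
        for (int x = 0; x < FW; x++) {
            int d = (int)curr[y][x] - (int)prev[y][x];
            if (d < 0) d = -d;
            if (d > thresh) {
                found = 1;
                if (x < *x0) *x0 = x;
                if (y < *y0) *y0 = y;
                if (x > *x1) *x1 = x;
                if (y > *y1) *y1 = y;
            }
        }
    return found;
}
```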

2- User face recognition

Face classification is split into two parts: the representation of the faces and the decision-making process.
The faces are represented in an eigenface basis in order to reduce the dimensionality of the data; each person is represented by his or her own eigenface basis. The decision is then made by a simple Bayesian rule based on an error norm inspired by the Distance From Face Space (DFFS).

3- User tracking

As described previously, ICU must be able to follow the targeted person at different H/R distances. To deal with these ranges (from 1 to 3 meters, and more than 3 meters), ICU embeds two different trackers based on particle filtering: an upper human body tracker and a motion-monitoring tracker. The first follows the person interacting with the robot at medium range and uses face recognition to determine whether the user is known. The second is launched when the user is too far for direct interaction (more than 3 meters) and is mostly based on motion detection. These modalities are managed by a finite-state automaton that automatically switches to the appropriate modality (motion monitoring, upper human body tracking, face detection).
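Such a modality-switching automaton might be sketched as below. The states match the three modalities named above, but the transition conditions (the 3 m range threshold and the face-found event) are an illustrative reading of the text, not ICU's actual automaton:

```c
typedef enum {
    MOTION_MONITORING,   /* long range: > 3 m            */
    FACE_DETECTION,      /* person entered medium range   */
    UPPER_BODY_TRACKING  /* medium range, target acquired */
} icu_state;

/* One step of a hypothetical modality automaton: switch on the current
   state given the estimated H/R range and whether a face was found. */
icu_state icu_step(icu_state s, double range_m, int face_found) {
    switch (s) {
    case MOTION_MONITORING:
        return range_m <= 3.0 ? FACE_DETECTION : MOTION_MONITORING;
    case FACE_DETECTION:
        if (range_m > 3.0) return MOTION_MONITORING;
        return face_found ? UPPER_BODY_TRACKING : FACE_DETECTION;
    case UPPER_BODY_TRACKING:
        return range_m > 3.0 ? MOTION_MONITORING : UPPER_BODY_TRACKING;
    }
    return s;
}
```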

Tour Guide Robot video

related publications

Data Fusion and Eigenface based Tracking dedicated to a Tour-Guide Robot. T. Germa, L. Brèthes, F. Lerasle, T. Simon. Int. Conf. on Vision Systems (ICVS'07), March 2007, Bielefeld, Germany.

Human/Robot Visual Interaction for a Tour-Guide Robot. T. Germa, F. Lerasle, P. Danès, L. Brèthes. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS'07), November 2007, San Diego, California, USA.

Visual Human Detection: GEST

The GenoM module GEST is used during the medium-range (gesture tracking) and proximal (object grasping) interaction sessions (1–3 m). Its aim is to track in 3D the upper human body parts, i.e. the two hands and the head of the robot's interlocutor. Multiple object trackers (MOT) based on distributed filters suffer from the well-known error-merge and labeling problems when targets undergo partial or complete occlusions. We propose an interactively distributed MOT (IDMOT) based on the particle filtering framework. When targets do not interact with each other, the approach performs like multiple independent trackers. When they are in close proximity, magnetic and inertia likelihoods are added to each filter to handle the aforementioned problems. Our IDMOT particle filter is improved and extended in two ways. First, the conventional CONDENSATION strategy is replaced by ICONDENSATION, which permits automatic (re-)initialization. Second, our importance and measurement functions are based on a robust and probabilistically motivated integration of multiple cues. Fusing 3D and 2D (image-based) information from the video stream of a stereo head makes it possible to benefit from both reconstruction-based and appearance-based approaches.

The images show snapshots of a typical run involving sporadic disappearances of the hands. For each frame, the templates depict the projections of the estimated targets (head and hands).

Current investigations concern two-handed gesture interpretation based on a 3-state HMM. The observations are derived from the tracked head and hand positions. We focus on deictic gestures and a small set of symbolic gestures.
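Scoring an observation sequence against a gesture model amounts to the standard HMM forward algorithm, sketched below for a 3-state model. All probabilities here are illustrative; a real model would be trained on the tracked head/hand observations and would use an observation alphabet (or densities) derived from them:

```c
#define NS 3  /* HMM states */
#define NO 2  /* discrete observation symbols (toy alphabet) */

/* Forward algorithm: returns P(obs[0..T-1] | model), where pi is the
   initial state distribution, A the transition matrix and B the
   per-state observation probabilities. */
double hmm_forward(const double pi[NS], double A[NS][NS],
                   double B[NS][NO], const int *obs, int T) {
    double alpha[NS], next[NS];
    for (int i = 0; i < NS; i++) alpha[i] = pi[i] * B[i][obs[0]];
    for (int t = 1; t < T; t++) {
        for (int j = 0; j < NS; j++) {
            double s = 0.0;
            for (int i = 0; i < NS; i++) s += alpha[i] * A[i][j];
            next[j] = s * B[j][obs[t]];
        }
        for (int j = 0; j < NS; j++) alpha[j] = next[j];
    }
    double p = 0.0;
    for (int i = 0; i < NS; i++) p += alpha[i];
    return p;
}
```

Running this for each gesture model and picking the highest likelihood gives a simple classifier over the small gesture vocabulary.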

related publications

Multimodal Interaction Abilities for a Robot Companion. B. Burger, I. Ferrane and F. Lerasle. Int. Conf. on Computer Vision Systems (ICVS'08).

Laser Based Human Detection

A laser-based human positioning capability has been implemented in the humPos module. As the robot's field of view is narrow and the range of human detection by the ICU service (see the section on visual detection above) is limited, laser-based detection was created specifically to provide the human-aware motion planner with the positions and orientations of humans in the environment. Human detection by humPos proceeds in three stages:

(1) Leg detection in aspect: segments obtained from the aspect service that do not exist in the environment are used to recognize possible legs. All segments that pass through this filter are tagged as possible humans with an associated confidence value.

(2) Filtering using the SICK laser: groups of points obtained directly from the laser exhibit characteristic properties and shapes if the points indeed belong to a leg. This stage applies that filter directly to the raw laser data of the previous stage's possible humans. At its output, the shapes tagged as humans have a higher probability of actually being so.

(3) Detection: if the potential humans are also detected by the vision-based service ICU, their associated confidence level increases dramatically. Human motion in the environment also affects humPos: if a shape does not move for a predefined time, it is discarded and tagged as not human even if it had a high probability; conversely, a shape that moves in the laser map has a higher probability of being a human.

Detected humans, along with their positions, orientations and probabilities, are transmitted to the NHP service. humPos is implemented in C as a module of the LAAS architecture.
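The first stage can be illustrated by a simple width-based confidence function for a candidate laser segment. The width bounds, nominal leg width and linear falloff below are made-up illustrative values, not humPos's actual thresholds:

```c
/* Confidence that a laser segment is a leg: zero if the segment belongs
   to the static environment or its width is implausible, otherwise a
   score that peaks at a nominal leg width and falls off linearly. */
double leg_confidence(double width_m, int in_static_map) {
    const double min_w = 0.05;    /* narrowest plausible leg (m) */
    const double max_w = 0.25;    /* widest plausible leg (m)    */
    const double nominal = 0.15;  /* nominal leg width (m)       */
    if (in_static_map || width_m < min_w || width_m > max_w)
        return 0.0;
    double d = width_m > nominal ? width_m - nominal : nominal - width_m;
    return 1.0 - d / (max_w - nominal);  /* 1 at nominal, 0 at the bounds */
}
```

Segments scoring above some threshold would be passed on, with their confidence, to the raw-data shape filter of stage (2).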