RAP - Perception on Objects and on the Environment

Vision-Based Object 3D Modeling, Detection, Segmentation and Recognition

The main application is vision-based manipulation, e.g., human-robot manipulation in domotics or in industry.

We have developed a hierarchical approach to appearance-based recognition in order to focus the search efficiently on the most likely classes (G. Manfredi’s thesis).
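The coarse-to-fine idea behind such a hierarchical recognition can be sketched as follows (a minimal illustration with made-up classes and descriptors, not the actual method of the thesis):

```python
import numpy as np

# Hypothetical two-level class hierarchy: superclass -> fine classes,
# each fine class represented by a mean appearance descriptor.
HIERARCHY = {
    "mug":    {"mug_red": np.array([1.0, 0.1]), "mug_blue": np.array([0.9, 0.3])},
    "bottle": {"bottle_a": np.array([0.1, 1.0]), "bottle_b": np.array([0.2, 0.8])},
}

def classify_hierarchical(descriptor):
    """Coarse step: pick the most likely superclass by distance to its
    class means; fine step: search only inside that superclass."""
    def nearest(candidates):
        return min(candidates.items(),
                   key=lambda kv: np.linalg.norm(descriptor - kv[1]))
    # coarse: score each superclass by its best-matching member
    super_name, _ = min(
        ((s, nearest(fine)[1]) for s, fine in HIERARCHY.items()),
        key=lambda sv: np.linalg.norm(descriptor - sv[1]))
    # fine: refine only within the selected superclass
    fine_name, _ = nearest(HIERARCHY[super_name])
    return super_name, fine_name
```

The gain comes from pruning: the fine classifier is only evaluated within one superclass instead of over all classes.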

We collaborated with ICA-Albi on Non-Destructive Testing.  First, we studied the inspection of large aircraft parts with a static system made of up to 8 cameras and illuminators (so as to reconstruct untextured objects).  A fine and dense stereovision algorithm was extended to process more than two images while preserving edges, and reached a 0.05 mm error standard deviation [Harvent_MVA2013] (J. Harvent’s thesis).  A dedicated version was then integrated by the NOOMEO start-up.  With NOOMEO, we performed object 3D modeling from the dense 3D images acquired by their hand-held sensor (composed of two cameras, an illuminator and an IMU) [Coudrin_Optics&Laser2011] (B. Coudrin’s thesis).  Several variants of the ICP algorithm were evaluated in order to reach a 3D accuracy of about 0.1 mm.  Last, we contributed to thermal metrology on arbitrary convex 3D objects.  This requires, for each facet of the shape’s mesh, its orientation relative to the camera and its radiative properties.  The 3D reconstruction was obtained from an uncalibrated stereo NIR camera rig mounted on a Cartesian robot.  We proposed an uncalibrated rectification algorithm, and a self-calibration approach for the hand-eye transform (B. Ducarouge’s thesis).
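As an illustration of the registration step underlying the 3D modeling work, here is a minimal point-to-point ICP iteration (a textbook sketch based on the Kabsch/SVD alignment; the thesis evaluated several more elaborate ICP variants):

```python
import numpy as np

def icp_step(src, dst):
    """One point-to-point ICP iteration: match each source point to its
    nearest destination point, then solve the best rigid transform in
    closed form (Kabsch/SVD).  Sketch only, with brute-force matching."""
    # nearest-neighbour association
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]
    # closed-form rigid alignment of src onto matched
    mu_s, mu_m = src.mean(0), matched.mean(0)
    H = (src - mu_s).T @ (matched - mu_m)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_m - R @ mu_s
    return R, t
```

In a full ICP loop this step is iterated until the residual stabilizes; the variants compared in the thesis differ mainly in the matching and error metric.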

Ongoing work aims at segmenting and tracking multiple static or moving objects for biological process control or food quality assessment.  Statistical, filtering and variational methods have been combined to highlight the objects to be recognized, characterized and tracked (S. Larnier’s post-doc & P. Dubosclard’s thesis).  Other ongoing studies are devoted to the transverse topic of joint perception of humans and objects.

Self-localization of a Mobile Entity from Embedded Sensors

Applications include outdoor robotics, e.g., navigation or automatic inspection with a mobile robot.  During the past period, monocular and stereoscopic SLAM methods based on point landmarks and the Extended Kalman Filter (EKF) were designed and validated.  In collaboration with RIS and GEPETTO, a real-time version (named RT-SLAM) was developed and validated on robots equipped with cameras, an IMU and GPS.  RT-SLAM is now open-sourced and used by other labs.  Within RAP, it was used to learn and replay a trajectory, under two scenarios: either the replay task is executed after learning, allowing changes in the environment, or the learning and replay tasks are executed by the leader and the followers of a convoy, respectively (D. Marquez’s thesis).
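To fix ideas, a heavily simplified EKF-SLAM cycle (1D robot, one static landmark, relative measurement) can be written as below; RT-SLAM itself is of course far more complete:

```python
import numpy as np

def ekf_predict(x, P, u, q):
    """Prediction: the robot (state x[0]) moves by odometry u; the
    landmarks (x[1:]) are static.  q is the motion noise variance."""
    x = x.copy()
    x[0] += u
    F = np.eye(len(x))                # identity Jacobian in this 1D model
    Q = np.zeros_like(P); Q[0, 0] = q
    return x, F @ P @ F.T + Q

def ekf_update(x, P, z, r, lm=1):
    """Update with a relative measurement z = x[lm] - x[0] (variance r)."""
    H = np.zeros((1, len(x))); H[0, 0] = -1.0; H[0, lm] = 1.0
    y = z - (x[lm] - x[0])                    # innovation
    S = H @ P @ H.T + r
    K = P @ H.T / S                           # Kalman gain
    x = x + (K * y).ravel()
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

The key EKF-SLAM property appears even here: one relative observation corrects both the robot pose and the landmark estimate, and shrinks their joint covariance.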

The initial method was extended to process line landmarks, with undelayed initialization of a line as soon as it is detected.  Several representations for point and line landmarks were compared (Solà & Ila post-docs) [Solà_IJCV2012].
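One of the point parameterizations typically compared in this context is inverse depth; the conversion to a Euclidean point can be sketched as follows (the angle convention below is one common choice, not necessarily the one used in [Solà_IJCV2012]):

```python
import numpy as np

def inverse_depth_to_euclidean(x0, theta, phi, rho):
    """Convert an inverse-depth point (anchor x0, azimuth theta,
    elevation phi, inverse depth rho) to a Euclidean 3D point.
    Inverse depth supports undelayed initialization: a just-detected
    feature (rho close to 0, i.e. possibly very far away) is still
    well represented, unlike with a Euclidean parameterization."""
    m = np.array([np.cos(phi) * np.cos(theta),
                  np.cos(phi) * np.sin(theta),
                  np.sin(phi)])                 # unit ray direction
    return np.asarray(x0, float) + m / rho
```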

The point-landmark-based EKF-SLAM was re-coded so as to comply with the rules for aircraft embedded subsystems.  The resulting C-SLAM was validated on multispectral sequences, enabling self-localization at night or in bad weather (A. Gonzalez’s thesis).  To overcome EKF-SLAM’s limitations, a work funded by SATT Toulouse Tech Transfer has targeted an optimization-based SLAM.  An RGB-D-based prototype is under evaluation.
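The optimization-based alternative to EKF-SLAM can be illustrated on a toy 1D pose graph, where all odometry and loop-closure constraints are re-solved jointly by least squares (a sketch only, unrelated to the actual prototype):

```python
import numpy as np

def optimize_pose_graph(n, edges, anchor=0.0):
    """Least-squares pose-graph optimization on n 1D poses: each edge
    (i, j, z) constrains x[j] - x[i] ~ z.  Unlike incremental EKF
    filtering, all constraints (including loop closures) are solved
    together, which is the core idea of optimization-based SLAM.
    The first pose is anchored to fix the gauge freedom."""
    A, b = [], []
    for i, j, z in edges:
        row = np.zeros(n); row[i], row[j] = -1.0, 1.0
        A.append(row); b.append(z)
    row = np.zeros(n); row[0] = 1.0        # anchor: x[0] = anchor
    A.append(row); b.append(anchor)
    x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x
```

With an inconsistent loop closure, the solver spreads the error over the whole trajectory instead of absorbing it at the last pose, which is what makes this family of methods attractive against EKF linearization drift.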

Last, we developed with an industrial partner an EKF-based localization of visually impaired pedestrians in urban environments, fusing GPS (prone to the canyon effect), an accelerometer, a gyroscope and a 3D magnetometer.  The prior dynamics relies on the relationship between pedestrian velocity and step frequency, and captures transitions between walking, stopping and being transported.  The solution is commercialized, and achieves a localization error below 5 m 95% of the time.
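A minimal sketch of such a pedestrian filter is given below; the step-frequency-to-velocity coefficients and the noise values are illustrative placeholders, not those of the commercial system:

```python
import numpy as np

def step_velocity(freq, a=0.45, b=0.30):
    """Hypothetical affine walking-speed model: speed (m/s) as a
    function of step frequency (Hz); zero when standing still."""
    return a * freq + b if freq > 0 else 0.0

def pedestrian_ekf_step(x, P, freq, heading, dt, q, z_gps=None, r_gps=None):
    """One predict(+update) cycle of a 2D position filter: the prior
    dynamics integrates the step-model velocity along the heading
    (from gyro/magnetometer); a GPS fix, when available, corrects it."""
    v = step_velocity(freq)
    x = x + v * dt * np.array([np.cos(heading), np.sin(heading)])
    P = P + q * np.eye(2)                       # inflate uncertainty
    if z_gps is not None:
        S = P + r_gps * np.eye(2)               # innovation covariance
        K = P @ np.linalg.inv(S)                # Kalman gain
        x = x + K @ (z_gps - x)
        P = (np.eye(2) - K) @ P
    return x, P
```

Between GPS fixes the filter coasts on the step model alone, which is what makes the approach robust to urban-canyon GPS dropouts.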

Mobility in Dynamic Environments: Obstacle Detection, Identification, Tracking

Visual self-localization is based on static landmarks.  Dynamic environments require the detection of mobile objects (e.g., obstacles).  Restricting ourselves to monocular vision, obstacles were detected from sparse optical flow analysis.  Tracklets were first extracted from a short image sequence.  Then, static points (made available for SLAM) were discriminated from dynamic ones (associated with obstacles) by a-contrario reasoning.  Dynamic points were clustered so as to make each mobile object model explicit.  Then, every detected object was tracked by an active strategy to densify its model.  Object tracking was performed either using KLT on the points, or with a snake initialized from these points.
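The static/dynamic discrimination step can be illustrated with a simple a-contrario-style test: a tracklet is declared dynamic when its residual with respect to the dominant (ego-motion) flow is so large that the expected number of such events under the background model is small.  This is a sketch with an assumed Gaussian background model, not the exact test of the thesis:

```python
from math import erfc, sqrt
import numpy as np

def split_static_dynamic(displacements, eps=0.1):
    """A-contrario-style discrimination: the background (static scene)
    displacement is taken as the robust median flow; a tracklet is
    flagged dynamic when its number of false alarms (NFA) under the
    background noise model falls below eps."""
    d = np.asarray(displacements, float)
    bg = np.median(d, axis=0)                   # dominant (ego) motion
    res = np.linalg.norm(d - bg, axis=1)
    # robust scale of the residuals (MAD estimator)
    sigma = 1.4826 * np.median(np.abs(res - np.median(res))) + 1e-9
    # Gaussian tail probability of each residual under the background
    p = np.array([erfc(r / (sigma * sqrt(2))) for r in res])
    nfa = len(d) * p                            # expected false alarms
    return nfa < eps
```

The NFA formulation makes the detection threshold nearly parameter-free: eps bounds the expected number of static points wrongly flagged as dynamic.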

As monocular vision cannot unambiguously estimate an obstacle’s position and velocity, a stereovision approach was proposed, using a separate MOT (Mobile Object Tracking) filter for each detected object.  In order to avoid false detections and to robustify the SLAMMOT, an appearance-based detection relying on prior learning was proposed to identify the object classes (pedestrian, vehicle…) (D. Marquez’s thesis).
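A per-object MOT filter of this kind can be sketched as a constant-velocity Kalman filter fed with stereo position measurements (all parameters below are illustrative):

```python
import numpy as np

class ObjectTracker:
    """Per-object constant-velocity Kalman filter, one instance per
    detected mobile object, with stereovision providing 2D position
    measurements in the ground plane (sketch only)."""
    def __init__(self, z0, dt=0.1, q=0.01, r=0.05):
        self.x = np.array([z0[0], z0[1], 0.0, 0.0])   # [px, py, vx, vy]
        self.P = np.eye(4)
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                          # observe position only
        self.Q, self.R = q * np.eye(4), r * np.eye(2)

    def step(self, z):
        # predict with the constant-velocity model
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # update with a stereo position measurement
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x
```

Running one such filter per detected object decouples the trackers, so a false or lost detection corrupts only its own track and not the SLAM map.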

Figure: active visual-based detection and tracking of moving objects (SLAMMOT).