RAP - Robot Audition
We have studied the detection and localization of multiple broadband sources (e.g., speech) using either microphone arrays or binaural heads (i.e., two microphones).
By combining modal analysis and convex optimization arguments, we developed a coherent beamspace MUSIC (MUltiple SIgnal Classification) strategy for uniform linear arrays. Together with our MAICE (Minimum Akaike Information Criterion Estimate) of the number of active sources, it can detect and localize up to three sources at reduced computational cost [DanesBonnal_IROS2010].
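To make the two ingredients concrete, here is a minimal sketch under simplifying assumptions. It is not the coherent beamspace (broadband) method of [DanesBonnal_IROS2010]. It swaps in textbook narrowband, element-space MUSIC at a single frequency, together with the Wax-Kailath form of the minimum-AIC rule for counting sources from the eigenvalues of the spatial covariance matrix. The array geometry (8 microphones, half-wavelength spacing), the noise level, and all function names (`ula_steering`, `aic_num_sources`, `music_spectrum`, `pick_peaks`, `simulate_ula`) are hypothetical.

```python
import numpy as np


def ula_steering(theta, n_mics, spacing_wl=0.5):
    """Narrowband far-field steering vector of a uniform linear array.

    theta: direction of arrival (rad) from broadside.
    spacing_wl: microphone spacing in wavelengths (assumed default 0.5).
    """
    m = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing_wl * m * np.sin(theta))


def aic_num_sources(eigvals, n_snapshots):
    """Minimum-AIC estimate of the number of sources (Wax-Kailath form)."""
    lam = np.sort(np.maximum(np.asarray(eigvals, dtype=float), 1e-12))[::-1]
    p = lam.size
    aic = []
    for k in range(p):
        tail = lam[k:]  # the p - k smallest eigenvalues, hypothesized to be noise
        geo = np.exp(np.mean(np.log(tail)))  # geometric mean
        ari = np.mean(tail)  # arithmetic mean
        # Zero when the tail eigenvalues are all equal (perfectly white noise)
        neg2_loglik = -2.0 * n_snapshots * (p - k) * np.log(geo / ari)
        aic.append(neg2_loglik + 2.0 * k * (2 * p - k))  # free-parameter penalty
    return int(np.argmin(aic))


def music_spectrum(R, n_sources, thetas, spacing_wl=0.5):
    """Narrowband MUSIC pseudo-spectrum evaluated on a grid of angles."""
    n_mics = R.shape[0]
    _, V = np.linalg.eigh(R)  # eigenvalues in ascending order
    En = V[:, : n_mics - n_sources]  # noise-subspace eigenvectors
    P = np.empty(len(thetas))
    for i, th in enumerate(thetas):
        a = ula_steering(th, n_mics, spacing_wl)
        # Large where the steering vector is orthogonal to the noise subspace
        P[i] = 1.0 / np.linalg.norm(En.conj().T @ a) ** 2
    return P


def pick_peaks(P, n):
    """Indices of the n largest local maxima, returned in angle order."""
    idx = [i for i in range(1, len(P) - 1) if P[i] > P[i - 1] and P[i] >= P[i + 1]]
    idx.sort(key=lambda i: P[i], reverse=True)
    return sorted(idx[:n])


def simulate_ula(doas_deg, n_mics=8, n_snap=200, noise_var=0.1, seed=0):
    """Sample covariance of uncorrelated unit-power sources plus white noise."""
    rng = np.random.default_rng(seed)
    K = len(doas_deg)
    A = np.stack([ula_steering(np.deg2rad(d), n_mics) for d in doas_deg], axis=1)
    S = (rng.standard_normal((K, n_snap)) + 1j * rng.standard_normal((K, n_snap))) / np.sqrt(2)
    N = np.sqrt(noise_var / 2) * (
        rng.standard_normal((n_mics, n_snap)) + 1j * rng.standard_normal((n_mics, n_snap))
    )
    X = A @ S + N
    return X @ X.conj().T / n_snap


if __name__ == "__main__":
    n_snap = 200
    R = simulate_ula([-20.0, 30.0], n_snap=n_snap)
    k = aic_num_sources(np.linalg.eigvalsh(R), n_snap)
    thetas = np.deg2rad(np.arange(-90.0, 90.5, 0.5))
    peaks = pick_peaks(music_spectrum(R, k, thetas), k)
    print("estimated sources:", k, "DOAs (deg):", np.rad2deg(thetas[peaks]))
```

The source count feeds MUSIC directly, since it fixes the dimension of the noise subspace. Note that AIC is known to occasionally overestimate the number of sources; a broadband, beamspace implementation such as the one cited above would also add frequency focusing and a reduced-dimension projection, neither of which is shown here.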
We then turned to binaural “active” audition [Argentieri_BlauertBook2012], together with ISIR (Paris). The aim is to exploit the motion of a binaural head to overcome the limitations of the static case (front-back ambiguity, non-observability of distance, etc.). Our approach involves three steps: detecting source activity and estimating the sources’ spatial arrangement by binaural processing over short time snippets (“short-term detection”); fusing these data with the sensor motor commands within a stochastic filtering scheme (“active/audio-motor localization”); and controlling the sensor motion through feedback so as to improve localization (“active/information-based motion”). In the single-source case, the first two steps were addressed by: a short-term maximum likelihood estimator of the source direction that accounts for head-induced scattering; an information-theoretic source activity detector; and a Gaussian sum unscented Kalman filter featuring self-initialization, consistency, and handling of false measurements and source intermittence [Portello_IROS2012] [Portello_IROS2013] (A. Portello’s thesis). An Expectation-Maximization extension of short-term detection to multiple sources has also been developed [Portello_IROS2014].
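The following hypothetical sketch illustrates why fusing binaural cues with head motion removes the front-back ambiguity. It is not the Gaussian sum unscented Kalman filter of [Portello_IROS2012] [Portello_IROS2013]: it swaps in a grid-based (histogram) recursive Bayes filter over the world-frame azimuth of a single static source, assumes the head yaw is known exactly from the motor commands, and replaces the scattering-aware short-term estimator with a simplified free-field cue, the sine of the head-relative azimuth plus Gaussian noise. The names (`binaural_cue`, `grid_bayes_filter`, `mass_near`, `simulate_cues`) and all numeric values are illustrative assumptions.

```python
import numpy as np

# Candidate world-frame source azimuths (rad), 0.5 degree resolution.
GRID = np.linspace(-np.pi, np.pi, 720, endpoint=False)


def binaural_cue(azimuth, head_yaw):
    """Hypothetical free-field, ITD-like cue: sine of the head-relative azimuth.

    Since sin(a) == sin(pi - a), a static head cannot tell front from back.
    """
    return np.sin(azimuth - head_yaw)


def grid_bayes_filter(measurements, head_yaws, sigma):
    """Recursive Bayes filter on GRID for a static source, uniform prior.

    Each step adds one Gaussian log-likelihood; the prediction step is the
    identity because the source is assumed not to move in the world frame.
    """
    log_post = np.zeros_like(GRID)
    for z, yaw in zip(measurements, head_yaws):
        log_post += -0.5 * ((z - binaural_cue(GRID, yaw)) / sigma) ** 2
    post = np.exp(log_post - log_post.max())  # shift first to avoid underflow
    return post / post.sum()


def mass_near(post, angle, half_width=0.2):
    """Posterior probability within +/- half_width rad of angle (wrapped)."""
    d = np.angle(np.exp(1j * (GRID - angle)))
    return post[np.abs(d) < half_width].sum()


def simulate_cues(true_azimuth, head_yaws, sigma=0.05, seed=0):
    """Noisy cues for a static source observed from the given head yaws."""
    rng = np.random.default_rng(seed)
    yaws = np.asarray(head_yaws, dtype=float)
    return binaural_cue(true_azimuth, yaws) + sigma * rng.standard_normal(yaws.size)


if __name__ == "__main__":
    true_az = 0.5
    mirror = np.pi - true_az  # front-back mirror image of the source
    for label, yaws in [("static", np.zeros(30)), ("rotating", 0.05 * np.arange(30))]:
        post = grid_bayes_filter(simulate_cues(true_az, yaws), yaws, sigma=0.05)
        print(label, "P(true)=%.3f P(mirror)=%.3f"
              % (mass_near(post, true_az), mass_near(post, mirror)))
```

With a static head, the posterior splits evenly between the true azimuth and its mirror image; once the head rotates, the two hypotheses predict different cues and the mirror mode vanishes. A Gaussian sum filter represents the same multimodal posterior with a few weighted Gaussians instead of a dense grid, which scales better when distance or several sources are added to the state.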
An ongoing study concerns audio-motor localization of multiple sources and information-based motion (G. Bustamante’s thesis). At the crossroads of visual perception of humans and binaural audition, the audio-visual detection and identification of humans by the humanoid robot ROMEO in a proximal interaction context has also been investigated, together with the spatiotemporal analysis of their behaviors (L. Fitte-Duval’s thesis).
Active localization of a speech utterance from a binaural head (real data, nearly anechoic room)