Development of a supervision system for multiparty interactions between humans and a robot
Adrien Vigne PhD defense
17.02.26
To collaborate efficiently with humans in realistic environments such as restaurants or retail stores, robots must understand what surrounding people are doing, estimate their intentions and goals, and adapt their behavior accordingly while respecting implicit social conventions. These environments present major challenges: the robot cannot continuously observe all agents (due to visual occlusions and multiple simultaneous tasks), it must recognize actions and tasks from partial observations, and it must coordinate its decisions by considering the social context and each person's role. Existing cognitive architectures do not explicitly model social practices or shared plans, and do not natively support the simultaneous management of multiple people with different roles and contexts.

This thesis presents an action recognition system based on semantic facts, robust to partial observations. Humans rarely perform isolated, purposeless actions: a person who picks up an object and then moves is not performing two independent actions but pursuing a goal (serving a customer, preparing a dish). The system therefore integrates hierarchical task recognition, using recognized actions to generate hypotheses about the pursued goals, even when some steps are not observed.

These recognitions only make sense within a given social context: in a restaurant, understanding that one person is a waiter and another is a customer radically changes the interpretation of their actions. To represent such social contexts, we introduce Practice Frames to model social practices. Practice Frames explicitly define each participant's roles, capabilities, and mutual or unilateral expectations, enabling dynamic role attribution according to context. When an expectation is detected, the system triggers appropriate actions based on available capabilities and identified needs.

To coordinate collaboration, we build upon SharedPlans representations, which describe who does what and distinguish between what is known and what remains to be done, allowing the system to track collaborative execution. The entire approach relies on a unified knowledge representation ensuring coherent integration of these components.

The architecture supports the simultaneous management of multiple people: the system can recognize the actions and tasks of several individuals at once, dynamically assign social roles (waiter, customer, cook, assistant) according to context, and maintain an individual context for each person (current role, ongoing actions, location). Context management operates at two levels: social context (roles, mutual expectations, social practices, norms) and situational context (agent positions, environment state, available objects, ongoing tasks). The system is based on a knowledge base combining conceptual information (objects, roles, relationships) and temporal information (events, action sequences, per-person histories), updated in real time.

These recognitions feed a supervision system that detects context changes, manages priorities according to active roles and practices, coordinates mission execution, and dynamically adapts the robot's behavior to social situations. The action and task recognition systems have been tested on robotic platforms. The full architecture is functional and has been validated in simulation on multiple scenarios.
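As an illustration of the kind of representation involved (not code from the thesis), the following Python sketch shows one way semantic facts could be matched against action patterns, with partially observed task models scored as goal hypotheses. All names here (SemanticFact, ActionPattern, TaskModel, task_hypotheses) and the scoring rule are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical timestamped semantic fact (subject, predicate, object),
# e.g. ("bob", "holds", "plate").
@dataclass(frozen=True)
class SemanticFact:
    subject: str
    predicate: str
    obj: str
    stamp: float

# An action is recognized when all of its required facts have been
# observed for the same agent.
@dataclass
class ActionPattern:
    name: str
    required: list                      # list of (predicate, obj) pairs

    def matches(self, agent, facts):
        observed = {(f.predicate, f.obj) for f in facts if f.subject == agent}
        return all(req in observed for req in self.required)

# A task is an ordered list of action names; a hypothesis is scored by
# the fraction of steps already recognized, so unobserved steps only
# lower the score instead of ruling the task out.
@dataclass
class TaskModel:
    goal: str
    steps: list

    def score(self, recognized):
        return sum(1 for s in self.steps if s in recognized) / len(self.steps)

def task_hypotheses(agent, facts, actions, tasks, threshold=0.5):
    recognized = {a.name for a in actions if a.matches(agent, facts)}
    return sorted(((t.goal, t.score(recognized))
                   for t in tasks if t.score(recognized) >= threshold),
                  key=lambda pair: pair[1], reverse=True)

facts = [SemanticFact("bob", "holds", "plate", 1.0),
         SemanticFact("bob", "moves_to", "table_3", 2.5)]
actions = [ActionPattern("pick_plate", [("holds", "plate")]),
           ActionPattern("go_to_table", [("moves_to", "table_3")])]
tasks = [TaskModel("serve_customer", ["pick_plate", "go_to_table", "place_plate"])]
print(task_hypotheses("bob", facts, actions, tasks))   # [('serve_customer', 0.66...)]
```

Scoring a task by the fraction of recognized steps is only one possible choice; it is meant to show how missing observations degrade a hypothesis rather than eliminate it.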
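Similarly, a Practice Frame can be pictured as a structure listing roles with their required capabilities and the expectations between roles. The sketch below is hypothetical: the classes Expectation and PracticeFrame and their fields are assumptions, not the thesis API.

```python
from dataclasses import dataclass, field

# Hypothetical expectation: the holder role expects the addressee role
# to perform an action once a trigger fact is observed.
@dataclass
class Expectation:
    holder: str
    addressee: str
    trigger: str             # e.g. "customer_seated"
    expected_action: str     # e.g. "bring_menu"
    mutual: bool = False

@dataclass
class PracticeFrame:
    name: str                                   # e.g. "restaurant_service"
    roles: dict = field(default_factory=dict)   # role -> required capabilities
    expectations: list = field(default_factory=list)

    def attribute_role(self, agent_capabilities):
        """Return the first role whose required capabilities the agent covers."""
        for role, needed in self.roles.items():
            if set(needed) <= set(agent_capabilities):
                return role
        return None

    def pending_actions(self, role, observed_triggers):
        """Actions this role is now expected to perform, given observed triggers."""
        return [e.expected_action for e in self.expectations
                if e.addressee == role and e.trigger in observed_triggers]

# Example: a restaurant-service frame with two roles.
frame = PracticeFrame(
    name="restaurant_service",
    roles={"waiter": ["navigate", "carry"], "customer": []},
    expectations=[Expectation("customer", "waiter", "customer_seated", "bring_menu")])

robot_role = frame.attribute_role(["navigate", "carry", "speak"])   # -> "waiter"
todo = frame.pending_actions(robot_role, {"customer_seated"})       # -> ["bring_menu"]
```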
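The SharedPlans bookkeeping, which records who does what and separates what is settled from what remains to be done, could be sketched as follows; SharedPlan, PlanStep, and StepState are illustrative names, not the actual implementation.

```python
from dataclasses import dataclass, field
from enum import Enum

class StepState(Enum):
    UNKNOWN = "unknown"      # not yet decided who does it
    PLANNED = "planned"      # assigned but not yet executed
    DONE = "done"

# Hypothetical shared-plan step: an action, the agent committed to it,
# and its current state.
@dataclass
class PlanStep:
    action: str
    agent: str = None
    state: StepState = StepState.UNKNOWN

@dataclass
class SharedPlan:
    goal: str
    steps: list = field(default_factory=list)

    def assign(self, action, agent):
        for s in self.steps:
            if s.action == action and s.state is StepState.UNKNOWN:
                s.agent, s.state = agent, StepState.PLANNED
                return

    def mark_done(self, action):
        for s in self.steps:
            if s.action == action and s.state is StepState.PLANNED:
                s.state = StepState.DONE
                return

    def remaining(self):
        """What remains to be done, and by whom (None if unassigned)."""
        return [(s.action, s.agent) for s in self.steps if s.state is not StepState.DONE]

plan = SharedPlan("serve_table_3",
                  [PlanStep("take_order"), PlanStep("prepare_dish"), PlanStep("bring_dish")])
plan.assign("take_order", "robot")
plan.assign("prepare_dish", "cook")
plan.mark_done("take_order")
print(plan.remaining())   # [('prepare_dish', 'cook'), ('bring_dish', None)]
```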
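Finally, the two-level context management and the knowledge base combining conceptual and temporal information could be approximated by the sketch below, where conceptual triples carry stable knowledge (roles, relationships) and timestamped events update each person's individual context in real time. KnowledgeBase and PersonContext are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical per-person situational context maintained by the system.
@dataclass
class PersonContext:
    role: str = None                  # current social role, e.g. "customer"
    ongoing_actions: list = field(default_factory=list)
    location: str = None

class KnowledgeBase:
    """Hypothetical store mixing conceptual and temporal knowledge."""

    def __init__(self):
        self.conceptual = set()       # stable triples, e.g. ("alice", "has_role", "customer")
        self.temporal = []            # timestamped events, per-person histories
        self.persons = {}             # name -> PersonContext

    def add_conceptual(self, subject, predicate, obj):
        self.conceptual.add((subject, predicate, obj))
        if predicate == "has_role":
            self.persons.setdefault(subject, PersonContext()).role = obj

    def add_event(self, stamp, subject, predicate, obj):
        self.temporal.append((stamp, subject, predicate, obj))
        ctx = self.persons.setdefault(subject, PersonContext())
        if predicate == "located_at":
            ctx.location = obj
        elif predicate == "performs":
            ctx.ongoing_actions.append(obj)

kb = KnowledgeBase()
kb.add_conceptual("alice", "has_role", "customer")     # social context
kb.add_event(10.2, "alice", "located_at", "table_3")   # situational context
kb.add_event(11.0, "alice", "performs", "read_menu")
print(kb.persons["alice"])
# PersonContext(role='customer', ongoing_actions=['read_menu'], location='table_3')
```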
This thesis proposes both a theoretical framework and a practical implementation of a robotic architecture enabling interaction management with multiple humans, taking into account both social and situational contexts. Robust recognition under partial observation, combined with context management, allows the robot to handle multiple people with different roles in realistic environments. This work provides a framework for collaborative social robotics, applicable to service environments and domestic assistants.
published on 11.02.26