Rationale
Humans act in space. Sometimes, interactions in space are explicit, as when we point at, reach for, or grasp the things around us. Other interactions are implicit: an awareness of where we are and of what surrounds us. In general, to interact effectively with the environment, humans arguably use complex motion strategies at the ocular level (possibly extended to other body parts, e.g. head and arms, and thus possibly relying on multimodal feedback) to extract the information needed to build representations of the 3D space that are coherent and stable over time.
Such representations rely on a multisensory active exploration of the environment. In particular, purposive (active) vision is an important source of information and provides a number of cues about the 3D layout of objects in a scene that could be used for planning and controlling goal-directed behaviors.
Although computer and robot vision are making technological progress, current active vision solutions are increasingly driven by applications such as robotic manipulation or surveillance and are still far from achieving truly purposive behavior. More precisely, research in robot vision is geared more towards actively “looking at” scenes and objects rather than “seeing”, gathering the visual data and having to act in a limited time and space.
“Seeing” is something we do rather than a sequence of hierarchical interpretative processes. From this perspective, the experience of “seeing” is not necessarily generated, but expresses itself in the behavior.
Following these premises, we will study the topic of seeing using an embodied artificial intelligence by considering the cognitive valence of an early perception-action embodiment in the visual system.
The necessity of an embodiment to achieve intelligent behavior is not a novel issue in itself (e.g., the “enactivism” paradigm). Yet, in general, in active vision systems the perception-action loop closes at a “system level” (de facto decoupling the vision modules from those dedicated to motor control and motor planning), and the exploitation of the computational (feedback) effects of voluntary explorative eye movements on the visual processes is very rare in artificial systems. By contrast, a large number of neurophysiological experiments report modulatory effects of motor and premotor signals on the visual receptive fields across several cortical areas (e.g. the gain fields), postulating their role in gaining a perceptual visuospatial awareness in head-centred coordinates for visually guided actions in the peripersonal space. Such motor components actively contribute to stabilizing and improving the 3D perception of space, and also allow us to achieve a global awareness by establishing loose links among visual fragments of the observed scene (global spatial reference). Such links evolve dynamically, are influenced by attentive processes and memory, and are task-contingent (action/goal-oriented).
From this perspective, the concept of an active fragmented vision represents a dynamic cognitive interpretation of the scene, which does not imply a real metrical 3-D reconstruction of the observed space, but instead a loose representation of objects that are actively bound on time for the task at hand (in terms of affordance, salience, and planning of actions).
Objectives
The goal of the project is thus to develop a perceptual agent capable of achieving a full 3D awareness for interaction control/planning in the peripersonal space. The sophistication of perceptual capabilities will ultimately be measured in terms of their value to the agent in executing its tasks.
The final awareness will derive from building knowledge of the sensorimotor laws that govern the relation between possible actions and the resulting changes in incoming visual information [O’Regan and Noe, 2001], and from integrating this knowledge into planning behavior.
Two important intertwined questions will be raised, which actually define a next big step in the development of artificial perceptual agents (autonomous robotic agents):
1) How to design a robot vision system that, through intentional (i.e., voluntary) eye movements, is able to “see”, and not only “to look at” saliencies?
2) How can the effect of active eye movements and of arm reaching actions be expressed as joint visuo-motor features, patterns and relationships for a perceptual awareness of space?
The project addresses these problems at different levels, integrating contributions from different disciplines (engineering, neurophysiology, and psychology). At the lowest level, the investigation focuses on understanding the mechanisms of depth vision that are based on, or intertwined with, eye motor control. At a higher level, the project aims to allow an artificial intelligent system to master the peripersonal space in which it is embedded and the objects present in it.
Specifically, the EYESHOTS project will address three Principal Objectives:
Objective 1: Development of a robotic system for interactive visual stereopsis.
A first objective concerns the development of a robotic system, to be used as an experimental platform, composed of: (1) an anthropomorphic mechatronic binocular system, and (2) software vision modules based on cortical-like populations of disparity detectors with different characteristics for foveal and parafoveal representations of the visual field. The function of the system is to interactively explore the 3D space by active foveations. Benefits of the motor side of depth vision are expected to be bidirectional, through learning optimal sensorimotor interactions: on the one hand, the system learns to see in 3D through eye movements; on the other, it learns coordinated binocular eye movements in 3D through vision. Binocular alignment and stereo matching are favored by the structural paradigms of binocular eye coordination. Fast (real-time) binocular fusion around the fixation point is achieved by dynamically adjusting the response of disparity detectors.
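As an illustration of what a population of disparity detectors might compute, the following minimal sketch uses a phase-shift binocular energy model on 1D luminance profiles. This is an assumption made for illustration: the text above does not specify which detector model the project uses, and all names and parameter values (`OMEGA`, `SIGMA`, `population_energy`, `decode_disparity`, the grating test pattern) are hypothetical.

```python
import cmath
import math

OMEGA = 2.0 * math.pi / 16.0  # assumed filter frequency (rad/pixel)
SIGMA = 8.0                   # assumed Gaussian envelope width (pixels)


def gabor_response(signal, center, omega=OMEGA, sigma=SIGMA):
    """Complex Gabor response of a 1D luminance profile around `center`."""
    total = 0j
    for x, value in enumerate(signal):
        offset = x - center
        envelope = math.exp(-offset * offset / (2.0 * sigma * sigma))
        total += value * envelope * cmath.exp(1j * omega * offset)
    return total


def population_energy(left, right, center, preferred, omega=OMEGA, sigma=SIGMA):
    """Binocular energy of one detector per preferred disparity (in pixels)."""
    q_left = gabor_response(left, center, omega, sigma)
    q_right = gabor_response(right, center, omega, sigma)
    energies = {}
    for d in preferred:
        # An interocular phase shift of omega * d tunes the detector to disparity d.
        shifted = cmath.exp(-1j * omega * d) * q_right
        energies[d] = abs(q_left + shifted) ** 2
    return energies


def decode_disparity(energies):
    """Winner-take-all readout: preferred disparity of the most active detector."""
    return max(energies, key=energies.get)


def grating(n, shift=0.0, omega=OMEGA, phase=0.7):
    """Test pattern: a sinusoidal grating displaced by `shift` pixels."""
    return [math.cos(omega * (x - shift) + phase) for x in range(n)]


left = grating(128)
right = grating(128, shift=3)  # right image displaced by 3 pixels
energies = population_energy(left, right, center=64, preferred=range(-6, 7))
estimate = decode_disparity(energies)  # preferred disparity of the winning detector
```

With a single filter frequency, the population can only represent disparities within half a filter period (here 8 pixels). A vergence movement that brings the residual disparity near zero keeps the stimulus inside that range, which is one way to read the interplay between eye movements and disparity detection described above.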
Objective 2: Development of a model of a multisensory egocentric representation of the 3D space.
The representation is constructed on (1) binocular visual cues, (2) signals from the oculomotor systems (position of the eyes), and (3) signals about reaching movements performed by the arm.
Egocentric representations require regular updating as the robot changes its fixation point. Rather than continuously updating based on motor cues or a visual mechanism (i.e. optic flow), the model updates only the egocentric relationship and object-to-object relationships of those objects currently in the field of view. During motion, the model covertly and overtly shifts attention to objects in the environment to maintain the model’s current awareness of the environment. The updating of the internal representation of spatial relations requires binding processes across the different visual fragments. Spatial awareness of the environment provides the model with the capability to interact with 3D environments. The model can maintain awareness of objects and visual features as the robot moves its eyes in the 3D space. The model can encode and update the 3D spatial location of objects and if the model needs to view an object outside of the current field of view, the model can request a saccade to a remembered spatial location. The model is also able to request arm movements to reach spatial locations in the peripersonal space to interact with objects in the 3D environment.
Objective 3: Development of a model of human-robot cooperative actions in a shared workspace.
By the mechanism of shared attention the robot will be able to track a human partner’s overt attention and predict and react to the partner’s actions. This will be extremely helpful in cooperative interactions between the robot and a human.
Approach
The approach proposed by the EYESHOTS project follows the development of four Key Research Actions (KRAs), which have been identified according to the overall project’s goal:
(1) Constructing visual perception of space by interactive stereopsis.
(2) Recursive modulation of perception across visual fragments.
(3) Visuospatial awareness and planning behavior.
(4) Human behavior replicas by integration/interactive paradigms.
Figure 1: Development of the Key Research Actions
KRA 1: Constructing visual perception of space by interactive stereopsis.
Vision is the first source of information about the 3D space. The search for optimal visuomotor coordination to achieve robust and stable percepts does pose a major challenge. KRA1 will focus on:
Anthropomorphic robotic eye system.
Static and dynamic disparity detectors.
Learning disparity detectors.
Stereoscopic object recognition.
KRA1 provides input to KRA2 by contributing to the definition of a visual fragment of the observed scene.
KRA 2: Recursive modulation of perception across visual fragments
Definition of a strategy to achieve a global perception of the 3D spatial relations and relative 3D motion for controlling spatially directed actions (e.g., reaching), and, in general, visually-guided goal-directed movements in the whole peripersonal workspace. KRA2 will focus on:
An attention-based selection of visual fragments.
A construction of peripersonal space across eye movements.
The output of this KRA is not a real 3D reconstruction of the scene but a loose coupling among fragments providing awareness of the objects in the scene and information on how the system can interact with them. Only minimal descriptors for performing the required task will be recruited.
KRA 3: Visuospatial awareness and planning behavior
Here we address the problem of constructing an action-minded representation of the 3D space.
This will be achieved through a multisensory description of 3D space obtained through active ocular and arm movements.
KRA3 will contribute to:
- The definition of joint representation signals of eye and hand movements in a 3D extrinsic coordinate frame, on which to base the 3D location of a visual target with respect to a point on the body surface. We expect several advantages from such a combined representation compared with computing from signals intrinsic to each system, such as version/vergence or joint-angle signals.
- The definition of shared attention behaviour in common workspaces.
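To make the distinction between intrinsic oculomotor signals and an extrinsic frame concrete, the sketch below converts between the two in a simplified planar geometry. It assumes two eyes on the x axis separated by a baseline, gaze angles measured from straight ahead (positive towards +x), and a fixation point in the horizontal plane. The function names and the 0.065 m baseline are hypothetical, and the geometry is a textbook triangulation rather than the project's representation.

```python
import math


def head_centred_to_gaze(x, z, baseline=0.065):
    """Left/right eye azimuths (rad) that fixate the point (x, z), with z > 0."""
    theta_left = math.atan2(x + baseline / 2.0, z)   # left eye sits at x = -baseline/2
    theta_right = math.atan2(x - baseline / 2.0, z)  # right eye sits at x = +baseline/2
    return theta_left, theta_right


def gaze_to_head_centred(theta_left, theta_right, baseline=0.065):
    """Triangulate the fixation point (x, z) from the two eye azimuths."""
    denom = math.tan(theta_left) - math.tan(theta_right)
    if denom <= 0.0:
        raise ValueError("gaze lines do not converge in front of the head")
    z = baseline / denom
    x = 0.5 * z * (math.tan(theta_left) + math.tan(theta_right))
    return x, z


def version_vergence(theta_left, theta_right):
    """Conjugate (version) and disjunctive (vergence) components of gaze."""
    return 0.5 * (theta_left + theta_right), theta_left - theta_right


# Round trip: a target 10 cm to the right and 40 cm ahead of the head.
angles = head_centred_to_gaze(0.10, 0.40)
recovered = gaze_to_head_centred(*angles)
version, vergence = version_vergence(*angles)
```

Because depth depends on the difference of the two tangents, small errors in the vergence signal translate into large depth errors for distant targets. This is one reason a representation that also uses arm signals may be more robust within the peripersonal space.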
KRA 4: Human behavior replicas by integration/interactive paradigms.
This is a technical KRA concerned with the “translation” of the scientific achievements of KRAs1-3 into operative modules/subsystems (hw/sw robotic systems) characterized by perceptual/cognitive capabilities that emulate human behaviour.
Two different subsystems will be considered:
- Binocular eye system [an anthropomorphic robotic vision head (mobile eyes, fixed neck)]
- Visually-based reaching system [a robotic arm and a stereo vision system]
This KRA will combine the components and the computational paradigms developed in KRAs1-3: the design, testing, and comparative analysis of performances of these systems are crucial for validating the approach.
Specifications for the testing activities will be provided by the development of the other KRAs: the experiments with robots will be a follow-up of the biological experiments. The psychophysical experiments will provide behavioural patterns (I/O specifications), while in-vivo experiments will provide architectural solutions (I/O + internal structural data). In order to meet the concurrent development stages of the research activities of the Workplan, we will pursue a dynamic approach, adapting the final integration plan in the course of the project to the actual achievements. Initially, tests will be conducted on the different subsystems.
Workplan
The Work Program consists of 8 Work-packages (WPs). There will be five scientific and technological WPs (WP1-5), and three WPs planned for training (WP8), dissemination and exploitation of the project’s results (WP7), and general project coordination and management (WP6).