The project aimed to reach the following three specific goals:
Goal 1: Development of a robotic system for interactive visual stereopsis – The function of the system is to interactively explore the 3D space by active foveations. The benefits of the motor side of depth vision are expected to be bi-directional: learning optimal sensorimotor interactions improves both perception and action.
Goal 2: Development of a model of an egocentric representation of the 3D space – The representation is constructed from (1) binocular visual cues, (2) signals from the oculomotor system, and (3) signals about reaching movements performed by the arm. Egocentric representations require regular updating as the robot changes its fixation point. Rather than continuously updating based on motor cues or a visual mechanism (e.g., optic flow), the model updates only the egocentric and object-to-object relationships of those objects currently in the field of view. During motion, the model covertly and overtly shifts attention to objects in the environment to maintain its current awareness of the environment. The updating of the internal representation of spatial relations requires binding processes across the different visual fragments.
Goal 3: Development of a model of human-robot cooperative actions in a shared workspace – By the mechanism of shared attention the robot will be able to track a human partner’s overt attention and predict and react to the partner’s actions. This will be extremely helpful in cooperative interactions between the robot and a human.
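As an illustration of the updating scheme of Goal 2, the following minimal Python sketch re-expresses remembered object locations relative to a new fixation point (all names and the plain 3D-vector representation are hypothetical; the actual model operates on distributed neural representations refined by fresh visual input):

```python
import numpy as np

def update_egocentric_map(object_positions, old_fixation, new_fixation):
    """Re-express remembered object locations relative to a new fixation point.

    object_positions: dict mapping object id -> 3D position relative to the
    old fixation point. Only objects currently in the field of view would be
    updated; a full model would also refine them with new visual input.
    """
    shift = np.asarray(old_fixation) - np.asarray(new_fixation)
    return {obj: np.asarray(pos) + shift
            for obj, pos in object_positions.items()}

# Example: the robot shifts its fixation point by 10 cm along x.
objects = {"cup": np.array([0.10, 0.00, 0.30])}
updated = update_egocentric_map(objects,
                                old_fixation=[0.0, 0.0, 0.0],
                                new_fixation=[0.10, 0.0, 0.0])
# The cup now lies straight ahead of the new fixation point.
```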
The project addressed these problems at different levels, integrating the contributions coming from different disciplines (engineering, neurophysiology, and psychology).
To focus the scientific problem, the following simplifying assumptions were introduced: (1) Objects in the scene are static (except the “arm”, which, as part of the system, moves in the workspace). (2) The head is fixed. Humans rarely shift their gaze without moving their heads; nonetheless, we introduced this restrictive assumption to isolate a particular function and to share a common setting with the planned neurophysiological experiments. (3) Only saccade-type eye movements are considered. This has not hampered the behavioral tasks tackled in this research: pursuit movements rarely track the motion of one’s own arm. Usually, when reaching for a target, gaze is directed to the target and the motion of the arm is controlled from the visual motion of the arm in the peripheral visual field.
Having in mind these objectives, the project activities were articulated in the concurrent development of the following major themes:
Design (and control) of an anthropomorphic mechatronic binocular system
The starting point of the project was to complete the understanding of the biomechanics of ocular motion in humans and primates and to transfer these results into guidelines for the design of robotic eyes, which could provide different solutions for the implementation of humanoid robots.
Besides the specific robotic applications, it is assumed that the implementation of a bio-inspired robot eye (or robot head) is also the starting point for the analysis and the assessment of the motor control strategies implemented by the brain to drive the very high dynamics of ocular rotation.
In this sense, a key objective of EYESHOTS was, and still is, the development of a prototype robot eye featuring bio-inspired concepts and design, strongly different from conventional stiff pan-tilt platforms. The basic idea is that emulating ocular motions is different from simulating them. In other terms, it is possible with a conventional robot system to obtain a desired target behavior by constraining it at the control level; it is, however, in general not possible to achieve the same behavior as an emergent one arising from the implicit characteristics of the plant.
It is then reasonable to assess, from an engineering point of view, that state-of-the-art conventional stiff robots can guarantee high accuracy and (reasonably) high speed, but they do not allow us to perform experiments where the motion characteristics arise from the intimate nature of the mechanics of the plant itself.
Early perception-action cycles in binocular vision
According to the current trends of active, purposive vision systems, the motor system of a humanoid robot should be an integral part of its perceptual machinery. Traditionally, however, in robot vision systems, perception-action loops close at a “system level” (de facto decoupling the vision modules from those dedicated to motor control and motor planning), and the computational effects of eye movements on visual processing are very seldom exploited in artificial systems.
The limit of this approach was that solving specific high-level tasks usually requires sensory-motor shortcuts at the system level, and specific knowledge-based rules or heuristic algorithms have to be included to establish behavioural consistency relationships between the extracted perceptual features and the desired actions. The risk is to abandon distributed representations of multiple solutions, prematurely construct integrated descriptions of cognitive entities, and commit the system to a particular behaviour.
Conversely, our claim was that early, complex interactions between vision and motor control are crucial in determining the effective performance of an active binocular vision system with a minimal amount of resources, while coping with the uncertainties and inaccuracies of real systems. Indeed, the complexity of integrating the different aspects of binocular active vision efficiently and with adequate flexibility has so far prevented a full validation of visuomotor approaches to 3D perception in real-world situations. The research started from the belief that the advantages of binocular visuomotor strategies can be fully understood only by jointly analyzing and modeling the neural computation of stereo information, while taking into account the limited accuracy of the motor system. Defending an early visuomotor approach to 3D perception, we looked for instantiations of visuomotor optimization principles concurrently with the design of distributed neural models/architectures that can efficiently embody them.
Developing cognitive vision modules and oculomotor control strategies for fixating selected and memorized targets
The ability to share a peripersonal workspace strongly relies on the processing of visual information, on a working memory, and on acting appropriately on the visual input. Moreover, vision has to be an active process under high-level cognitive control (humans normally expect a certain object to appear in their visual field when they move their eyes). Our approach to object recognition was based on the concept of visual attention (Hamker, 2005a), which describes the ability of the visual cortex to focus processing resources on a certain object or visual fragment (feature-based attention) or on a certain location (spatial attention). The transition of feature-based attention into a spatially selective attention signal should be achieved by an oculomotor loop via the frontal eye field (FEF), which selects the location of a particular object for a saccadic eye movement.
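The transition from feature-based to spatial attention described above can be caricatured as a weighted feature-match map whose peak selects the next saccade target. This toy Python sketch (all names and the grid representation are hypothetical) stands in for the FEF selection stage:

```python
import numpy as np

def select_saccade_target(feature_maps, target_template):
    """Toy feature-based -> spatial attention transition.

    feature_maps: array of shape (n_features, H, W) with visual responses.
    target_template: length-n_features vector describing the sought object.
    The peak of the weighted match map plays the role of the FEF selecting
    the location of the next saccade.
    """
    # Weight each feature map by how strongly the target expresses it.
    match = np.tensordot(target_template, feature_maps, axes=1)  # (H, W)
    # Winner-take-all: the peak location becomes the saccade target.
    return np.unravel_index(np.argmax(match), match.shape)

# Toy scene: 2 feature channels ("red", "round") on a 4x4 retinotopic grid,
# with a red round object at (1, 2) and a red square distractor at (3, 0).
maps = np.zeros((2, 4, 4))
maps[0, 1, 2] = 1.0; maps[1, 1, 2] = 1.0   # red round object
maps[0, 3, 0] = 1.0                         # red square distractor
target = np.array([1.0, 1.0])               # looking for something red and round
loc = select_saccade_target(maps, target)   # -> (1, 2)
```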
Learning was expected to be a crucial property for flexible and adaptive object recognition. Learning of appropriate receptive fields indeed occurs across the whole hierarchy of the visual stream: early visual areas (e.g., V1) can learn simple visual fragments (e.g., edges), on which neurons in higher visual areas can build to learn representations of single object views.
Finally, vision should be accompanied by executive functions that allow selecting between different behavioral alternatives, as well as using and holding previously visible information in working memory (WM) when beneficial for the task. These functions relate to the development of a model of cortico-basal ganglia loops. This model should learn the target object for a particular action – the object which will lead to reward. Knowledge about the target object can then be used to guide attention, or to maintain object information in WM and use it, together with the present visual input, for a final decision about motor action.
Neurophysiological evidence of joint visuomotor descriptors of the 3D space in the parietal cortex
In order to look at or reach for objects of interest, information about the site of retinal activity must be combined with information about eye (and head) position. Thus, the twin issues of spatial representation and coordinate transformation within the central visual pathways have been the subject of vigorous investigation in recent years and are eminently deserving of continued study. Considerable progress has been made in understanding the role of the parietal and frontal cortices in these computations, but most studies have been performed with targets located in the frontal plane (parallel to the eyes’ plane). Our objective in EYESHOTS was to analyze the functioning of these systems in more natural situations, in which the animal looks at or reaches for a stationary object located at different depths in the peripersonal space. These experiments are technically difficult since they need to be conducted in a more natural or realistic setting, while still maintaining appropriate experimental control.
Neurophysiological experiments had to be conducted in the medial parieto-occipital cortex, located in the caudal part of the superior parietal lobule. This is a crucial node of the brain, at the boundary between areas that analyse passive sensory information and areas involved in coordinating active eye and arm movements (Galletti et al., 2003). The selected cortical area contains neurons responsive to visual stimuli (Galletti et al., 1996; Galletti et al., 1999), as well as cells modulated by somatosensory inputs, mainly from the upper limbs (Breveglieri et al., 2002), and arm movement-related neurons (Galletti et al., 1997; Fattori et al., 2001). Other previous works demonstrated that the monkey medial parieto-occipital cortex is involved in elaborating eye-movement signals for fixation (Galletti et al., 1995), for saccadic eye movements (Galletti et al., 1995; Kutz et al., 2003), and for reaching movements (Marzocchi et al., 2008). In EYESHOTS we aimed at studying whether this crucial node within the dorsal visual stream (Goodale and Milner, 1992) encodes target locations in 3D through the concurrent use of information on vergence eye movements and arm movements in depth. This characterization was intended to provide indications on the role of non-visual cues (such as the eyes’ version and vergence angles) in the perception of 3D space, as well as on the role of visual cues in mastering the 3D peripersonal space.
Psychophysical evidence of motor contingency of the peripersonal sensory space
We hypothesized that motor signals used for saccade execution are also used for the awareness of the spatial locations of the sequence of visual fragments, and that, because these motor signals are plastic, perceptual awareness of peripersonal space is dynamic as well, in the sense that it can be adapted to the motor parameters. Using saccadic adaptation as an experimental paradigm we wanted to study the shaping of perceptual space by sensorimotor contingencies. A motor-contingency of peripersonal sensory space would allow a direct mapping between a fragment’s location and the motor command to reach the fragment by an eye movement. Such a direct mapping would ease motor planning and would allow quicker goal-directed movements.
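The motor-contingent recalibration probed by saccadic adaptation can be sketched as a standard delta-rule gain update (a deliberate simplification; the function name, parameters, and learning rate are hypothetical):

```python
def adapt_saccade_gain(gain, target_eccentricity, landing_error, rate=0.05):
    """One step of saccadic gain adaptation (delta-rule sketch).

    The motor command is gain * target_eccentricity; the post-saccadic
    visual error slowly adjusts the gain, which in turn shifts where
    targets are perceived to lie (the motor contingency discussed above).
    """
    return gain + rate * landing_error / target_eccentricity

# Intra-saccadic target-step paradigm: the target jumps backwards by 2 deg
# during each 10-deg saccade, so the eye consistently overshoots it.
gain = 1.0
for _ in range(100):
    landing = gain * 10.0      # where the saccade lands
    error = 8.0 - landing      # target has stepped to 8 deg
    gain = adapt_saccade_gain(gain, 10.0, error)
# gain converges toward 0.8: 10-deg targets now evoke 8-deg saccades.
```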
Integration of a bio-inspired sensorimotor representation of the reachable space on an anthropomorphic robot platform
A fundamental contribution expected from EYESHOTS was to take inspiration from neurophysiological and psychophysical findings in order to design and implement on a humanoid robot a model for achieving visuomotor awareness of the environment by using eye and arm movements.
Starting from an integrated object representation which includes cognitive and visuomotor aspects of surrounding stimuli, the artificial agent (either a real robot or a simulation) was expected to be able to interact with stimuli in its peripersonal space by performing an active exploration of it. Such exploration would allow the agent to: 1) learn to coordinate and associate visual stimuli with oculomotor and arm motor movements; 2) build a visuomotor representation of potential targets in its peripersonal space.
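Point 1) above – learning to associate visual stimuli with motor commands – can be sketched as fitting a linear sensorimotor map on simulated exploration data; here ordinary least squares stands in for the biologically plausible learning used in the project, and all names, dimensions, and input choices are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical exploration data: retinal target position plus current eye
# posture (inputs) paired with the arm command that reached the target.
X = rng.uniform(-1, 1, size=(200, 4))   # [retinal x, retinal y, version, vergence]
true_W = rng.normal(size=(4, 3))        # unknown sensorimotor mapping
Y = X @ true_W                          # 3 arm-command dimensions

# Learn the mapping by least squares -- a stand-in for, e.g., a
# basis-function network trained during active exploration.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The learned map predicts the reach command for any visual target.
err = np.abs(X @ W - Y).max()
```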
Taking as main inspiration the data and insights provided by the experiments performed by partner UNIBO, in which UJI was also expected to collaborate, the model would implement, in a biologically plausible fashion, the sensorimotor transformations performed by the primate posterior parietal cortex, with special emphasis on the role of the dorso-medial stream, namely on area V6A.
The outcome of applying such a model on a real humanoid robotic platform should be a set of basic skills, such as concurrent or decoupled gazing and reaching movements toward visual stimuli. The robot was planned to be able to show its visuomotor capabilities by performing oculomotor actions toward visual targets placed in its peripersonal space, or toward the location of its hand. Moreover, it should also be able to perform arm reaching movements to visible objects, either with or without gazing at them.
Predicting behaviour and cooperation in shared workspace
Understanding the sequence of allocation of attention, direction of gaze, and movement of the arm of a human cooperation partner is the basis for effective human-robot interaction. To this end, the shared attention concept has been used as a specific paradigm to understand the intention or goal of a communicating partner. Specific experiments were conducted to monitor the overt allocation of attention by using eye tracking and knowledge about the contingencies of the shared task in order to disentangle the respective contribution of task-knowledge and internal action planning versus observational information from the partner (i.e., the partner’s eye movements or other actions). In particular, three questions have been addressed to study behavior prediction and cooperation in shared workspace:
- Can humans use others’ gaze direction to predict actions?
- How do humans identify relevant gaze shifts?
- What is the gain of a cooperative behavior?
The following conclusions have been derived:
- Others’ gaze direction can be used advantageously as a predictive cue about the final location of a pointing movement, and can be complemented by the kinematic cues provided by the hand movement.
- When gaze and hand cues were presented in conjunction, we found a larger proportion of saccades following the gaze direction.
- The stereotypical gaze behavior seems necessary to establish a closed loop between two participants that allows a coordinated fine-tuning of the joint action.
- When both partners jointly cooperate, fewer attentional resources are necessary for optimal performance of the interaction.