Attention

This project presents an FPGA hardware architecture for the computation of bottom-up (inherent) visual attention. The bottom-up attention map is generated from local energy, local orientation maps, and red-green and blue-yellow color opponencies.
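As a reference for the feature-extraction stage, the sketch below computes the red-green and blue-yellow opponency maps using the broad-band channel formulation commonly used in Itti-style models [1]. It is a minimal software illustration: the function name, the intensity normalization, and the floating-point arithmetic are assumptions and do not reflect the fixed-point datapath of the FPGA implementation.

```python
import numpy as np

def color_opponency(img_rgb: np.ndarray, eps: float = 1e-6):
    """Compute red-green (RG) and blue-yellow (BY) opponency maps.

    img_rgb: float array in [0, 1] with shape (H, W, 3).
    Uses the broadly tuned color channels of Itti-style saliency
    models; exact constants may differ from the hardware design.
    """
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    intensity = (r + g + b) / 3.0

    # Broadly tuned color channels.
    R = r - (g + b) / 2.0
    G = g - (r + b) / 2.0
    B = b - (r + g) / 2.0
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b

    # Opponency maps, normalized by local intensity to reduce the
    # influence of brightness on the color response.
    rg = (R - G) / (intensity + eps)
    by = (B - Y) / (intensity + eps)
    return rg, by
```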

Visual information enters the visual cortex and is processed along two parallel pathways: the 'where' pathway (dorsal stream), involved in spatial localization, and the 'what' pathway, devoted to object recognition. Attention operates in the former and focuses processing on the most relevant areas of the scene. In particular, it makes visual search tractable given the large amount of visual information that we perceive. Perception is not merely information acquisition but an active selection of the most relevant information, needed to cope with real-world dynamic environments.

The implemented hardware model and architecture are based on [1]. More details about the architecture can be found in [2].

Fig 1. Itti model from [1].

As can be seen in Fig 1, the original model combines features estimated at different spatial resolutions. The feature maps for a specific input example are shown in Fig 2.
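The multi-resolution processing can be illustrated with the sketch below, which builds a Gaussian pyramid of a feature map and computes center-surround differences between fine and coarse levels, in the spirit of [1]. The specific level pairs and the bilinear interpolation are illustrative assumptions and need not match the scale selection of the FPGA architecture.

```python
import cv2
import numpy as np

def gaussian_pyramid(feature: np.ndarray, levels: int = 6):
    """Build a Gaussian pyramid of a single-channel feature map."""
    pyr = [feature.astype(np.float32)]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def center_surround(pyr, pairs=((1, 3), (1, 4), (2, 4), (2, 5))):
    """Center-surround maps: |fine level - coarse level|, both
    resampled to the base resolution. The (center, surround) level
    pairs are illustrative choices.
    """
    h, w = pyr[0].shape[:2]
    maps = []
    for c, s in pairs:
        center = cv2.resize(pyr[c], (w, h), interpolation=cv2.INTER_LINEAR)
        surround = cv2.resize(pyr[s], (w, h), interpolation=cv2.INTER_LINEAR)
        maps.append(np.abs(center - surround))
    return maps
```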

Fig 2. Feature maps and sorted most salient locations (see [2]).

After several normalization steps, the feature maps are integrated into a single final saliency map.
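A minimal sketch of this integration step is shown below, assuming a simplified per-map range normalization followed by an equally weighted sum. The normalization operator N(.) of [1] additionally promotes maps with a few strong peaks over maps with many comparable peaks; that refinement, and the equal weighting, are simplifications here rather than the exact arithmetic of the hardware.

```python
import numpy as np

def normalize_map(m: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Scale a feature map to [0, 1].

    Simplified stand-in for the N(.) operator of [1], which also
    boosts maps containing a small number of strong peaks.
    """
    m = m.astype(np.float32)
    return (m - m.min()) / (m.max() - m.min() + eps)

def saliency(feature_maps) -> np.ndarray:
    """Combine per-feature conspicuity maps into one saliency map.

    feature_maps: iterable of equally sized 2-D arrays (e.g. energy,
    orientation, RG and BY opponency). Equal weighting is an
    illustrative choice.
    """
    maps = [normalize_map(m) for m in feature_maps]
    return normalize_map(np.mean(maps, axis=0))
```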

The final map can be estimated using additional cues. In [2] we also propose the integration of optical flow and stereo disparity. Motion can be integrated straightforwardly, like any other feature in the bottom-up system. Nevertheless, optical flow (carrying the speed and direction of motion) and disparity can also be integrated as a top-down stream, in this case modulating the final saliency map.
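As an illustration of the top-down alternative, the snippet below modulates a bottom-up saliency map with an optical-flow magnitude map and a disparity-derived proximity map. The multiplicative modulation and the gain parameters are illustrative assumptions, not the specific scheme proposed in [2].

```python
import numpy as np

def modulate_saliency(saliency: np.ndarray,
                      motion_mag: np.ndarray,
                      proximity: np.ndarray,
                      k_motion: float = 1.0,
                      k_depth: float = 1.0) -> np.ndarray:
    """Top-down modulation of a bottom-up saliency map.

    saliency, motion_mag and proximity are equally sized maps scaled
    to [0, 1]; motion_mag is the optical-flow magnitude and proximity
    a disparity-derived nearness map. The multiplicative gain below is
    one simple choice, not the scheme of [2].
    """
    gain = (1.0 + k_motion * motion_mag) * (1.0 + k_depth * proximity)
    modulated = saliency * gain
    return modulated / (modulated.max() + 1e-6)
```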

Saliency can be used in many different fields such as robotics, autonomous navigation, and assistive devices for low-vision patients.

Contact

Javier Serrano