Image processing core library

Vision tasks pose an outstanding computational challenge: pixel-wise operations require high-performance architectures to achieve real-time processing. In this project, OHWR users will find multiple cores for on-chip vision-feature extraction. The HDL modules are provided in different languages, such as Handel-C or VHDL, and are applicable to various embedded and reconfigurable devices.
We offer different hardware alternatives to suit diverse target applications, including but not limited to robotics, high-performance video processing, biometrics, and, more specifically, particle tracking, fluid-dynamics analysis, 3D scene reconstruction, and object recognition.

The library currently provides components for estimating optical flow (energy- and phase-based), stereo disparity (energy- and phase-based), and local energy, orientation, and phase.
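The cores extract these features in hardware; as a software reference for what "local energy and phase" mean, the sketch below convolves a 1D signal with an even/odd quadrature (Gabor-like) filter pair. All filter parameters (size, wavelength, sigma) are illustrative choices, not values from the project.

```python
import numpy as np

def quadrature_pair(size=15, wavelength=6.0, sigma=3.0):
    """Even (cosine) and odd (sine) Gabor-like filters sharing one Gaussian envelope."""
    x = np.arange(size) - size // 2
    env = np.exp(-x**2 / (2 * sigma**2))
    even = env * np.cos(2 * np.pi * x / wavelength)
    odd = env * np.sin(2 * np.pi * x / wavelength)
    return even, odd

def energy_phase(signal):
    """Local energy = magnitude of the quadrature responses; phase = their angle."""
    even, odd = quadrature_pair()
    e = np.convolve(signal, even, mode='same')
    o = np.convolve(signal, odd, mode='same')
    return np.sqrt(e**2 + o**2), np.arctan2(o, e)

# A sinusoid at the filter's tuning wavelength gives a near-constant energy
# ridge and a linearly advancing phase (away from the signal borders).
t = np.arange(128)
energy, phase = energy_phase(np.cos(2 * np.pi * t / 6.0))
```

Hardware implementations typically realize such convolutions as separable multiply-accumulate pipelines, one tap per stage, which is what makes the fine-grain pipelining described below effective.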

The implementation of the different modules follows a common rule: a modular design of a fine-grain pipelined, superscalar datapath that reaches high performance at low working clock frequencies. The final goal is a throughput of one pixel per clock cycle. This design strategy requires the careful definition of deeply pipelined datapaths with concurrent accesses to external memory banks. To simplify the management of these concurrent external memory accesses, we use a customized memory control unit.
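The throughput behavior of such a datapath can be illustrated with a minimal software model (this is a conceptual sketch, not the HDL): each clock tick shifts every pipeline register by one stage, so after a fill latency equal to the pipeline depth, one result emerges per cycle.

```python
from collections import deque

class FineGrainPipeline:
    """Toy model of a D-stage pipeline: accepts one input and, once filled,
    delivers one output per clock tick. Stage functions stand in for the
    combinational logic between flip-flops."""
    def __init__(self, stages):
        self.stages = stages                     # one function per stage
        self.regs = deque([None] * len(stages))  # pipeline registers

    def clock(self, sample):
        """One clock tick: emit the oldest value, shift, apply stage logic."""
        out = self.regs.pop()                    # value leaving the pipeline
        self.regs.appendleft(sample)             # new value enters stage 0
        for i in range(len(self.regs)):
            if self.regs[i] is not None:
                self.regs[i] = self.stages[i](self.regs[i])
        return out

# Three arbitrary stages; composed effect on each sample x is 2*(x+1)-3 = 2x-1.
pipe = FineGrainPipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
outputs = [pipe.clock(s) for s in range(8)]
# The first len(stages) outputs are None (pipeline fill), then one per tick.
```

Keeping each stage's logic small shortens the critical path between registers, which is exactly why deep pipelines raise the maximum clock frequency, as discussed next.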

This fine-grain pipeline strategy adopted for the cores increases the maximum clock frequency by reducing the longest path delay between flip-flops. It is radically different from the multi-core processing adopted in common parallel architectures such as DSPs, GPUs, and multi-core general-purpose processors, where different tasks are assigned to each core to exploit parallelism. Our customized FPGA architecture has been demonstrated to be efficient for data streams and image processing. The only drawback of such deep pipelines is a higher latency (a constant delay before the first pixel appears at the output). In our case, however, the latency is only 5 image lines (on the order of microseconds), which is negligible compared to the time for an entire image (on the order of milliseconds). This latency is introduced only once, at the beginning of the processing, and remains constant throughout the continuous data stream, so real-time performance is not affected: once the pipeline is filled, the circuit processes one pixel per clock cycle.
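The microseconds-vs-milliseconds comparison can be checked with a quick calculation. The frame size and clock frequency below are illustrative assumptions (VGA resolution, 50 MHz pixel clock), not figures from the project; only the 5-line latency comes from the text above.

```python
# Illustrative parameters (assumptions, not project specifications)
width, height = 640, 480   # VGA frame
f_clk = 50e6               # 50 MHz pixel clock, Hz
latency_lines = 5          # constant pipeline fill delay quoted above

# Fill latency: 5 image lines' worth of pixels, one pixel per cycle
latency_cycles = latency_lines * width
latency_us = latency_cycles / f_clk * 1e6   # -> 64.0 microseconds

# Whole frame at one pixel per clock cycle
frame_cycles = width * height
frame_ms = frame_cycles / f_clk * 1e3       # -> 6.144 milliseconds
```

Under these assumptions the fill latency (~64 µs) is about 1% of a frame period, consistent with the claim that it is negligible for real-time streaming.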

More details of the different modules and the strategy can be found in the respective subprojects, the papers in the documentation, and the repository.

Contact

Francisco Barranco