Abstract
The high computational complexity and power consumption of Convolutional Neural Networks (CNNs) make them unsuitable for real-time embedded applications. In this work, we introduce a low-power, flexible platform that serves as a hardware accelerator for CNNs. The proposed architecture is fully configurable through a software library, so different CNN models can run on the same reconfigurable hardware. The hardware accelerator is evaluated on a ZC706 evaluation board. We use the AlexNet architecture in a real-time object recognition application to demonstrate the effectiveness of the proposed CNN accelerator. The results show that performance rates of 198.1 GOP/s using 512 DSP blocks and 23.14 GOP/s using 64 DSP blocks are achievable for the convolution and fully connected layers, respectively. Moreover, images are processed at 82 frames per second, which is significantly higher than the frame rates of existing implementations.
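To put the reported figures on a common scale, the following minimal sketch normalizes the throughput by the number of DSP blocks and converts the frame rate into a per-frame time budget. It uses only the numbers stated above. The function names are hypothetical, and the convention that one multiply-accumulate counts as two operations is an assumption, not something stated in this abstract.

```python
# Figures as reported in the abstract.
CONV_GOPS = 198.1   # GOP/s for the convolution layers
CONV_DSPS = 512     # DSP blocks used for the convolution layers
FC_GOPS = 23.14     # GOP/s for the fully connected layers
FC_DSPS = 64        # DSP blocks used for the fully connected layers
FPS = 82            # frames per second

# Assumption (not from the source): one multiply-accumulate (MAC) = 2 operations.
OPS_PER_MAC = 2


def gops_per_dsp(gops: float, dsps: int) -> float:
    """Throughput normalized by the number of DSP blocks, in GOP/s per block."""
    return gops / dsps


def macs_per_dsp_per_second(gops: float, dsps: int,
                            ops_per_mac: int = OPS_PER_MAC) -> float:
    """MACs sustained by each DSP block per second under the assumed convention."""
    return gops * 1e9 / dsps / ops_per_mac


def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"conv: {gops_per_dsp(CONV_GOPS, CONV_DSPS):.3f} GOP/s per DSP block")
    print(f"fc:   {gops_per_dsp(FC_GOPS, FC_DSPS):.3f} GOP/s per DSP block")
    print(f"frame budget: {frame_budget_ms(FPS):.2f} ms")
```

Under these assumptions, the two layer types work out to roughly 0.39 and 0.36 GOP/s per DSP block, and 82 frames per second corresponds to about 12.2 ms per frame on average. The per-block MAC rate is only a lower bound on the clock rate each block would need if it completed at most one MAC per cycle; the actual clock frequency and utilization are not given here.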