Paper Introduction
Abstract—To reduce multiplication operations in the convolutions of convolutional neural networks (CNNs), three convolutional acceleration algorithms are widely used: Winograd, FFT, and FFA. However, current accelerators based on these algorithms have issues with flexibility and efficiency. First, some accelerators combine these acceleration algorithms and employ multiple types of computational units to obtain their respective advantages; as a result, some computational units sit idle while the best-performing unit is working, which causes considerable area inefficiency. Second, current accelerators tend to choose small parameters for these acceleration algorithms to avoid unacceptable precision loss; consequently, they can hardly support large kernel sizes and lack flexibility. Third, these acceleration algorithms are typically presented for 1-stride convolutions, so few implementations consider the acceleration of large-stride convolutions, which is a major restriction on hardware flexibility. This paper proposes a stride-based convolution decomposition method (SCDM) that reforms different convolution shapes (i.e., kernel sizes and strides) into an identical pattern. With the aid of SCDM, a Winograd-stretched and hardware-efficient design (WHD) is presented that uses one uniform computational unit to accelerate different convolution shapes, combining the complementary performance advantages of the Winograd F(4,3) and F(4,2) units. Compared to current FFT-based or FFA-based works, WHD stretches the applicable range of Winograd and simplifies implementation, thereby achieving hardware flexibility and efficiency. Evaluation results show that a 34.08%∼55.41% operation reduction was achieved on six CNN models, while incurring only a slight hardware overhead.

Index Terms—Convolutional neural networks, acceleration algorithm, convolution decomposition, flexibility, hardware-efficient.
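The general principle behind stride-based decomposition can be illustrated numerically: a stride-s convolution is equivalent to a sum of s×s stride-1 convolutions over parity-subsampled copies of the input and kernel, which is what lets large-stride convolutions be mapped onto a uniform stride-1 (e.g., Winograd) computational unit. The following NumPy sketch demonstrates this identity for stride 2; it is a minimal illustration of the decomposition idea, not the paper's exact SCDM formulation, and the function names are illustrative only.

```python
import numpy as np

def conv2d(x, k, stride=1):
    # Direct "valid" 2-D correlation with the given stride.
    kh, kw = k.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    y = np.zeros((oh, ow))
    for m in range(oh):
        for n in range(ow):
            y[m, n] = np.sum(x[m*stride:m*stride+kh,
                               n*stride:n*stride+kw] * k)
    return y

def decomposed_stride2(x, k):
    # Express a stride-2 convolution as the sum of four stride-1
    # convolutions on parity-subsampled inputs and kernels.
    oh = (x.shape[0] - k.shape[0]) // 2 + 1
    ow = (x.shape[1] - k.shape[1]) // 2 + 1
    y = np.zeros((oh, ow))
    for p in range(2):
        for q in range(2):
            # Subsample input and kernel by row/column parity (p, q),
            # run a stride-1 convolution, and accumulate the result.
            part = conv2d(x[p::2, q::2], k[p::2, q::2], stride=1)
            y += part[:oh, :ow]
    return y
```

Because the four subsampled convolutions are all stride-1, each can in principle be fed to the same Winograd-style unit, which is the hardware-uniformity argument the abstract makes.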