Doubly-Block Circulant Kernel Matrix Exploitation in Convolutional Accelerators
We present an algorithm-hardware co-design approach for efficient 2D convolution, a core operation in convolutional neural networks (CNNs). Our method addresses the limitations of existing software-based solutions and hardware-based architectures, delivering significant improvements in asymptotic behavior for generic convolution ca