An Energy-Efficient Near-Memory Computing Architecture for CNN Inference at Cache Level
A non-von Neumann Near-Memory Computing (NMC) architecture, optimized for CNN inference in edge computing, is integrated into the cache memory subsystem of a microcontroller unit. The NMC co-processor is evaluated using an 8-bit fixed-point quantized CNN model and achieves an accuracy of 98% on the MNIST dataset. A full inference of the CNN model executed on the NMC co-processor demonstrates an improvemen
