
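Xizi Chen (陈夕子)

Master's Supervisor
Name: 陈夕子
English name: Xizi Chen
Pinyin name: chenxizi
Email:
Affiliation: College of Informatics
Position: Full-time faculty member
Education: Doctorate
Office: Room 310, Block C, Yifu Building, Huazhong Agricultural University
Degree: Doctoral degree
Professional title: Associate Research Fellow
Employment status: Currently employed
Primary appointment: Full-time faculty member
Alma mater: The Hong Kong University of Science and Technology
Department: College of Informatics
Disciplines: Computer Architecture; Computer Application Technology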
Other contact information

Mailing/office address:

Publications
SubMac: Exploiting the Subword-Based Computation in RRAM-Based CNN Accelerator for Energy Saving and Speedup
Posted: 2021-09-08

Affiliation: The Hong Kong University of Science and Technology (HKUST)

Journal: Integration, the VLSI Journal (CCF rank C)

Funding: This work is partially supported by the Hong Kong Research Grants Council (RGC) under Grant 619813.

Keywords: Convolutional Neural Network (CNN), Resistive RAM, data encoding, dynamic quantization, computation saving

Abstract: Although the CMOS-based CNN accelerators have achieved impressive progress, the memory wall issue and the high power density are still the major barriers for substantial improvement in energy efficiency and throughput. As an attractive alternative, recently the Resistive RAM-based accelerators have delivered significant breakthroughs by leveraging the in-situ computation. However, there are still some challenges, including the high computation complexity and the large energy overhead at the analog/digital interfacing circuits. In this work, we take advantage of the subword-based computation in the Resistive RAM-based accelerator to achieve energy saving and speedup. First, an encoding method is proposed for the weights and activations to reduce the energy consumption of the in-situ computation and the resolution requirement of ADC. Then the resolution of ADC is further optimized based on the distribution of the subword computation results. Furthermore, a dynamic quantization scheme is proposed to skip 67%–87% of the subword computations, which outperforms the conventional quantization schemes. We fully investigate the influences of the encoding scheme and the layer-wise quantization range scaling on the performance of dynamic quantization. Finally, we demonstrate the effectiveness of the proposed algorithms under different hardware configurations and network complexities. A dedicated architecture, SubMac, is proposed to implement the above schemes. Experimental results show that the energy efficiency and the throughput are improved by 2.8–5.7 and 2.5–7.9 times, respectively, when compared with the state-of-the-art Resistive RAM-based accelerators.

Note: China Computer Federation (CCF) rank C

Co-authors: Jingbo Jiang, Jingyang Zhu, Chi-Ying Tsui

First author: Xizi Chen

Translated work:

Publication date: 2019-01-01

Indexed in: SCI