Counterclockwise block-by-block knowledge distillation for neural network compression

Abstract Model compression is a technique for transforming large neural network models into smaller ones. Knowledge distillation (KD) is a crucial model compression technique that transfers knowledge from a large teacher model to a lightweight student model. Existing knowledge distillation methods typically carry out the knowledge transfer from teacher to student in one or two stages. This paper introduces a novel approach called counterclockwise block-wise knowledge distillation (CBKD) to optimize the knowledge distillation process. The core idea of CBKD is to mitigate the generation gap between the teacher and student models and to facilitate the transfer of intermediate-layer knowledge from the teacher model.
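As background, the teacher-to-student transfer referred to here is commonly formulated as a weighted sum of a hard-label cross-entropy term and a KL-divergence term between temperature-softened teacher and student outputs. The sketch below is a generic PyTorch rendering of that standard logit-based KD loss, not code from the paper; the temperature and weighting values are illustrative.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard logit-based KD loss: cross-entropy on the labels plus
    KL divergence between temperature-softened distributions.
    T and alpha are illustrative hyperparameter choices."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```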

It divides both the teacher and student models into multiple sub-network blocks, and in each stage of knowledge distillation, only the knowledge from one teacher sub-block is transferred to the corresponding student sub-block. Additionally, in the CBKD process, deeper teacher sub-network blocks are assigned higher compression rates. Extensive experiments on Tiny-ImageNet-200 and CIFAR-10 demonstrate that the proposed CBKD method enhances the distillation performance of various mainstream knowledge distillation approaches.
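To make the staging concrete, the sketch below shows one plausible reading of a block-wise stage in PyTorch: both networks are split into aligned lists of sub-blocks, and each stage trains a single student block against the matching teacher block with a feature-matching (MSE) loss. The block boundaries, the `adapter` projection, the MSE objective, and the order in which blocks are visited are assumptions for illustration; only the one-block-per-stage structure comes from the abstract.

```python
import torch
import torch.nn.functional as F

def distill_block(teacher_blocks, student_blocks, block_idx, adapter,
                  loader, optimizer, device="cuda"):
    """One block-wise distillation stage (illustrative sketch).

    teacher_blocks / student_blocks: nn.ModuleList of aligned sub-blocks.
    adapter: hypothetical projection (e.g. a 1x1 conv) mapping student
             features to the teacher's channel width when they differ.
    optimizer is assumed to hold the parameters of
    student_blocks[block_idx] and the adapter only.
    """
    # Freeze everything except the student block distilled in this stage.
    for p in student_blocks.parameters():
        p.requires_grad_(False)
    for p in student_blocks[block_idx].parameters():
        p.requires_grad_(True)

    for x, _ in loader:
        x = x.to(device)
        with torch.no_grad():
            t_feat = x
            for blk in teacher_blocks[: block_idx + 1]:
                t_feat = blk(t_feat)   # teacher features up to this block
        s_feat = x
        for blk in student_blocks[: block_idx + 1]:
            s_feat = blk(s_feat)       # student features up to this block
        # Feature-matching loss between the corresponding sub-blocks.
        loss = F.mse_loss(adapter(s_feat), t_feat)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Stage-by-stage schedule: each stage distills exactly one block pair.
# The visiting order (e.g. deepest block first) is an assumption here.
# for idx in reversed(range(num_blocks)):
#     distill_block(teacher_blocks, student_blocks, idx, adapters[idx],
#                   loader, optimizers[idx])
```

Running the stages one block at a time keeps each optimization problem small, which is one way to interpret how the method narrows the gap between a large teacher and a much smaller student.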
