How Does AMD Improve AI Algorithm Hardware Efficiency?
Unified Progressive Depth Pruner for Convolutional Neural Network (CNN) and Vision Transformer Models, AAAI 2024. AMD, one of the world's largest semiconductor suppliers, is recognized by users worldwide for its innovative chip architecture design and AI development tools. As AI advances rapidly, one of its goals is to create high-performance algorithms that run more efficiently on AMD hardware.
Inspiration
Deep neural networks (DNNs) have achieved notable breakthroughs in a wide range of tasks, leading to impressive results in industrial applications. Among these applications, model optimization is in high demand because it can increase inference speed while minimizing the accuracy trade-off. This effort involves several methods, including efficient model design, quantization, and model pruning. Model pruning is a common method for optimizing models in industrial applications.
Model pruning is a major acceleration technique that aims to remove redundant weights while preserving accuracy. Depth-wise convolutional layers, with their sparse computation and small parameter counts, pose difficulties for the traditional channel-wise pruning approach. Furthermore, channel-wise pruning makes efficient models thinner and sparser, which results in low hardware utilization and therefore lower achievable hardware efficiency.
Moreover, current hardware platforms such as GPUs favor a high degree of parallel computation. To address these problems, DepthShrinker and Layer-Folding have been proposed to optimize MobileNetV2 by reducing model depth through reparameterization.
These techniques do have some drawbacks, though, such as the following:
- Fine-tuning a subnet after directly removing activation layers may jeopardize the integrity of the baseline model weights, making it harder to achieve high performance.
- These techniques have usage restrictions.
- They cannot be used to prune models that have certain normalization layers, such as Layer Norm.
- Because Layer Norm is present in vision transformer models, these techniques cannot be applied to them for optimization.
Convolutional Neural Network
To address these issues, the AMD researchers propose a depth pruning methodology that can prune both Convolutional Neural Network (CNN) and vision transformer models, together with a novel block pruning method and a progressive training strategy. The progressive training strategy gradually transfers the baseline model structure to the subnet structure while making heavy use of the baseline model weights, which leads to higher accuracy.
The proposed block pruning technique resolves the normalization layer problem and can, in theory, handle all activation and normalization layers. As a result, the AMD method can prune vision transformer models, which existing depth pruning techniques cannot handle.
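The article does not spell out the mechanics of the progressive training strategy. One plausible reading, assumed here purely for illustration, is that each activation slated for removal is blended with the identity function, with a factor `lam` that ramps from 0 to 1 during training. The names `blended_activation` and `linear_ramp`, and the linear schedule itself, are hypothetical.

```python
def relu(v):
    """Standard ReLU on a scalar."""
    return max(0.0, v)

def blended_activation(v, lam):
    """Interpolate between ReLU (lam = 0) and identity (lam = 1)."""
    return (1.0 - lam) * relu(v) + lam * v

def linear_ramp(step, total_steps):
    """Assumed schedule: lam grows linearly from 0 to 1, then stays at 1."""
    return min(1.0, step / total_steps)

total_steps = 4
# Follow one negative pre-activation value as training progresses.
trace = [blended_activation(-2.0, linear_ramp(s, total_steps))
         for s in range(total_steps + 1)]
# trace moves from the ReLU output (0.0) to the identity output (-2.0)
```

Because the block starts out behaving exactly like the baseline and only gradually loses its nonlinearity, the baseline weights stay useful throughout training instead of being disrupted by an abrupt removal.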
Important Technologies
Rather than simply removing a block, the AMD depth pruning approach reduces model depth with a novel block pruning strategy based on reparameterization. Through block merging, the AMD block pruning technique converts a complex, slow block into a simple, fast block, as shown in the figure.
To set up the conditions for reparameterization, replace the activation layer in a block with an identity layer, replace the Layer Norm (LN) or Group Norm (GN) layer with a Batch Norm (BN) layer, and insert an activation layer with a Batch Norm layer at the end of the block. As the figure shows, the reparameterization method can then merge the Batch Norm layers, the adjacent convolutional or fully connected layers, and the skip connections.
Figure: The depth pruner framework proposed by AMD. To speed up inference and save memory, each pruned baseline block progressively evolves into a smaller merged block. Four baselines are tested: one vision transformer network (DeiT-Tiny) and three CNN-based networks (ResNet34, MobileNetV2, and ConvNeXtV1).
The technique consists of four main phases: Supernet training, Subnet searching, Subnet training, and Subnet merging. As the figure shows, users first build a Supernet based on the baseline architecture and modify the blocks inside it. After Supernet training, a search algorithm finds an optimal Subnet. The proposed progressive training strategy is then used to optimize that Subnet with minimal accuracy loss. Finally, the reparameterization process merges the Subnet into a shallower model.
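The merging step relies on the fact that an inference-mode Batch Norm layer and a skip connection are both linear operations, so they can be absorbed into a neighboring linear layer. The following is a minimal sketch in plain Python for a fully connected layer (a convolution folds the same way, per output channel). The function names are illustrative and are not taken from AMD's code.

```python
import math

def linear(W, b, x):
    """Compute Wx + b for a matrix W (list of rows) and vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def batch_norm_eval(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode Batch Norm: a fixed per-channel affine map."""
    return [g * (xi - m) / math.sqrt(v + eps) + bt
            for xi, g, bt, m, v in zip(x, gamma, beta, mean, var)]

def fold_bn_into_linear(W, b, gamma, beta, mean, var, eps=1e-5):
    """Return (W', b') such that W'x + b' equals BN(Wx + b)."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_f = [[s * w for w in row] for s, row in zip(scale, W)]
    b_f = [s * (bi - m) + bt for s, bi, m, bt in zip(scale, b, mean, beta)]
    return W_f, b_f

def absorb_skip(W):
    """Return W + I, so that (W + I)x equals Wx + x (square W only)."""
    return [[w + (1.0 if i == j else 0.0) for j, w in enumerate(row)]
            for i, row in enumerate(W)]

# Arbitrary example values
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [1.0, 4.0]
x = [1.0, 2.0]

# Original block: y = BN(Wx + b) + x
bn_out = batch_norm_eval(linear(W, b, x), gamma, beta, mean, var)
reference = [y + xi for y, xi in zip(bn_out, x)]

# Merged block: a single linear layer with the same output
W_f, b_f = fold_bn_into_linear(W, b, gamma, beta, mean, var)
merged = linear(absorb_skip(W_f), b_f, x)
```

This also suggests why the activation must first become an identity and LN or GN must be swapped for BN: a nonlinearity between layers, or a normalization whose statistics depend on the current input, would prevent the layers from collapsing into one fixed linear map.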
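The Subnet search phase can be pictured as choosing which blocks to prune so that a quality score stays as high as possible. The sketch below uses exhaustive enumeration as a stand-in, since the article only says "a search algorithm". The `evaluate` callback and the per-block importance values are hypothetical placeholders for evaluating the trained Supernet on validation data.

```python
from itertools import combinations

def find_best_subnet(num_blocks, num_pruned, evaluate):
    """Try every way to prune `num_pruned` of `num_blocks` blocks.

    Returns (mask, score), where mask[i] is True if block i is pruned.
    """
    best_mask, best_score = None, float("-inf")
    for pruned in combinations(range(num_blocks), num_pruned):
        mask = tuple(i in pruned for i in range(num_blocks))
        score = evaluate(mask)
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score

# Hypothetical per-block importance, NOT values from the article.
importance = [0.9, 0.2, 0.5, 0.1, 0.7, 0.3]

def toy_evaluate(mask):
    """Toy proxy for accuracy: total importance of the blocks that are kept."""
    return sum(w for w, is_pruned in zip(importance, mask) if not is_pruned)

mask, score = find_best_subnet(len(importance), 3, toy_evaluate)
# mask marks the three least important blocks (indices 1, 3, 5) as pruned
```

Exhaustive enumeration is only practical for small block counts, because the number of candidates grows combinatorially. A real search would likely use a heuristic instead.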
Advantages
Key contributions are summarized below:
- A novel block pruning strategy using reparameterization technique.
- A progressive training strategy for subnet optimization.
- Extensive experiments on both Convolutional Neural Network (CNN) and vision transformer models that showcase the strong pruning performance of the depth pruning method.
- A unified and efficient depth pruning method for both Convolutional Neural Network (CNN) and vision transformer models.
Applying the AMD approach to ConvNeXtV1 yields three pruned ConvNeXtV1 models that outperform popular models with comparable inference performance, as the results table illustrates, where P6 denotes pruning 6 blocks of the model. Furthermore, the approach beats existing state-of-the-art methods in both accuracy and speedup ratio, as the comparison results demonstrate. On the AMD Instinct MI100 GPU accelerator, the proposed depth pruner achieves up to a 1.26X speedup with only a 1.9% reduction in top-1 accuracy.
ConvNeXtV1 depth pruning results on ImageNet. Speedups are measured on an AMD Instinct MI100 GPU with a batch size of 128. The slowest network in the table (EfficientFormerV2) serves as the baseline (1.0 speedup) for comparison.
The results of WD-Pruning (Yu et al. 2022) and S2ViTE (Tang et al. 2022) are taken from their respective papers. The results of XPruner (Yu and Xiang 2023), HVT (Pan et al. 2021), and SCOP (Tang et al. 2020) are not publicly available.
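To make the speedup convention concrete, the sketch below normalizes every model against the slowest one, which receives a speedup of 1.0. It assumes speed is reported as latency per batch. The model names and latency values are placeholders, not measurements from the article.

```python
def relative_speedups(latencies_ms):
    """Map each model to baseline_latency / model_latency.

    The slowest model (largest latency) is the baseline and gets 1.0.
    """
    baseline = max(latencies_ms.values())
    return {name: baseline / t for name, t in latencies_ms.items()}

# Placeholder latencies in milliseconds, NOT results from the article.
latencies = {"slowest_model": 10.0, "model_a": 8.0, "model_b": 5.0}
speedups = relative_speedups(latencies)
# speedups == {"slowest_model": 1.0, "model_a": 1.25, "model_b": 2.0}
```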
In summary
The method provides a unified depth pruner for both efficient Convolutional Neural Network (CNN) and vision transformer models, pruning them in the depth dimension, and it has been applied to several CNN and transformer models. Its state-of-the-art (SOTA) pruning performance demonstrates the benefits of the approach. In the future, the researchers plan to explore the methodology on additional transformer models and tasks.