Matrix Solver Accelerated by oneAPI and SYCL Using a Common Codebase for Multi-Vendor GPUs
A SYCL implementation of the MuST (Multiple Scattering Theory) framework was presented at SC23 by Fengrui Zhang (Intel), Shiquan Su (Intel), Xiao Zhu (University of Washington), Xiao Liang (Pittsburgh Supercomputing Center), and Yang Wang (Pittsburgh Supercomputing Center).
The Multiple Scattering Theory framework offers a highly accurate and efficient numerical method for studying and simulating quantum phenomena in randomly distributed, locally self-consistent, but essentially disordered systems. At the heart of its computationally intensive workload is a matrix solver originally written in Fortran.
Using Intel oneAPI tools and the SYCL programming framework, the most time-consuming step of the CPU-specific Fortran code, the matrix inversion step, was accelerated dramatically. By switching to a single, optimized SYCL codebase, you can take advantage of the processing power of GPUs from several vendors, including AMD, NVIDIA, and Intel, in addition to the CPU.
This blog post gives an overview of how SYCL and oneAPI tools accelerated the matrix solver on cross-vendor GPUs and achieved code portability and scalability for deployment on various parallel supercomputers. Before getting into these specifics, let’s quickly review the oneAPI tools used in the experiment and the MuST project.
Intel oneAPI Math Kernel Library (oneMKL)
oneMKL is a high-performance library that speeds up math operations on Intel architectures. It is a comprehensive set of mathematical functions covering vector math, linear algebra, Fast Fourier Transforms (FFTs), and more. It supports offloading computations to GPUs for parallel execution through the OpenMP and SYCL frameworks.
The oneMKL Interfaces Project, an open-source implementation of the oneMKL specification, provides SYCL APIs for numerical computation across a broad range of domains. Relevant code examples for each domain are available in the oneAPI GitHub repository.
Overview of the Intel Fortran Compiler and the Intel oneAPI DPC++/C++ Compiler
The Intel Fortran Compiler, also known as ifx, is a oneAPI compiler built on LLVM technology. It supports the most recent language standards, optimized compilation, and fast Fortran code execution on Intel CPU- and GPU-powered platforms. The Intel oneAPI DPC++/C++ Compiler, described as the world’s first fully SYCL 2020 conformant compiler, is an LLVM-based, standards-based cross-architecture compiler for C, C++, and SYCL code.
Codeplay oneAPI Plugins for NVIDIA and AMD GPUs
Codeplay offers oneAPI plugins for NVIDIA and AMD GPUs that bring the advantages of heterogeneous computing to that hardware. The plugins add NVIDIA and AMD GPU support to the Intel oneAPI Base Toolkit. Because they are completely open source, you can also create similar plugins for your own oneAPI implementation. By enabling the full capabilities of the SYCL programming framework on NVIDIA and AMD GPUs, the plugins allow multi-architecture, cross-vendor development.
About the MuST Framework
MuST (Multiple Scattering Theory) is an open-source computational framework created to investigate quantum phenomena in disordered materials. The software package performs electronic structure calculations using the Korringa-Kohn-Rostoker (KKR) approach, also referred to as the Green’s function method, which is based on multiple scattering theory. The framework makes it possible to examine random alloys and disorder effects in quantum materials from first principles, that is, starting from their electronic structure.
Expert researchers from the fields of condensed matter physics, applied mathematics, applied materials science, software engineering, and high-performance computing (HPC) collaborated on the MuST project, which is supported by the US National Science Foundation.
Challenge: Matrix Inversion
The original Fortran code of the Multiple Scattering Theory framework performs matrix inversion with a block LU technique and is CPU-specific. Because of its high computational cost, the matrix inversion phase accounts for 80% to 90% of the application’s overall execution time. The task is to accelerate matrix inversion and make it platform-independent, so that results arrive faster on a variety of architectures.
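To make the block LU idea concrete, here is a minimal sketch in plain C++ (no SYCL, real-valued matrices) that inverts a matrix partitioned into 2x2 blocks through the Schur complement. It is an illustration only, not the MuST Fortran code: the names `invert`, `multiply`, `block`, and `blockInverse`, the split point `k`, and the use of Gauss-Jordan elimination for the small dense inverses are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Dense inverse by Gauss-Jordan elimination with partial pivoting.
Matrix invert(Matrix a) {
    const std::size_t n = a.size();
    Matrix inv(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);  // the input must be square
        inv[i][i] = 1.0;
    }
    for (std::size_t col = 0; col < n; ++col) {
        // Pick the largest entry in this column as the pivot.
        std::size_t piv = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
        if (a[piv][col] == 0.0) throw std::runtime_error("singular matrix");
        std::swap(a[col], a[piv]);
        std::swap(inv[col], inv[piv]);
        const double d = a[col][col];
        for (std::size_t j = 0; j < n; ++j) { a[col][j] /= d; inv[col][j] /= d; }
        // Eliminate this column from every other row.
        for (std::size_t r = 0; r < n; ++r) {
            if (r == col) continue;
            const double f = a[r][col];
            for (std::size_t j = 0; j < n; ++j) {
                a[r][j] -= f * a[col][j];
                inv[r][j] -= f * inv[col][j];
            }
        }
    }
    return inv;
}

// Plain triple-loop product x * y.
Matrix multiply(const Matrix& x, const Matrix& y) {
    Matrix z(x.size(), std::vector<double>(y[0].size(), 0.0));
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t k = 0; k < y.size(); ++k)
            for (std::size_t j = 0; j < y[0].size(); ++j)
                z[i][j] += x[i][k] * y[k][j];
    return z;
}

// Copy the rows x cols sub-block of m whose top-left corner is (r0, c0).
Matrix block(const Matrix& m, std::size_t r0, std::size_t c0,
             std::size_t rows, std::size_t cols) {
    Matrix b(rows, std::vector<double>(cols));
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j) b[i][j] = m[r0 + i][c0 + j];
    return b;
}

// Invert m = [[A, B], [C, D]], with A of size k x k, using the block LU
// factorization and the Schur complement S = D - C * A^-1 * B.
Matrix blockInverse(const Matrix& m, std::size_t k) {
    const std::size_t n = m.size();
    assert(k > 0 && k < n);
    const Matrix c = block(m, k, 0, n - k, k);
    const Matrix ainv = invert(block(m, 0, 0, k, k));
    const Matrix ainvB = multiply(ainv, block(m, 0, k, k, n - k));
    const Matrix cAinv = multiply(c, ainv);
    const Matrix cAinvB = multiply(c, ainvB);
    Matrix s = block(m, k, k, n - k, n - k);
    for (std::size_t i = 0; i < n - k; ++i)
        for (std::size_t j = 0; j < n - k; ++j) s[i][j] -= cAinvB[i][j];
    const Matrix sinv = invert(s);
    const Matrix topRight = multiply(ainvB, sinv);    // A^-1 B S^-1
    const Matrix bottomLeft = multiply(sinv, cAinv);  // S^-1 C A^-1
    const Matrix corr = multiply(topRight, cAinv);    // A^-1 B S^-1 C A^-1
    Matrix out(n, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            if (i < k && j < k) out[i][j] = ainv[i][j] + corr[i][j];
            else if (i < k)     out[i][j] = -topRight[i][j - k];
            else if (j < k)     out[i][j] = -bottomLeft[i - k][j];
            else                out[i][j] = sinv[i - k][j - k];
        }
    return out;
}
```

Calling `blockInverse(m, k)` on an invertible matrix whose leading k x k block is also invertible should agree with `invert(m)` up to rounding error. The block form replaces one large inversion with two smaller inversions plus several matrix products, which is the kind of bookkeeping the post describes as costly to write by hand.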
Proposed Solution: Accelerating Matrix Inversion with SYCL and oneAPI
Fig. 3 shows the workflow of the matrix inversion step in the Multiple Scattering Theory framework. The block LU technique used in the original, slower inversion approach is limited to CPUs. The existing GPU-accelerated code relies on NVIDIA cuSOLVER library function calls and is therefore vendor-locked to NVIDIA GPUs.
- To speed up the matrix inversion code on multi-vendor GPUs and break away from vendor lock-in, the SC23 solution proposes making interlanguage calls from the Fortran-coded matrix inversion to the oneMKL SYCL API.
- The resulting SYCL code can run on GPUs from multiple vendors, such as AMD, NVIDIA, and Intel.
- The proposed method uses the Intel Fortran Compiler to compile the Fortran-coded parts and the Intel oneAPI DPC++/C++ Compiler to compile the oneMKL SYCL function calls for cross-vendor GPUs.
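The interlanguage call described in the list above can be sketched as a C++ function with C linkage that a Fortran caller reaches through a `bind(C)` interface. Everything in this sketch is hypothetical: the entry-point name `invert_matrix`, its argument list, and the `info` convention are assumptions, not the SC23 implementation. So that the example compiles with any C++ compiler, the body uses a plain Gauss-Jordan elimination as a stand-in; in a SYCL build, that body would presumably be replaced by oneMKL LAPACK calls such as `oneapi::mkl::lapack::getrf` and `oneapi::mkl::lapack::getri` submitted to a `sycl::queue`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Column-major (Fortran-style) index of element (i, j) with leading dimension ld.
static inline std::size_t idx(int i, int j, int ld) {
    return static_cast<std::size_t>(i) +
           static_cast<std::size_t>(j) * static_cast<std::size_t>(ld);
}

// Hypothetical Fortran-callable entry point. An assumed Fortran interface:
//   subroutine invert_matrix(a, n, lda, info) bind(C, name="invert_matrix")
//     use iso_c_binding
//     real(c_double), intent(inout) :: a(lda, *)
//     integer(c_int), value :: n, lda
//     integer(c_int), intent(out) :: info
//   end subroutine
// On success, a holds its inverse and info is 0. If a zero pivot is found at
// step p (1-based), info is p and a is left partially overwritten.
extern "C" void invert_matrix(double* a, int n, int lda, int* info) {
    assert(a != nullptr && info != nullptr && n > 0 && lda >= n);
    std::vector<double> inv(static_cast<std::size_t>(n) * n, 0.0);
    for (int i = 0; i < n; ++i) inv[idx(i, i, n)] = 1.0;

    // Stand-in for the GPU library calls: Gauss-Jordan with partial pivoting.
    for (int col = 0; col < n; ++col) {
        int piv = col;
        for (int r = col + 1; r < n; ++r)
            if (std::fabs(a[idx(r, col, lda)]) > std::fabs(a[idx(piv, col, lda)]))
                piv = r;
        if (a[idx(piv, col, lda)] == 0.0) { *info = col + 1; return; }
        for (int j = 0; j < n; ++j) {  // swap rows col and piv
            std::swap(a[idx(col, j, lda)], a[idx(piv, j, lda)]);
            std::swap(inv[idx(col, j, n)], inv[idx(piv, j, n)]);
        }
        const double d = a[idx(col, col, lda)];
        for (int j = 0; j < n; ++j) {
            a[idx(col, j, lda)] /= d;
            inv[idx(col, j, n)] /= d;
        }
        for (int r = 0; r < n; ++r) {  // clear the column in all other rows
            if (r == col) continue;
            const double f = a[idx(r, col, lda)];
            for (int j = 0; j < n; ++j) {
                a[idx(r, j, lda)] -= f * a[idx(col, j, lda)];
                inv[idx(r, j, n)] -= f * inv[idx(col, j, n)];
            }
        }
    }
    // Copy the result back into the caller's column-major array.
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) a[idx(i, j, lda)] = inv[idx(i, j, n)];
    *info = 0;
}
```

Keeping the array column-major and passing `n` and `lda` by value mirrors how Fortran arrays are laid out, so the Fortran side needs no copy or transpose before the call. With this split, the Fortran compiler builds the caller and a C++ compiler builds the wrapper, and the two object files are linked together.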
When the calculations run on a GPU, the proposed approach to matrix inversion, which is simpler, outperforms the more sophisticated original block LU method. With the proposed technique, the physicists on the project can concentrate on the science rather than on writing intricate numerical algorithms.
What Comes Next?
Take advantage of Codeplay’s oneAPI plugins for NVIDIA and AMD GPUs, the Intel oneAPI DPC++/C++ Compiler, and oneMKL to create an optimized, shared SYCL codebase that can deliver excellent performance across a variety of heterogeneous, multi-vendor architectures. Also consider exploring the other AI, HPC, and rendering tools in Intel’s oneAPI-powered software portfolio.
Multiple Scattering Theory
Welcome to the MuST Project. MuST is a public, first-principles computational framework based on Multiple Scattering Theory that offers exascale computing capabilities for the study of quantum phenomena in disordered materials.
MuST is derived from full-potential Multiple Scattering Theory using the Green’s function methodology, which is often referred to as the Korringa-Kohn-Rostoker (KKR) method. It is based on decades of research code development by Malcolm Stocks and his postdocs and students in the Theory Group of the Metals and Ceramics Division at Oak Ridge National Laboratory. This division subsequently became the Materials Science and Technology Division.
Two examples of the original research codes are Korringa-Kohn-Rostoker Coherent Potential Approximation (KKR-CPA), a highly efficient ab initio method for studying random alloys, and Locally Self-consistent Multiple Scattering (LSMS), a linear scaling ab initio code that can treat extremely large disordered systems from first principles using the largest parallel supercomputers available.