Intel oneAPI DPC++/C++ Compiler JPEG Compression

 

Discrete Cosine Transform (DCT)

Discrete Cosine Transform (DCT) with SYCL for GPU-Based JPEG Image Compression.
Use the Intel oneAPI DPC++/C++ Compiler to accelerate image compression through parallel execution.

Image compression reduces the size of digital image files without significantly compromising quality. Eliminating redundant and superfluous data makes images easier to store and to transmit over the internet or other networks.

This blog discusses the Discrete Cosine Transform code sample from the oneAPI GitHub repository. The sample shows how to use SYCL and the Intel oneAPI DPC++/C++ Compiler to implement the Discrete Cosine Transform (DCT), a lossy image compression method used for JPEG images.

Discrete cosine transform for image compression

Let’s briefly look at image compression before getting into the specifics of the code sample.

Applications of image compression in the real world include:

  • Digital photography, to share and store high-resolution photos taken with cameras efficiently.
  • Consumer electronics, to reduce data usage and storage requirements on mobile devices such as tablets and smartphones.
  • Medical imaging, to transfer and store medical images efficiently while maintaining the image quality needed for accurate diagnosis.
  • Video surveillance, to store and transfer images captured by surveillance systems efficiently by compressing them using cloud services.
  • Web development, to improve user experience and save bandwidth by enabling faster image loading on websites.

Discrete Cosine Transform Example

Two categories of image compression methods exist:

Lossless compression: This method preserves image quality and allows the image to be reconstructed exactly from the compressed data. PNG, GIF, and TIFF are prominent lossless image formats.

Lossy compression: This method permanently discards some image data, so the original image cannot be reconstructed exactly. JPEG and WebP are popular lossy compression formats.

Lossy compression algorithms often transform the image into the frequency domain using mathematical approaches like the DCT and then quantize the resulting frequency components.

Advantages of Discrete Cosine Transform

The DCT image compression approach is advantageous because it tends to concentrate the image signal information in a few low-frequency components. This makes it possible to reach large compression ratios without a significant loss of visual quality.

With carefully chosen quantization, the quality loss introduced by DCT compression can be made nearly imperceptible to the human eye while still achieving a large reduction in file size.
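The concentration of signal information in low-frequency components can be checked on a small example. The sketch below is not taken from the code sample: it assumes an orthonormal one-dimensional DCT-II over eight samples, and the names Dct1D and LowFrequencyEnergyShare are hypothetical helpers introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kN = 8;
constexpr double kPi = 3.14159265358979323846;

// Orthonormal 1D DCT-II of eight samples (illustrative, not the sample's code).
std::array<double, kN> Dct1D(const std::array<double, kN>& x) {
  std::array<double, kN> out{};
  for (std::size_t k = 0; k < kN; ++k) {
    double sum = 0.0;
    for (std::size_t n = 0; n < kN; ++n) {
      sum += x[n] * std::cos((2.0 * n + 1.0) * k * kPi / (2.0 * kN));
    }
    // Scaling that makes the transform preserve total signal energy.
    const double scale = (k == 0) ? std::sqrt(1.0 / kN) : std::sqrt(2.0 / kN);
    out[k] = scale * sum;
  }
  return out;
}

// Fraction of the total energy held by the first `count` coefficients.
double LowFrequencyEnergyShare(const std::array<double, kN>& coeffs,
                               std::size_t count) {
  assert(count <= kN);
  double low = 0.0;
  double total = 0.0;
  for (std::size_t k = 0; k < kN; ++k) {
    const double energy = coeffs[k] * coeffs[k];
    total += energy;
    if (k < count) low += energy;
  }
  return total > 0.0 ? low / total : 0.0;
}
```

For a smooth ramp of pixel values such as 10, 20, ..., 80, the first two of the eight coefficients hold more than 99% of the signal energy, which is why the remaining coefficients can be stored coarsely or dropped.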

Now let’s talk about the Discrete Cosine Transform code sample and how SYCL-based GPU offload can be used to accelerate compression utilising the Intel oneAPI DPC++/C++ Compiler.

Overview of the Intel oneAPI DPC++/C++ Compiler

The Intel oneAPI DPC++/C++ Compiler is a high-performance, standards-based, LLVM-based compiler for building ISO C/C++ and SYCL applications on a variety of architectures. Intel describes it as the first compiler to support the latest SYCL 2020 specification. In addition to SYCL, it supports other accelerated parallel computing frameworks such as OpenMP and OpenCL.

It is designed to work together with oneAPI libraries such as oneDPL and oneTBB and to take advantage of them for accelerated compute offload and optimized parallel execution. This design makes code reusable across heterogeneous hardware platforms such as CPUs, GPUs, and FPGAs.

Discrete Cosine Transform Applications

About the Discrete Cosine Transform Code Sample

The code sample first applies the Discrete Cosine Transform (DCT) to the input image and then quantizes the result. The resulting intermediate image is then de-quantized and passed through the inverse DCT to yield an output BMP image. This image is used to evaluate how much image information is lost as a result of the DCT compression method.

DCT Phase

Each pixel’s colour value is stored in the image’s pixel representation. The DCT instead represents the colour pattern of image subsets as a sum of cosine functions. The code sample processes the image in 8×8 subsections, or “blocks”. An 8×8 block needs only eight discrete cosine functions per axis, whose products form 64 two-dimensional patterns. All that is needed to reconstruct the block from the cosine representation are the coefficients associated with these patterns. The DCT step therefore converts each 8×8 pixel matrix of the input image into an equivalent 8×8 matrix of coefficients.
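The conversion from an 8×8 pixel matrix to an 8×8 coefficient matrix can be sketched in plain C++. This is a minimal, unoptimized illustration that assumes the orthonormal two-dimensional DCT-II. The names Block, Alpha, Cosine, and ForwardDct are hypothetical and are not the identifiers used in the code sample.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kBlock = 8;
constexpr double kPi = 3.14159265358979323846;
using Block = std::array<std::array<double, kBlock>, kBlock>;

// Normalisation factor that makes the transform orthonormal.
double Alpha(std::size_t k) {
  return k == 0 ? std::sqrt(1.0 / kBlock) : std::sqrt(2.0 / kBlock);
}

// Cosine function number k (0..7), sampled at pixel position n (0..7).
double Cosine(std::size_t n, std::size_t k) {
  assert(n < kBlock && k < kBlock);
  return std::cos((2.0 * n + 1.0) * k * kPi / (2.0 * kBlock));
}

// Direct 2D DCT-II: one output coefficient per pair (u, v) of cosine functions.
Block ForwardDct(const Block& pixels) {
  Block coeffs{};
  for (std::size_t u = 0; u < kBlock; ++u) {
    for (std::size_t v = 0; v < kBlock; ++v) {
      double sum = 0.0;
      for (std::size_t x = 0; x < kBlock; ++x) {
        for (std::size_t y = 0; y < kBlock; ++y) {
          sum += pixels[x][y] * Cosine(x, u) * Cosine(y, v);
        }
      }
      coeffs[u][v] = Alpha(u) * Alpha(v) * sum;
    }
  }
  return coeffs;
}
```

With this normalisation, a block in which every pixel has the value 100 produces a single non-zero coefficient of 800 at position (0, 0). All other coefficients are zero up to floating-point error, because a flat block contains no higher-frequency pattern.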

Quantization Phase

Quantization is the step that makes it possible to compress the image data. A quantizing matrix is used to prioritise the cosine functions that are most relevant to the image data. The matrix obtained after DCT is divided element by element by the quantizing matrix, and the results are rounded to integers. Read diagonally (as recorded in the memory), the quantized matrix yields a sequence of integers followed by a long run of zeroes. This long run of zeroes is what allows the original image to be compressed.
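The division by the quantizing matrix and the diagonal read-out can be sketched as follows. The table used here is the example luminance table from Annex K of the JPEG standard, which is an assumption: the code sample may use a different quantizing matrix. Quantize, ZigZag, and TrailingZeros are hypothetical helper names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr int kBlock = 8;
using CoeffBlock = std::array<std::array<double, kBlock>, kBlock>;
using IntBlock = std::array<std::array<int, kBlock>, kBlock>;

// Example luminance table from JPEG Annex K (assumed, not the sample's matrix).
constexpr int kQuantTable[kBlock][kBlock] = {
    {16, 11, 10, 16, 24, 40, 51, 61},     {12, 12, 14, 19, 26, 58, 60, 55},
    {14, 13, 16, 24, 40, 57, 69, 56},     {14, 17, 22, 29, 51, 87, 80, 62},
    {18, 22, 37, 56, 68, 109, 103, 77},   {24, 35, 55, 64, 81, 104, 113, 92},
    {49, 64, 78, 87, 103, 121, 120, 101}, {72, 92, 95, 98, 112, 100, 103, 99}};

// Element-wise division by the table, rounded to the nearest integer.
IntBlock Quantize(const CoeffBlock& coeffs) {
  IntBlock q{};
  for (int r = 0; r < kBlock; ++r) {
    for (int c = 0; c < kBlock; ++c) {
      q[r][c] = static_cast<int>(std::lround(coeffs[r][c] / kQuantTable[r][c]));
    }
  }
  return q;
}

// Reads the block along its anti-diagonals, alternating direction (zigzag).
std::array<int, kBlock * kBlock> ZigZag(const IntBlock& q) {
  std::array<int, kBlock * kBlock> out{};
  std::size_t idx = 0;
  for (int s = 0; s <= 2 * (kBlock - 1); ++s) {
    const int lo = std::max(0, s - (kBlock - 1));
    const int hi = std::min(s, kBlock - 1);
    for (int t = lo; t <= hi; ++t) {
      // Odd diagonals run top-right to bottom-left, even ones the other way.
      const int row = (s % 2 == 1) ? t : lo + hi - t;
      out[idx++] = q[row][s - row];
    }
  }
  assert(idx == out.size());
  return out;
}

// Length of the run of zeroes at the end of the zigzag sequence.
int TrailingZeros(const std::array<int, kBlock * kBlock>& seq) {
  int count = 0;
  for (auto it = seq.rbegin(); it != seq.rend() && *it == 0; ++it) ++count;
  return count;
}
```

Small high-frequency coefficients divided by the large entries in the lower right of the table round to zero, so the zigzag sequence typically ends in a long run of zeroes that an entropy coder can store very compactly.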

Steps for De-quantization and Inverse DCT

The code sample then performs de-quantization and inverse DCT to reproduce raw image data before writing the output to a file. Because of these inverse operations, the final image is not a compressed version of the original. It does, however, reveal the artefacts introduced by a lossy compression technique such as DCT.
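The full round trip (DCT, quantization, de-quantization, inverse DCT) and the resulting loss can be sketched for a single block. This is an illustration under stated assumptions rather than the sample's implementation: the step-size formula in Step and its quality parameter are hypothetical, and all function names are invented for this example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kBlock = 8;
constexpr double kPi = 3.14159265358979323846;
using Block = std::array<std::array<double, kBlock>, kBlock>;

// Orthonormal DCT basis: cosine function k sampled at position n.
double Basis(std::size_t n, std::size_t k) {
  const double scale = std::sqrt((k == 0 ? 1.0 : 2.0) / kBlock);
  return scale * std::cos((2.0 * n + 1.0) * k * kPi / (2.0 * kBlock));
}

// Forward 8x8 DCT when inverse == false, inverse DCT when inverse == true.
Block Transform(const Block& in, bool inverse) {
  Block out{};
  for (std::size_t a = 0; a < kBlock; ++a) {
    for (std::size_t b = 0; b < kBlock; ++b) {
      double sum = 0.0;
      for (std::size_t i = 0; i < kBlock; ++i) {
        for (std::size_t j = 0; j < kBlock; ++j) {
          sum += inverse ? in[i][j] * Basis(a, i) * Basis(b, j)
                         : in[i][j] * Basis(i, a) * Basis(j, b);
        }
      }
      out[a][b] = sum;
    }
  }
  return out;
}

// Hypothetical step size: coarser for higher frequencies and higher `quality`.
double Step(std::size_t r, std::size_t c, double quality) {
  assert(quality >= 0.0);
  return 1.0 + static_cast<double>(r + c) * quality;
}

// Quantize (divide and round), then de-quantize (multiply back).
Block QuantizeThenDequantize(const Block& coeffs, double quality) {
  Block out{};
  for (std::size_t r = 0; r < kBlock; ++r) {
    for (std::size_t c = 0; c < kBlock; ++c) {
      const double step = Step(r, c, quality);
      out[r][c] = std::round(coeffs[r][c] / step) * step;
    }
  }
  return out;
}

// DCT -> quantize -> de-quantize -> inverse DCT for one block.
Block RoundTrip(const Block& pixels, double quality) {
  return Transform(QuantizeThenDequantize(Transform(pixels, false), quality),
                   true);
}

// Largest per-pixel difference between two blocks.
double MaxAbsError(const Block& a, const Block& b) {
  double worst = 0.0;
  for (std::size_t r = 0; r < kBlock; ++r) {
    for (std::size_t c = 0; c < kBlock; ++c) {
      worst = std::max(worst, std::fabs(a[r][c] - b[r][c]));
    }
  }
  return worst;
}
```

Without the quantization step, the inverse DCT recovers the input up to floating-point error. The rounding in QuantizeThenDequantize is the only source of loss, which is why comparing the round-trip output with the original exposes the compression artefacts.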

SYCL

SYCL-Based Parallel Computations

An image’s individual 8×8 blocks can be processed independently of one another, and therefore in parallel. With only a few small changes to the original serial implementation, the code sample achieves SYCL parallelization.
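The block independence that makes this parallelization possible can be shown in standard C++, without a SYCL toolchain. In the sketch below, ProcessBlock is a hypothetical stand-in for the per-block DCT pipeline (it only averages the block), and ProcessImage is a serial driver. Because each call reads and writes only its own block, the loop body is the kind of code that could be moved into a parallel kernel, for example a SYCL parallel_for over the block index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBlock = 8;

// Hypothetical stand-in for the per-block DCT pipeline: replaces every pixel
// of block `block_index` with the average value of that block.
void ProcessBlock(const std::vector<float>& in, std::vector<float>& out,
                  std::size_t width, std::size_t block_index) {
  const std::size_t blocks_per_row = width / kBlock;
  const std::size_t row0 = (block_index / blocks_per_row) * kBlock;
  const std::size_t col0 = (block_index % blocks_per_row) * kBlock;
  float sum = 0.0f;
  for (std::size_t r = 0; r < kBlock; ++r) {
    for (std::size_t c = 0; c < kBlock; ++c) {
      sum += in[(row0 + r) * width + col0 + c];
    }
  }
  const float mean = sum / static_cast<float>(kBlock * kBlock);
  for (std::size_t r = 0; r < kBlock; ++r) {
    for (std::size_t c = 0; c < kBlock; ++c) {
      out[(row0 + r) * width + col0 + c] = mean;
    }
  }
}

// Serial driver. The visiting order does not matter, because no block depends
// on the result of another; `reverse_order` exists only to demonstrate this.
std::vector<float> ProcessImage(const std::vector<float>& in,
                                std::size_t width, std::size_t height,
                                bool reverse_order) {
  assert(width % kBlock == 0 && height % kBlock == 0);
  assert(in.size() == width * height);
  std::vector<float> out(in.size());
  const std::size_t block_count = (width / kBlock) * (height / kBlock);
  for (std::size_t i = 0; i < block_count; ++i) {
    ProcessBlock(in, out, width, reverse_order ? block_count - 1 - i : i);
  }
  return out;
}
```

Processing the blocks in forward or reverse order gives identical output, which is the property a parallel runtime relies on when it schedules the blocks in an arbitrary order across GPU work-items.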

Example Output

The code sample was run with the Intel oneAPI DPC++/C++ Compiler on a 6th generation Intel Core processor with integrated Intel Processor Graphics Gen 9 or later. The example output is shown below. If a compatible GPU is detected, the code runs on it; otherwise, it falls back to the CPU (host).

Filename: willyriver.bmp W: 5184 H: 3456
Start image processing with offloading to GPU...
Running on Intel(R) UHD Graphics 620
--The processing time is 6.27823 seconds
DCT successfully completed on the device.
The processed image has been written to willyriver_processed.bmp

What Comes Next?

See the Discrete Cosine Transform sample for an implementation of the SYCL-based parallel DCT image compression technique.

Take a look at a few more code samples that are accessible in the oneAPI GitHub repository.

Start using the Intel oneAPI DPC++/C++ Compiler to compile C/C++ and SYCL applications efficiently across a variety of heterogeneous systems.

Explore further AI, HPC, and rendering solutions in Intel’s oneAPI-powered software portfolio.

Get the Software

Install the Intel oneAPI DPC++/C++ Compiler as part of the Intel oneAPI Base Toolkit or the Intel HPC Toolkit. You can also download a standalone version of the compiler or test it on a variety of Intel CPUs and GPUs using the Intel Tiber Developer Cloud platform.
