Where can I learn a JPEG algorithm

Prof. Uwe Schulz

Lecture Computer Science 1 - Part B: Theory

2.7 Discrete Cosine Transformation

The DCT is a method that is used both in the image and, in a modified form, in the audio sector for the compression of data.

It is a special form of the Fourier transformation (FT). Jean Fourier recognized in 1822 that any periodic signal can be represented as the sum of sine and cosine functions. The coefficients of these functions form the Fourier spectrum of a function.

The Fourier transformation converts a function from time to frequency space and vice versa. The conversion into the frequency domain is called Fourier analysis, that into the time domain is called Fourier synthesis. This can best be imagined with audio signals, an illustrative example is https://www.falstad.com/fourier/. You will get to know the Fourier transformation in the second semester of mathematics.

The discrete cosine transformation is a special form (integer values, only cosine terms).

During image processing, the DCT is applied to a pixel grid, e.g. a 4 * 4 pixel section of an image (gray or color values):

Pixel matrix (image section)

If we transform this grid using the DCT, a 4 * 4 matrix is ​​created again, but it no longer contains whole numbers:

DCT Transformed

This algorithm does not reduce the amount of data, it just converts it. This is a transformation into a frequency space, but in Spatial frequencies:
  • low spatial frequencies correspond to large structures, i.e. small differences between neighboring pixels,
  • high spatial frequencies occur with fine structures, e.g. sharp edges with a transition between light and dark.

In the DCT matrix, values ​​in the upper left area of ​​the matrix correspond to coarse structures (low spatial frequencies), values ​​in the lower right area correspond to fine structures (high spatial frequencies).

the image section

leads to the following DCT matrix:

DCT transform in which only the constant component exists.

The DCT is an important intermediate stage in many compression methods, because you can usually omit the lower right area of ​​the matrix, which corresponds to fine structures, or only roughly quantize it without the eye noticing it.

If we denote the values ​​of an n * n image raster with f (x, y) and the coefficients of the DCT matrix with F (u, v), the following formula applies to the calculation of a DCT value:

F (u, v) = 2 / n k (u) k (v) Σ Σ f (x, y) cos (u π (2x + 1) / 2n) cos (v π (2y + 1) / 2n)

Where ku and kv are correction values ​​for the edge:

k = 1sqrt (2) for u, v = 0 andk = 1 for u, v! = 0 \)

There is a similar formula for the inverse transformation

Except for numerical inaccuracies, the DCT is reversible, lossless and symmetrical. The computational effort increases quadratically with the number of pixels, which is why only blocks of size 8 * 8 are processed in the JPEG process.

The DCT takes place e.g. at the beginning of the JPEG algorithm, then the values ​​of the DCT matrix are read out in a meandering shape along the main diagonal so that they are sorted by increasing spatial frequency. The last values ​​that belong to high spatial frequencies can either be omitted or quantized with a few bits.

Further links

Instructional video (YouTube)