The Maths Behind Image Compression

You've probably seen an image that looks like this before

A heavily compressed image, from Matlab help centre

If you look closely, you can see the image is broken up into multiple blocks, which seem to have patterns inside them

The same image, zoomed in, showing the image broken up into blocks

This is the result of JPEG image compression (Computerphile, 2015). JPEG is a lossy format, meaning it can reduce file size by discarding some visual information that is less noticeable to the human eye. But why does it make the image look like this?

And more importantly, how does it relate to trigonometry?

Why we need trigonometry to compress an image

Fundamentally, lossy JPEG image compression is quite a simple concept. It reduces image detail that is less important to how we see the image. But to do that, the image first needs to be converted into a format where that detail is easier reduce.

That's why the Discrete Cosine Transform (DCT) is used to compress images. The DCT is a way of expressing data as a sum of cosine functions. It's similar to the Fourier Transform , but only uses cosine functions (Wikipedia Contributors, 2020).

While not compressing the image itself, the DCT makes it a lot easier to compress images, by rearranging image data, so that most of the visually important information is represented in a small number of low-frequency coefficients. JPEG can then store the image more efficiently by removing/reducing the higher-frequency coefficients.

A way to think about this is that an image isn't only pixels, but also patterns. Some patterns change slowly, like smooth gradients, while others change quickly, like sharp edges, fine texture, or noise.

Trig functions are useful because they let us describe these patterns by frequency. A low frequency cosine wave changes slowly, while a high frequency cosine wave changes quickly. By combining multiple cosine waves, we can represent complex data, including brightness values in an image.

y = \cos(kx)

In this simple model, k controls the frequency. When k is small, the graph changes slowly. When k is larger, the graph oscillates more often over the same interval.

The graph below shows this idea in one dimension. As the frequency changes, the wave oscillates more or less rapidly. JPEG applies the same idea to small two-dimensional blocks of image data.

Controls

Frequency 0.5

This graph supports the compression model because it shows how frequency changes the shape of a cosine wave. In an image, low-frequency cosine patterns represent smooth changes in brightness, while high-frequency cosine patterns represent sudden changes such as edges, fine texture, and noise.

So we've discussed the idea. Now time to apply it.

Compressing an Image

Let's take an example. I'll be using a simple black and white image for now:

If it was a colour image, JPEG would usually convert it into separate luminance and two chrominance channels (Wikipedia Contributors, 2019b). The DCT step is then applied to blocks from each channel, but splitting an image into channels is out of scope for this page.

Preparation

To compress this image, it is first split into 8x8 blocks (Wikipedia Contributors, 2019b). This makes it easier to work with, and can allow the image to be compressed more efficiently. A compressor will then loop over each block compressing it separately.

Let's take this part of the image and work with that

Firstly, we can take the brightness of each pixel from 0 - 255 and represent it as a grid:

159

160

161

159

158

153

159

162

163

162

160

159

156

125

143

154

159

157

155

107

129

143

149

154

155

154

111

130

139

145

141

110

113

114

100

Then we need to shift the values to be around 0. JPEG does this by subtracting 128 from each pixel:

-3

-47

-21

-72

-59

-41

-17

-83

-73

-65

-53

-31

-18

-15

-14

-85

-78

-70

-62

-52

-41

-34

-28

-84

-76

-70

-63

-55

-46

-39

-33

Now we need to express this block as a weighted sum of cosine patterns. This is where trigonometry comes in.

This is a plot of 64 cosine patterns, one for each value in an 8x8 block:

These patterns are usually called the DCT basis functions. Each square above is a graphical model of one two-dimensional cosine pattern. The grid shows why the DCT is connected to trigonometry. The image block is rebuilt by adding these cosine patterns together with different weights. The top-left pattern is constant, while patterns further right and further down oscillate more quickly.

The top-left square is lowest frequency
Moving right increases horizontal frequency
Moving down increases vertical frequency
The bottom-right square is highest frequency in both directions

But how is this useful? With these 64 basis functions, we can represent any 8x8 block of pixel values. The DCT tells us how much of each basis function is needed.

After the DCT is applied, we get another 8x8 grid. But these numbers aren't pixel brightness, they are coefficients, telling us how strongly each function appears in the block.

-85

-124

-32

-7

-4

-2

-1

287

-9

-4

-1

-21

-31

-17

-7

-2

-1

-4

-1

-7

-2

-1

-2

-1

-3

-2

-1

The top left coefficient, called the DC coefficient is responsible for storing the average brightness of the 8x8 block. The remaining coefficients are called AC coefficients and describe changes in brightness across the block.

You can see in this example, the second coefficient from the top (the one at 287) is pretty high, indicating that the function appears strongly in the block.

In most images, nearby pixels are similar. This means a lot of the important information in the image is stored in the lower frequency coefficients (the ones up the top left of the DCT grid), while the higher frequency coefficients are mostly unimportant details.

These details are removed using a quantisation table, which is an 8x8 table specifying the values to divide each coefficient by before storing it. The values are then rounded to the nearest integer. This is the main lossy step in JPEG's DCT-based process, because rounding changes the coefficients and loses some exact detail.

Large divisors make coefficients more likely to round to zero, which is where file size drops and detail is reduced. This is because zeros are a lot easier to compress than non-zero values.

Image quality50%

Quantisation table

109

103

104

113

103

121

120

101

112

100

103

DCT coefficients after quantisation

-5

-11

-3

-1

-2

-1

Then the process is repeated on each 8x8 block of the image.

Compression Demo

Image quality15%

Original zoom

Compressed zoom

Original

Compressed

Why I Chose This Application

I chose image compression because it is a practical use of trigonometry that appears constantly in everyday technology. I'm also interested in tech, and wanted to do something tech related. I found the method JPEG uses to compress images to be fascinating, and wanted to explore it further.

Conclusion

I think image compression is an interesting use of trigonometry, and I had fun researching and trying to understand how two seemingly unrelated things are actually related.

I hope you learned something :)

References

Computerphile. (2015). JPEG DCT, Discrete Cosine Transform (JPEG Pt2)- Computerphile [YouTube Video]. In YouTube. https://www.youtube.com/watch?v=Q2aEzeMDHMA

Discrete Cosine Transform - MATLAB & Simulink - MathWorks Australia. (n.d.). Au.mathworks.com. https://au.mathworks.com/help/images/discrete-cosine-transform.html

Swanson, J. (n.d.). An Interactive Introduction to Fourier Transforms. Www.jezzamon.com. https://www.jezzamon.com/fourier/

Wikipedia Contributors. (2019a, June 26). Data compression. Wikipedia; Wikimedia Foundation. https://en.wikipedia.org/wiki/Data_compression

Wikipedia Contributors. (2019b, October 8). JPEG. Wikipedia; Wikimedia Foundation. https://en.wikipedia.org/wiki/JPEG

Wikipedia Contributors. (2020, August 10). Discrete cosine transform. Wikipedia. https://en.wikipedia.org/wiki/Discrete_cosine_transform