324 KB
222 KB

The Maths Behind Image Compression

You've probably seen an image that looks like this before

A heavily compressed image, from Matlab help centre

If you look closely, you can see the image is broken up into multiple blocks, which seem to have patterns inside them

The same image, zoomed in, showing the image broken up into blocks

This is the result of JPEG image compression (Computerphile, 2015). JPEG is a lossy format, meaning it can reduce file size by discarding some visual information that is less noticeable to the human eye. But why does it make the image look like this?

And more importantly, how does it relate to trigonometry?

Why we need trigonometry to compress an image

Fundamentally, lossy JPEG image compression is quite a simple concept. It reduces image detail that is less important to how we see the image. But to do that, the image first needs to be converted into a format where that detail is easier reduce.

That's why the Discrete Cosine Transform (DCT) is used to compress images. The DCT is a way of expressing data as a sum of cosine functions. It's similar to the Fourier Transform , but only uses cosine functions (Wikipedia Contributors, 2020).

While not compressing the image itself, the DCT makes it a lot easier to compress images, by rearranging image data, so that most of the visually important information is represented in a small number of low-frequency coefficients. JPEG can then store the image more efficiently by removing/reducing the higher-frequency coefficients.

A way to think about this is that an image isn't only pixels, but also patterns. Some patterns change slowly, like smooth gradients, while others change quickly, like sharp edges, fine texture, or noise.

Trig functions are useful because they let us describe these patterns by frequency. A low frequency cosine wave changes slowly, while a high frequency cosine wave changes quickly. By combining multiple cosine waves, we can represent complex data, including brightness values in an image.

y=cos(kx)y = \cos(kx)

In this simple model, k controls the frequency. When k is small, the graph changes slowly. When k is larger, the graph oscillates more often over the same interval.

The graph below shows this idea in one dimension. As the frequency changes, the wave oscillates more or less rapidly. JPEG applies the same idea to small two-dimensional blocks of image data.

Controls

This graph supports the compression model because it shows how frequency changes the shape of a cosine wave. In an image, low-frequency cosine patterns represent smooth changes in brightness, while high-frequency cosine patterns represent sudden changes such as edges, fine texture, and noise.

So we've discussed the idea. Now time to apply it.

Compressing an Image

Let's take an example. I'll be using a simple black and white image for now:

If it was a colour image, JPEG would usually convert it into separate luminance and two chrominance channels (Wikipedia Contributors, 2019b). The DCT step is then applied to blocks from each channel, but splitting an image into channels is out of scope for this page.

Preparation

To compress this image, it is first split into 8x8 blocks (Wikipedia Contributors, 2019b). This makes it easier to work with, and can allow the image to be compressed more efficiently. A compressor will then loop over each block compressing it separately.

Let's take this part of the image and work with that

Firstly, we can take the brightness of each pixel from 0 - 255 and represent it as a grid:

159
160
160
161
161
159
158
158
153
159
162
163
162
160
159
156
125
143
154
159
159
159
157
155
81
107
129
143
149
154
155
154
56
69
87
111
130
139
145
141
45
55
63
75
97
110
113
114
43
50
58
66
76
87
94
100
44
52
58
65
73
82
89
95

Then we need to shift the values to be around 0. JPEG does this by subtracting 128 from each pixel:

31
32
32
33
33
31
30
30
25
31
34
35
34
32
31
28
-3
15
26
31
31
31
29
27
-47
-21
1
15
21
26
27
26
-72
-59
-41
-17
2
11
17
13
-83
-73
-65
-53
-31
-18
-15
-14
-85
-78
-70
-62
-52
-41
-34
-28
-84
-76
-70
-63
-55
-46
-39
-33

Now we need to express this block as a weighted sum of cosine patterns. This is where trigonometry comes in.

This is a plot of 64 cosine patterns, one for each value in an 8x8 block:

These patterns are usually called the DCT basis functions. Each square above is a graphical model of one two-dimensional cosine pattern. The grid shows why the DCT is connected to trigonometry. The image block is rebuilt by adding these cosine patterns together with different weights. The top-left pattern is constant, while patterns further right and further down oscillate more quickly.

But how is this useful? With these 64 basis functions, we can represent any 8x8 block of pixel values. The DCT tells us how much of each basis function is needed.

After the DCT is applied, we get another 8x8 grid. But these numbers aren't pixel brightness, they are coefficients, telling us how strongly each function appears in the block.

-85
-124
-32
-7
-7
-4
-2
-1
287
64
-9
-4
0
1
-1
0
-21
57
27
2
4
0
2
0
-31
-17
8
13
3
0
0
2
7
-17
-7
-2
2
2
-1
0
-4
3
-1
-7
-2
-1
1
-1
-2
2
1
2
-1
-3
0
0
2
-2
0
1
2
2
-1
0

The top left coefficient, called the DC coefficient is responsible for storing the average brightness of the 8x8 block. The remaining coefficients are called AC coefficients and describe changes in brightness across the block.

You can see in this example, the second coefficient from the top (the one at 287) is pretty high, indicating that the function appears strongly in the block.

In most images, nearby pixels are similar. This means a lot of the important information in the image is stored in the lower frequency coefficients (the ones up the top left of the DCT grid), while the higher frequency coefficients are mostly unimportant details.

These details are removed using a quantisation table, which is an 8x8 table specifying the values to divide each coefficient by before storing it. The values are then rounded to the nearest integer. This is the main lossy step in JPEG's DCT-based process, because rounding changes the coefficients and loses some exact detail.

Large divisors make coefficients more likely to round to zero, which is where file size drops and detail is reduced. This is because zeros are a lot easier to compress than non-zero values.

Quantisation table

16
11
10
16
24
40
51
61
12
12
14
19
26
58
60
55
14
13
16
24
40
57
69
56
14
17
22
29
51
87
80
62
18
22
37
56
68
109
103
77
24
35
55
64
81
104
113
92
49
64
78
87
103
121
120
101
72
92
95
98
112
100
103
99

DCT coefficients after quantisation

-5
-11
-3
0
0
0
0
0
24
5
-1
0
0
0
0
0
-1
4
2
0
0
0
0
0
-2
-1
0
0
0
0
0
0
0
-1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0

Then the process is repeated on each 8x8 block of the image.

Compression Demo

Original zoom
Compressed zoom
Original
Compressed

Why I Chose This Application

I chose image compression because it is a practical use of trigonometry that appears constantly in everyday technology. I'm also interested in tech, and wanted to do something tech related. I found the method JPEG uses to compress images to be fascinating, and wanted to explore it further.

Conclusion

I think image compression is an interesting use of trigonometry, and I had fun researching and trying to understand how two seemingly unrelated things are actually related.

I hope you learned something :)

References

Computerphile. (2015). JPEG DCT, Discrete Cosine Transform (JPEG Pt2)- Computerphile [YouTube Video]. In YouTube. https://www.youtube.com/watch?v=Q2aEzeMDHMA

Discrete Cosine Transform - MATLAB & Simulink - MathWorks Australia. (n.d.). Au.mathworks.com. https://au.mathworks.com/help/images/discrete-cosine-transform.html

Swanson, J. (n.d.). An Interactive Introduction to Fourier Transforms. Www.jezzamon.com. https://www.jezzamon.com/fourier/

Wikipedia Contributors. (2019a, June 26). Data compression. Wikipedia; Wikimedia Foundation. https://en.wikipedia.org/wiki/Data_compression

Wikipedia Contributors. (2019b, October 8). JPEG. Wikipedia; Wikimedia Foundation. https://en.wikipedia.org/wiki/JPEG

Wikipedia Contributors. (2020, August 10). Discrete cosine transform. Wikipedia. https://en.wikipedia.org/wiki/Discrete_cosine_transform