memo: Transform coding (codec)

Transform coding is at the heart of the majority of video coding systems and standards.
Spatial image data (image samples or motion-compensated residual samples) are transformed into a different representation, the transform domain.

The two most widely used image compression transforms are the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). The DCT is usually applied to small, regular blocks of image samples (e.g. 8 x 8 squares) and the DWT is usually applied to larger image sections ("tiles") or to complete images.

DCT

The DCT has proved particularly durable and is at the core of most of the current generation of image and video coding standards, including JPEG, H.261, H.263, H.263+, MPEG-1, MPEG-2 and MPEG-4. The DWT is gaining popularity because it can outperform the DCT for still image coding and so it is used in the new JPEG image coding standard (JPEG-2000) and for still "texture" coding in MPEG-4.

The DCT has become the most popular transform for image and video coding. There are two main reasons for its popularity: first, it is effective at transforming image data into a form that is easy to compress and second, it can be efficiently implemented in software and hardware.

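The forward DCT (FDCT) of an N × N sample block X is given by:
     Y = A X A^T
and the inverse DCT (IDCT) by:
     X = A^T Y A
where A is the N × N transform matrix, A^T is its transpose and Y is the block of transform coefficients.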

The transform matrix A for a 4 × 4 DCT (values rounded to three decimal places) is:
A =

 0.5     0.5     0.5     0.5
 0.653   0.271  −0.271  −0.653
 0.5    −0.5    −0.5     0.5
 0.271  −0.653   0.653  −0.271
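
The matrix form of the FDCT and IDCT can be tried out directly. The sketch below is illustrative only: it assumes A is the standard orthonormal DCT-II matrix (the memo lists numeric values but not the generating formula), and the function names and the sample block X are made up for the example.

```python
import numpy as np

def dct_matrix(n):
    """Build an n x n orthonormal DCT-II matrix (assumed definition of A)."""
    a = np.zeros((n, n))
    for i in range(n):
        # Row 0 is scaled by sqrt(1/n); every other row by sqrt(2/n).
        c = np.sqrt(1.0 / n) if i == 0 else np.sqrt(2.0 / n)
        for j in range(n):
            a[i, j] = c * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    return a

def fdct(x, a):
    """Forward DCT: Y = A X A^T."""
    return a @ x @ a.T

def idct(y, a):
    """Inverse DCT: X = A^T Y A."""
    return a.T @ y @ a

A = dct_matrix(4)
# Arbitrary sample values, chosen only for illustration.
X = np.array([[5, 11, 8, 10],
              [9, 8, 4, 12],
              [1, 10, 11, 4],
              [19, 6, 15, 7]], dtype=float)

Y = fdct(X, A)
X_rec = idct(Y, A)

print(np.round(A, 3))          # transform matrix, rounded
print(np.allclose(X_rec, X))   # round trip recovers the block
```

Because the rows of A are orthonormal, A^T is also the inverse of A, which is why the same matrix serves for both directions.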

The forward DCT (FDCT) transforms a set of image samples (the "spatial domain") into a set of transform coefficients (the "transform domain"). The transform is reversible: the inverse DCT (IDCT) transforms a set of coefficients into a set of image samples.

The DCT has two useful properties for image and video compression: energy compaction (concentrating the image energy into a small number of coefficients) and decorrelation (minimising the interdependencies between coefficients).

Because of energy compaction, a reasonable approximation to the original image block can be reconstructed from just the few most significant (largest-magnitude) coefficients.
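
Energy compaction can be observed numerically. The following is a minimal sketch, assuming an orthonormal DCT-II matrix and a hypothetical smooth 8 × 8 block; `keep_largest` is an illustrative helper, not part of any standard. It zeroes all but the k largest coefficients, inverts the transform, and records how much energy was kept and how large the reconstruction error is.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (assumed definition of A)."""
    i = np.arange(n).reshape(-1, 1)
    j = np.arange(n).reshape(1, -1)
    a = np.sqrt(2.0 / n) * np.cos((2 * j + 1) * i * np.pi / (2 * n))
    a[0, :] = np.sqrt(1.0 / n)
    return a

def keep_largest(y, k):
    """Zero all but the k largest-magnitude coefficients of y."""
    flat = np.abs(y).ravel()
    mask = np.zeros(flat.shape, dtype=bool)
    mask[np.argsort(flat)[-k:]] = True
    return np.where(mask.reshape(y.shape), y, 0.0)

n = 8
A = dct_matrix(n)
rows, cols = np.mgrid[0:n, 0:n]
X = 100.0 + 8.0 * cols + 3.0 * rows   # hypothetical smooth block

Y = A @ X @ A.T
total_energy = np.sum(Y ** 2)

results = {}
for k in (1, 4, 16):
    Yk = keep_largest(Y, k)
    Xk = A.T @ Yk @ A                  # approximate reconstruction
    kept = np.sum(Yk ** 2) / total_energy
    rmse = np.sqrt(np.mean((X - Xk) ** 2))
    results[k] = (kept, rmse)
    print(f"k={k:2d}  energy kept={kept:.5f}  RMSE={rmse:.4f}")
```

A smooth block is the favourable case; blocks with sharp edges or noise spread their energy over more coefficients, so the same k gives a coarser approximation.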

The DCT becomes increasingly complex to calculate for larger block sizes.

DWT

A wavelet transform is typically applied to a complete image or a large rectangular region ("tile") of the image.

A single-stage wavelet transformation consists of a filtering operation that decomposes an image into four frequency bands. The top-left corner of the transformed image ("LL") is the original image, low-pass filtered and subsampled in the horizontal and vertical dimensions. The top-right corner ("HL") consists of residual vertical frequencies.
The bottom-left corner ("LH") contains residual horizontal frequencies.
The bottom-right corner ("HH") contains residual diagonal frequencies.
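
A single decomposition stage can be sketched with the Haar wavelet, the simplest possible filter pair. This is an assumption made for illustration: the memo does not name a filter, and practical codecs use longer filters. The quadrant placement follows the layout described above, with the top-right band taken as high-pass along the rows and low-pass along the columns; that label-to-filter mapping is also an assumption.

```python
import numpy as np

def haar_stage(img):
    """One 2-D Haar analysis stage; height and width must be even."""
    img = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    # Low-pass and high-pass along each row, keeping every second sample.
    lo_h = (img[:, 0::2] + img[:, 1::2]) / s
    hi_h = (img[:, 0::2] - img[:, 1::2]) / s
    # Repeat along each column for both intermediate results.
    top_left = (lo_h[0::2, :] + lo_h[1::2, :]) / s      # "LL"
    bottom_left = (lo_h[0::2, :] - lo_h[1::2, :]) / s   # "LH"
    top_right = (hi_h[0::2, :] + hi_h[1::2, :]) / s     # "HL"
    bottom_right = (hi_h[0::2, :] - hi_h[1::2, :]) / s  # "HH"
    # Assemble the four half-size bands into one image-sized array.
    return np.block([[top_left, top_right],
                     [bottom_left, bottom_right]])

# Hypothetical test image: every row is the ramp 0, 10, ..., 70.
img = np.tile(np.arange(8.0) * 10.0, (8, 1))
coeffs = haar_stage(img)
print(np.round(coeffs, 2))
```

For this ramp the image changes only from left to right, so the top-right band is the only detail band with nonzero values; the two bottom bands are entirely zero.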

This decomposition process may be repeated for the "LL" component to produce another set of four components: a new "LL" component that is a further subsampled version of the original image, plus three more residual frequency components.

The wavelet decomposition has some important properties. First, the number of wavelet "coefficients" (the spatial values that make up Figure 7.8) is the same as the number of pixels in the original image and so the transform is not inherently adding or removing information.
Second, many of the coefficients of the high-frequency components ("HH", "HL" and "LH" at each stage) are zero or insignificant. This reflects the fact that much of the important information in an image is low-frequency. Third, the decomposition is not restricted by block boundaries (unlike the DCT) and hence may be a more flexible way of decorrelating the image data (i.e. concentrating the significant components into a few coefficients) than the block-based DCT.
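
The first two properties, and the repeated decomposition of "LL", can be checked on a small example. This is a rough sketch under stated assumptions: it uses Haar filters (not named in the memo), a synthetic smooth 16 × 16 image, and an arbitrary "insignificant" threshold of 1% of the largest coefficient magnitude.

```python
import numpy as np

def haar_stage(x):
    """One 2-D Haar stage, returned in quadrant layout [[LL, HL], [LH, HH]]."""
    s = np.sqrt(2.0)
    lo = (x[:, 0::2] + x[:, 1::2]) / s
    hi = (x[:, 0::2] - x[:, 1::2]) / s
    ll, lh = (lo[0::2] + lo[1::2]) / s, (lo[0::2] - lo[1::2]) / s
    hl, hh = (hi[0::2] + hi[1::2]) / s, (hi[0::2] - hi[1::2]) / s
    return np.block([[ll, hl], [lh, hh]])

def haar_multilevel(img, levels):
    """Apply haar_stage repeatedly to the shrinking top-left "LL" region."""
    out = np.asarray(img, dtype=float).copy()
    h, w = out.shape
    for _ in range(levels):
        out[:h, :w] = haar_stage(out[:h, :w])
        h, w = h // 2, w // 2
    return out

# Synthetic smooth image (an assumption; real images are less regular).
rows, cols = np.mgrid[0:16, 0:16]
img = 128.0 + 40.0 * np.sin(cols / 5.0) + 30.0 * np.cos(rows / 7.0)

levels = 3
coeffs = haar_multilevel(img, levels)

# Property 1: as many coefficients as pixels.
print(coeffs.size, img.size)

# Property 2: most coefficients are small relative to the largest one.
threshold = 0.01 * np.max(np.abs(coeffs))
small_fraction = np.mean(np.abs(coeffs) < threshold)
print(f"coefficients below threshold: {small_fraction:.2%}")

# Share of total energy held by the final 2 x 2 "LL" band.
size = 16 // 2 ** levels
ll_fraction = np.sum(coeffs[:size, :size] ** 2) / np.sum(coeffs ** 2)
print(f"energy in final LL band: {ll_fraction:.2%}")
```

The exact percentages depend on the image and the threshold chosen, so they should be read as an indication of the trend rather than as typical figures.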

Wavelet-based compression performs well for still images (particularly in comparison with DCT-based compression) and can be implemented reasonably efficiently.

Wednesday, July 7, 2010