SKEDSOFT

Real Time Systems

Only frames 1 αk, for k = 0, 1, 2, . . . are encoded independently of other frames, where α is an application-specified integer constant. These frames are called I-frames (i.e., intra-coded frames). The coder treats each I-frame as a still image, and the decoder can decompress each compressed I-frame independently of other frames. Consequently, I-frames are points for random access of the video. The smaller the constant α, the more random accessible is the video and the poorer the compression ratio. A good compromise is α = 9. The frames between consecutive I-frames are called P- and B-frames. When α is equal to 9, the sequence of frames produced by the motion estimation step are I, B, B, P, B, B, P, B, B, I, B, B, P, . . . . For every k ≥ 0, frame 1 9k 3 is a P-frame (i.e., a predictive-coded frame). The coder obtains a P-frame from the previous I-frame (or P-frame) by predicting how the image carried by the I-frame changes in the time interval between the times of these frames. Specifically, if amajor block in the P-frame closely resembles a major block in the previous I-frame, then the coder represents the P-frame major block by six 8×8 pixel blocks that give the differences between the six P-frame pixel blocks and the corresponding pixel blocks of the best matching (i.e., resembling most closely) major block in the previous I-frame. In addition, the coder generates a motion vector based on which the decoder can identify the best matching I-frame major block. Such a P-frame major block is said to be predictively coded. On the other hand, some P-frame major blocks may be images of newly visible objects and, hence, cannot be obtained from any major block in the previous I-frame. The coder represents them in the same way as I-frame major blocks.

A B-frame is a bidirectionally predicted frame: It is predicted from both the previous I-frame (or P-frame) and the subsequent P-frame (or I-frame). One way is to represent every B-frame major block by the differences between the values of its pixel blocks and the corresponding pixel blocks of the best matching major block in either the previous I-frame or the subsequent P-frame. Alternatively, for each B-frame major block, an interpolation of the best matching major blocks in the I-frame and P-frame is first computed. The B-frame major block is represented by the difference between it and this interpolation. Again, the coder generates the motion vectors that the decoder will need to identify the best matching I-frame and P-frame major blocks. Whereas some P-frame major blocks are encoded independently, none of the B-frame major blocks are.

Discrete Cosine Transform and Encoding. In the second step, a cosine transform13 is performed on each of the 8×8 pixel blocks produced by the coder after motion estimation. We let x(i, j ), for i, j = 1, 2, . . . , 8, denote the elements of an 8×8 transform matrix obtained from transforming the originalmatrix that gives the 8×8 values of a pixel block. The transformmatrix usually has more zeros than the original matrix. [In the extreme when all the entries of the original matrix are equal, only x(0, 0) is nonzero.] Moreover, if the entries x(i, j )’s are ordered in nondecreasing order of i j , zero entries tend to be adjacent, forming sequences of zeros, and adjacent entries tend to have similar values. By quantizing the x(i, j )’s to create more zeros, encoding the entries in the transform matrix as 2-tuples (run length, value), and using a combination of variable-length and fixed-length codes to further reduce the bit rate, significant compression is achieved.

Decompression. During decompression, the decoder first produces a close approximation of the original matrix (i.e., an 8 × 8 pixel block) by performing an inverse transform on each stored transform matrix. (The computation of an inverse transform is the essentially the same as the cosine transform.) It then reconstruct the images in all the frames from the major blocks in I-frames and difference blocks in P- and B-frames.

Real-Time Characteristics. As we can see from the above description, video compression is a computational-intensive process. For batch applications such as video on demand, compression is done in batch and off-line, while it must be an on-line process for interactive applications (e.g., teleconferencing). Decompression should be done just before the time the video and audio are presented, in other words, on the just-in-time basis. Today, compression and decompression functions are often handled by an affordable special-purpose processor (e.g., the mmx), rather than by general-purpose processors.