My thesis research deals mostly with flexible and scalable motion representation and its use in scalable (wavelet-based) video coding. In related earlier work, I investigated the problem of motion analysis in the 3D DCT domain. To learn more about my research, please visit my publications page or browse full-text articles here. Feel free to contact me with any questions you might have.
N. Bozinovic, "Advanced motion modeling for 3D video coding." PhD thesis, Boston University, Apr. 2006.
Driven by new multimedia applications and the growing demand for more flexible and efficient transmission of video, a new approach to video coding has been recently proposed as an alternative to classical hybrid schemes. Instead of sequential frame-based predictive processing, the new approach is based on spatio-temporal 3D transforms, open-loop non-predictive processing, and embedded quantization and coding. This thesis investigates motion modeling for this new coding environment, as well as the impact of such modeling on both coder design and performance.
The first aspect of this thesis deals with video coding based on the 3D discrete cosine transform (DCT). We analyze the 3D DCT spectrum properties of a globally translating image and show how to use its characteristic footprint for fast and efficient video coding. Previous approaches to 3D DCT video coding have led to rather modest compression gains due to a limited use of motion characteristics in the transform domain. We develop a coefficient scanning order that adapts to motion, unlike the fixed "zig-zag" scanning of JPEG. We combine this adaptive scanning with a new 3D quantization model to design a low-complexity 3D DCT video coder. The new coder consistently outperforms MPEG-2 both subjectively and objectively (by more than 1.5dB) at about 25% lower complexity, while approaching the performance of MPEG-4 (within 0.8dB) at less than half the computational complexity.
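To make the notion of a motion-dependent 3D DCT footprint more concrete, here is a minimal sketch in plain Python. It is not the coder or the scanning order from the thesis. It only illustrates, under simplifying assumptions (a small synthetic 8 x 8 image, integer horizontal motion, circular wrap-around at the borders), that a static sequence puts all of its energy in the temporal-DC plane, while a translating one spreads energy into higher temporal frequencies. All function names are hypothetical.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a 1D list."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, s in enumerate(samples))
        out.append(scale * acc)
    return out

def dct_3d(cube):
    """Separable 3D DCT of cube[t][y][x]; returns coeffs[kt][ky][kx]."""
    n_t, n_y, n_x = len(cube), len(cube[0]), len(cube[0][0])
    c = [[dct_1d(row) for row in frame] for frame in cube]   # along x
    for t in range(n_t):                                      # along y
        for x in range(n_x):
            col = dct_1d([c[t][y][x] for y in range(n_y)])
            for y in range(n_y):
                c[t][y][x] = col[y]
    for y in range(n_y):                                      # along t
        for x in range(n_x):
            line = dct_1d([c[t][y][x] for t in range(n_t)])
            for t in range(n_t):
                c[t][y][x] = line[t]
    return c

def translating_cube(image, vx, n_frames):
    """Stack of frames in which `image` moves vx pixels per frame along x.
    Assumption: circular wrap-around, so the translation is exactly global."""
    n_x = len(image[0])
    return [[[row[(x - vx * t) % n_x] for x in range(n_x)] for row in image]
            for t in range(n_frames)]

def temporal_energy_profile(coeffs):
    """Fraction of total coefficient energy in each temporal-frequency plane kt."""
    total = sum(v * v for plane in coeffs for row in plane for v in row)
    return [sum(v * v for row in plane for v in row) / total for plane in coeffs]

N = 8
image = [[math.sin(2 * math.pi * x / N) + 0.5 * math.cos(2 * math.pi * y / N)
          for x in range(N)] for y in range(N)]

static_profile = temporal_energy_profile(dct_3d(translating_cube(image, 0, N)))
moving_profile = temporal_energy_profile(dct_3d(translating_cube(image, 1, N)))

print("static:", [round(p, 3) for p in static_profile])
print("moving:", [round(p, 3) for p in moving_profile])
```

A motion-adaptive scan could, in principle, visit coefficients in the order in which such a profile predicts energy, rather than in a fixed zig-zag order. The actual rule used in the thesis is not reproduced here.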
The second aspect of this thesis involves the role of motion in emerging video coders based on the 3D discrete wavelet transform (DWT) and motion-compensated temporal filtering (MCTF). Motion invertibility, central to the optimality of a lifted MCTF implementation, is investigated first. We introduce a metric for "invertibility error" between two motion fields. We develop advanced motion inversion methods and demonstrate their effectiveness in improving the update lifting step. Experimental results confirm that better motion inversion, quantified by a lower invertibility error, increases coding gain by up to 0.5dB over simpler inversion techniques. We propose a new method for "occlusion-aware" modeling and estimation of motion fields and use it to create an adaptive 3D DWT coding structure. Implicit modeling of occluded/uncovered areas, combined with the use of longer wavelet kernels, improves both the prediction and update lifting steps and results in an overall compression gain of up to 1dB over a non-adaptive coder.
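The relationship between the lifting steps and motion invertibility can be sketched in one dimension. The code below is an illustration only: it implements a Haar-style lifted MCTF pair (predict, then update) on two 1D "frames" with integer motion, and a simple stand-in for an invertibility error. The metric shown, the mean of |bwd(i) + fwd(i + bwd(i))|, is an assumption made for this sketch and is not the definition used in the thesis. All names are hypothetical.

```python
def warp(signal, motion):
    """Motion-compensate: out[i] = signal[i + motion[i]], clamped at the borders."""
    n = len(signal)
    return [signal[min(max(i + motion[i], 0), n - 1)] for i in range(n)]

def mctf_haar_analysis(a, b, fwd, bwd):
    """One lifted temporal Haar step on frames a and b.
    fwd[i]: displacement from position i in b to its match in a (predict step).
    bwd[i]: displacement from position i in a to its match in b (update step)."""
    high = [bi - pi for bi, pi in zip(b, warp(a, fwd))]          # predict
    low = [ai + 0.5 * ui for ai, ui in zip(a, warp(high, bwd))]  # update
    return low, high

def mctf_haar_synthesis(low, high, fwd, bwd):
    """Undo the lifting steps in reverse order."""
    a = [li - 0.5 * ui for li, ui in zip(low, warp(high, bwd))]
    b = [hi + pi for hi, pi in zip(high, warp(a, fwd))]
    return a, b

def invertibility_error(fwd, bwd):
    """Hypothetical metric: following bwd and then fwd should return to the start,
    so bwd[i] + fwd[i + bwd[i]] should be zero for a perfectly inverted field."""
    n = len(fwd)
    total = 0
    for i in range(n):
        j = min(max(i + bwd[i], 0), n - 1)
        total += abs(bwd[i] + fwd[j])
    return total / n

frame_a = [0, 0, 10, 20, 10, 0, 0, 0]
frame_b = [0, 0, 0, 10, 20, 10, 0, 0]   # frame_a shifted right by one sample
fwd = [-1] * 8                          # b[i] matches a[i - 1]
good_bwd = [1] * 8                      # exact inverse of fwd
poor_bwd = [0] * 8                      # ignores the motion

for bwd in (good_bwd, poor_bwd):
    low, high = mctf_haar_analysis(frame_a, frame_b, fwd, bwd)
    rec_a, rec_b = mctf_haar_synthesis(low, high, fwd, bwd)
    print(invertibility_error(fwd, bwd), rec_a == frame_a, rec_b == frame_b)
```

Note that the lifting structure reconstructs both frames regardless of which backward field is used. The quality of the inversion affects how well the update step is aligned with the prediction step, not whether the transform is invertible.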
The role of motion in 3D DWT coding motivates our exploration of advanced spatial motion models. We improve the performance of standard deformable triangular meshes through topology modification and an enhanced estimation algorithm. We also introduce a motion model based on hierarchical cubic splines and demonstrate its benefits in terms of motion-compensated prediction over the traditional block-constant model, especially at very low (176 x 144 and lower) spatial resolutions. We introduce and implement a new hierarchical "mixture motion model" for spatially scalable motion representation that uses cubic spline-based motion at lower spatial resolutions and variable-size block matching at higher resolutions. Experimental results demonstrate up to 1dB performance gain of the mixture motion model over the all-block model at lower spatial resolutions without a negative impact on full-resolution performance. Overall, the best scalable results are obtained using either one or two spline-based motion layers at the lowest spatial resolutions of the mixture motion model.
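As a rough illustration of why a spline-based motion model can describe smooth motion better than a block-constant one, the following 1D sketch compares the two on a linearly varying (zoom-like) displacement field. It uses a single-level uniform cubic B-spline, not the hierarchical spline or mixture model of the thesis. The knot spacing, block size, and test motion are arbitrary assumptions, and all names are hypothetical.

```python
import math

def cubic_bspline(u):
    """Centered uniform cubic B-spline kernel, nonzero for |u| < 2."""
    u = abs(u)
    if u < 1.0:
        return 2.0 / 3.0 - u * u + 0.5 * u ** 3
    if u < 2.0:
        return (2.0 - u) ** 3 / 6.0
    return 0.0

def spline_motion(x, control, spacing):
    """Displacement at position x from control values placed at knots k * spacing.
    `control` maps integer knot index k to a displacement value."""
    u = x / spacing
    k0 = math.floor(u)
    return sum(control[k] * cubic_bspline(u - k) for k in range(k0 - 1, k0 + 3))

def block_motion(x, block_values, block_size):
    """Block-constant displacement: one value per block."""
    return block_values[int(x // block_size)]

def true_motion(x):
    """Assumed test field: displacement grows linearly with position (1D zoom)."""
    return 0.25 * x

SPACING = 4
positions = range(16)

# Spline control values sampled at the knots (one extra knot on each side).
control = {k: true_motion(k * SPACING) for k in range(-1, 6)}
# Block values sampled at each block center.
blocks = [true_motion(b * SPACING + (SPACING - 1) / 2.0) for b in range(4)]

spline_err = max(abs(spline_motion(x, control, SPACING) - true_motion(x))
                 for x in positions)
block_err = max(abs(block_motion(x, blocks, SPACING) - true_motion(x))
                for x in positions)
print("max spline error:", spline_err)
print("max block error:", block_err)
```

With the same number of parameters per unit length, the spline field follows the linear motion closely, while the block-constant field leaves a residual error inside every block. Real motion fields are of course less regular than this test case.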