Istanbul Technical University
Video understanding is one of the grand challenges of computer vision, which includes various tasks such as, action recognition, video classification, object detection and activity localization. Deep learning based methods are successful in image understanding. However, they do not perform well in video understanding tasks. Lower performance suggests that deep learning approaches are not able to extract discriminative and meaningful features from highly redundant video signals efficiently. Deep learning models analyzing still image frames only cannot discover temporal relationships between successive frames in video. To overcome this deficiency, state-of-the-art video understanding techniques adopt integrated deep learning architectures representing optical flow vectors, in expense of heavier computational load.
On the other hand, in practice, video signal is compressed by some kind of a video codec, immediately after the signal is acquired. The goal of the conventional compression schemes is transmitting and storing the high-volume video data efficiently. The compression process with this goal includes approaches which remove temporal and spatial redundancy. Compressing video signal results in vectors with lower dimensionality that effectively represents the original signal. The objective of this project is to increase video understanding performance by developing deep learning models, and train them with the compressed domain attributes that already represent underlying distributions efficiently.
This project aims at achieving higher levels of performance scores in video understanding by analyzing video directly in the compressed domain. Instead of fully-reconstructing an already compressed video signal and analyzing it afterwards, algorithms will be developed to exploit available compressed domain features for video understanding.
To that end, model compression and continual learning techniques will also be explored to improve generalization capability of the video understanding models while keeping the computational complexity low.
As a case study, compressed domain video analysis algorithms will be applied to video fire detection problem. Apart from benchmarking video understanding algorithms with publicly available datasets on activity recognition and video object detection, it is anticipated that compressed domain video understanding techniques may propose a solution for this important issue in line with the firefighting efforts under "Green Deal".
List of research outcomes on compressed domain video analytics by our group's earlier efforts: