Image and Video Analysis

Spatiotemporal Visual Saliency

Although human vision appears effortless and largely unconscious, complex neural mechanisms in the primary visual cortex form the preattentive component of the Human Visual System (HVS) and lead to visual awareness. Considerable research has been devoted to the attention mechanisms of the HVS, and computational models have been developed and applied to common computer vision problems. Most of these models simulate the bottom-up mechanism of the HVS; their main goal is to filter out redundant visual information and to detect or enhance the most salient parts of the input. The HVS can fixate quickly on the most informative (salient) regions of a scene, thereby reducing the inherent visual uncertainty, and computational visual attention (VA) schemes have been proposed to account for this important characteristic. We study and extend the field of computational visual attention, propose novel models for both spatial (image) and spatiotemporal (video) analysis, and evaluate them both qualitatively and quantitatively in a variety of relevant applications.

Spatiotemporal saliency for video classification

Many computer vision applications need to process only a representative part of the visual input rather than the whole image or sequence. Considerable research has been carried out on salient region selection methods, based either on models emulating human attention mechanisms or on more computationally driven approaches. Most of the proposed methods are bottom-up, and their main goal is to filter out redundant visual information and to detect or enhance the most salient parts of the input. In this work we propose and elaborate on a saliency detection model that treats a video sequence as a spatiotemporal volume and generates a local saliency measure for each visual unit (voxel). The computation involves an optimization procedure that incorporates inter- and intra-feature competition at the voxel level. We describe the perceptual decomposition of the input, spatiotemporal center-surround interactions and the integration of heterogeneous feature conspicuity values, and set up an experimental framework for video classification. This framework consists of a series of experiments that show the effect of saliency on classification performance and how well the salient regions represent the visual input. A comparison against other methods demonstrates the potential of the proposed approach.
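The overall flow can be illustrated by the following Python sketch. It is a simplified stand-in for the voxel-level optimization of the published model, not the exact formulation: the video is treated as a 3D volume, decomposed into simple feature volumes (intensity and a crude frame-difference motion cue), each feature is filtered with spatiotemporal center-surround operators, a basic peak-boosting normalization stands in for intra-feature competition, and the conspicuity volumes are fused into a single per-voxel saliency volume. The feature set, Gaussian scales and normalization step are illustrative assumptions.

# Illustrative spatiotemporal saliency sketch (NumPy/SciPy).
# NOTE: simplified stand-in for the voxel-level optimization described above;
# feature choices, scales and normalization are assumptions, not the published model.
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(volume, sigma_center=(1, 2, 2), sigma_surround=(4, 8, 8)):
    # Spatiotemporal center-surround response: difference of 3D Gaussians
    # over the (time, height, width) volume.
    center = gaussian_filter(volume, sigma=sigma_center)
    surround = gaussian_filter(volume, sigma=sigma_surround)
    return np.abs(center - surround)

def normalize(volume, eps=1e-8):
    # Crude proxy for intra-feature competition: rescale to [0, 1] and
    # boost volumes with few strong peaks (Itti-style normalization).
    v = (volume - volume.min()) / (np.ptp(volume) + eps)
    return v * (v.max() - v.mean()) ** 2

def spatiotemporal_saliency(frames):
    # frames: float array of shape (T, H, W), a grayscale video volume.
    # Returns a per-voxel saliency volume of the same shape.
    frames = np.asarray(frames, dtype=np.float64)
    # Perceptual decomposition into simple feature volumes.
    intensity = frames
    motion = np.zeros_like(frames)
    motion[1:] = np.abs(np.diff(frames, axis=0))  # frame-difference motion cue
    # Center-surround and competition per feature, then inter-feature fusion.
    conspicuities = [normalize(center_surround(f)) for f in (intensity, motion)]
    return sum(conspicuities) / len(conspicuities)

if __name__ == "__main__":
    # Toy example: a bright blob moving across an otherwise static volume.
    T, H, W = 16, 64, 64
    video = np.zeros((T, H, W))
    for t in range(T):
        video[t, 28:36, 4 * t:4 * t + 8] = 1.0
    S = spatiotemporal_saliency(video)
    print(S.shape, float(S.max()))

On the toy example, the highest saliency values concentrate around the moving blob, which is the qualitative behaviour expected from a spatiotemporal center-surround operator; the actual model additionally couples the feature volumes through the optimization described above.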

Publications

Conferences

K. Rapantzikos, Y. Avrithis. An enhanced spatiotemporal visual attention model for sports video analysis. In Proceedings of the International Workshop on Content-Based Multimedia Indexing (CBMI 2005), June 2005.
K. Rapantzikos, Y. Avrithis, S. Kollias. On the use of spatiotemporal visual attention for video classification. In Proceedings of the International Workshop on Very Low Bitrate Video Coding (VLBV 2005), Sardinia, Italy, September 2005.
K. Rapantzikos, Y. Avrithis, S. Kollias. Spatiotemporal saliency for event detection and representation in the 3D wavelet domain: Potential in human action recognition. In Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR 2007), Amsterdam, The Netherlands, July 2007.
K. Rapantzikos, G. Evangelopoulos, P. Maragos, Y. Avrithis. An Audio-Visual Saliency Model for Movie Summarization. In Proceedings of the IEEE International Workshop on Multimedia Signal Processing (MMSP 2007), October 2007.
G. Evangelopoulos, K. Rapantzikos, A. Potamianos, P. Maragos, A. Zlatintsi, Y. Avrithis. Movie Summarization Based on Audio-Visual Saliency Detection. In Proceedings of the 15th IEEE International Conference on Image Processing (ICIP 2008), San Diego, California, USA, October 2008.
G. Evangelopoulos, A. Zlatintsi, G. Skoumas, K. Rapantzikos, A. Potamianos, P. Maragos, Y. Avrithis. Video event detection and summarization using audio, visual and text saliency. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009.
K. Rapantzikos, Y. Avrithis, S. Kollias. Dense saliency-based spatiotemporal feature points for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, USA, June 2009.

Book Chapters

G. Evangelopoulos, K. Rapantzikos, P. Maragos, Y. Avrithis, A. Potamianos. Audiovisual Attention Modeling and Salient Event Detection. In Multimodal Processing and Interaction: Audio, Video, Text, P. Maragos, A. Potamianos, P. Gros (Eds.), Springer-Verlag, pp. 179-199, 2008.
K. Rapantzikos, Y. Avrithis, S. Kollias. Vision, Attention Control, and Goals Creation System. In Perception-Action Cycle, V. Cutsuridis, A. Hussain, J. G. Taylor (Eds.), Springer, pp. 363-386, 2011.

Journals

K. Rapantzikos, N. Tsapatsoulis, Y. Avrithis, S. Kollias. Spatiotemporal Saliency for Video Classification. Signal Processing: Image Communication, vol. 24, no. 7, pp. 557-571, August 2009.
K. Rapantzikos, Y. Avrithis, S. Kollias. Spatiotemporal features for action recognition and salient event detection. Cognitive Computation, special issue on Saliency, Attention, Visual Search and Picture Scanning, vol. 3, no. 1, pp. 167-184, 2011.
G. Evangelopoulos, A. Zlatintsi, A. Potamianos, P. Maragos, K. Rapantzikos, G. Skoumas, Y. Avrithis. Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention. IEEE Transactions on Multimedia, vol. 15, no. 7, pp. 1553-1568, November 2013.