Computer vision applications often need to process only a representative part of the visual input rather than the whole image/sequence. Considerable research has been carried out into salient region detection methods based either on models emulating human visual attention (VA) mechanisms or on computational approximations. Most of the proposed methods are bottom-up and their major goal is to filter out redundant visual information. In this paper, we propose and elaborate on a saliency detection model that treats a video sequence as a spatiotemporal volume and generates a local saliency measure for each visual unit (voxel). This computation involves an optimization process incorporating inter- and intra-feature competition at the voxel level. Perceptual decomposition of the input, spatiotemporal center-surround interactions and the integration of heterogeneous feature conspicuity values are described and an experi- mental framework for video classification is set up. This framework consists of a series of experiments that shows the effect of saliency in classification performance and let us draw conclusions on how well the detected salient regions represent the visual input. A comparison is attempted that shows the potential of the proposed method.
Signal Processing: Image Communication, Volume 24, Issue 7, pp.557-571, August 2009.
[ Bibtex ] [ PDF ]