Image and Video Analysis

MPEG-7 Visual Descriptors

The Visual Descriptor applications extract and match MPEG-7 visual descriptions of image content. Given an image, and optionally one or more regions of interest, they produce a low-level visual description. Descriptions can also be matched, in which case a distance measure between two images is returned.

Both tools provide a simple command-line interface. Extraction and matching are fast, each completing within a few milliseconds. A detailed manual documents the command-line parameters.

Several ways of defining the region or regions from which MPEG-7 visual descriptors are extracted are supported. The included Color, Texture and Shape descriptors are:

  • Dominant Color - consists of the dominant color values, their percentages and variances, and the spatial coherency.
  • Color Structure - captures both the global color features of an image and the local spatial structure of the color.
  • Color Layout - is a compact and resolution invariant MPEG-7 visual descriptor designed to represent the spatial distribution of color in the YCbCr color space.
  • Scalable Color - is a Haar-transform based transformation applied across values of a color histogram that measures color distribution over an entire image.
  • Homogeneous Texture - provides a quantitative characterization of texture and is a robust descriptor that is easy to compute.
  • Edge Histogram - captures the spatial distribution of edges and represents local-edge distribution in the image.
  • Region-Based Shape - expresses the 2-D pixel distribution within an object or a region of interest, based on both the contour pixels and the inner pixels.
  • Contour-Based Shape - captures the characteristic features of the contours of the objects based on an extension of the Curvature Scale-Space (CSS) representation of the contour.
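To make the Haar-transform idea behind Scalable Color more concrete, the following is a minimal sketch of a 1-D Haar decomposition applied to a small histogram. It is an illustration only, not the XM implementation: the function name `haar_transform` and the toy 4-bin histogram are assumptions, and the histogram definition and coefficient quantization used by the standard are left out.

```python
def haar_transform(values):
    """Full 1-D Haar decomposition using unnormalized sums and differences.

    The input length must be a power of two. The result holds the overall
    sum first, followed by difference coefficients ordered coarse to fine.
    (Hypothetical helper for illustration, not part of the VD applications.)
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(values)
    details = []
    while len(current) > 1:
        # Pairwise sums carry the coarse shape to the next level.
        sums = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
        # Pairwise differences record what the sums lose.
        diffs = [current[i] - current[i + 1] for i in range(0, len(current), 2)]
        details = diffs + details  # coarser levels are placed to the left
        current = sums
    return current + details


# Toy 4-bin color histogram (assumed values).
histogram = [4, 2, 5, 5]
coefficients = haar_transform(histogram)

# Keeping only the leading coefficients gives a coarser, shorter description.
coarse = coefficients[:2]
```

Because the leading coefficients describe the histogram at the coarsest level, truncating the list trades accuracy for size, which is the sense in which such a representation is scalable.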

Regions supported for descriptor extraction.

The region used for extraction can be defined as:

  • Global - descriptors are extracted from the whole image.
  • Binary mask - descriptors are extracted from a single region defined by a binary mask.
  • Region map - descriptors are extracted from all regions of a region map (segmentation mask).
  • Bounding boxes - descriptors are extracted from rectangular regions defined by their coordinates (upper-left and lower-right corners).
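The four region definitions above can be pictured as different ways of selecting the pixels that feed a descriptor. The sketch below is a minimal illustration under stated assumptions: images are indexed as (row, column), masks and region maps are nested lists, and bounding-box corners are treated as inclusive. All function names are hypothetical and do not correspond to the VDE code or its input file formats.

```python
def pixels_global(height, width):
    """Every pixel of the image."""
    return [(r, c) for r in range(height) for c in range(width)]


def pixels_from_mask(mask):
    """Pixels where a binary mask is non-zero (a single region)."""
    return [(r, c)
            for r, row in enumerate(mask)
            for c, value in enumerate(row)
            if value]


def pixels_from_region_map(region_map):
    """Pixels grouped by label, one entry per region of the map."""
    regions = {}
    for r, row in enumerate(region_map):
        for c, label in enumerate(row):
            regions.setdefault(label, []).append((r, c))
    return regions


def pixels_from_box(top, left, bottom, right):
    """Pixels inside a rectangle; both corners are assumed inclusive."""
    return [(r, c)
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)]


# Toy 2x2 inputs (assumed values).
mask = [[0, 1],
        [1, 1]]
region_map = [[1, 1],
              [2, 1]]

single_region = pixels_from_mask(mask)
all_regions = pixels_from_region_map(region_map)
```

A region map yields one pixel set, and therefore one description, per label, whereas a binary mask yields exactly one.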

Both applications were developed in C++ and are based on version 6.1 of the MPEG-7 eXperimentation Model (XM). OpenCV 1.0 is used for image loading and Xerces 2.8 for XML parsing.

Visual Descriptor Extraction (VDE) is the application that extracts the MPEG-7 visual descriptors. To extract a global description, the user provides an image in one of the supported formats. When a region mask, region map or set of bounding boxes is also provided as input, a description is extracted for each region of interest. Both XML and plain-text output are available.

Visual Descriptor Matching (VDM) is the application that matches MPEG-7 visual descriptors. It takes as input the XML files that VDE generates when extracting global descriptors from images, and calculates a distance for each descriptor. The distances used are those defined by the MPEG-7 standard.
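As an example of what a per-descriptor distance can look like, the sketch below follows the form commonly given for matching Color Layout descriptors: for each of the Y, Cb and Cr channels, the square root of a weighted sum of squared coefficient differences is taken, and the three results are added. This is a hedged illustration, not the VDM code. The function name, the dictionary layout and the uniform default weights are assumptions; the actual weights are those recommended alongside the standard.

```python
import math


def color_layout_distance(a, b, weights=None):
    """Distance between two Color Layout style descriptions.

    a, b: dicts with keys "Y", "Cb", "Cr", each a list of coefficients.
    weights: optional dict with the same keys; uniform weights are used
    when omitted (an assumption made for this sketch).
    """
    total = 0.0
    for channel in ("Y", "Cb", "Cr"):
        ca, cb = a[channel], b[channel]
        if len(ca) != len(cb):
            raise ValueError("coefficient lists differ in length: " + channel)
        w = weights[channel] if weights else [1.0] * len(ca)
        # Weighted Euclidean distance within one channel.
        total += math.sqrt(sum(wi * (x - y) ** 2
                               for wi, x, y in zip(w, ca, cb)))
    return total


# Toy descriptions with two coefficients per channel (assumed values).
first = {"Y": [3, 0], "Cb": [0, 0], "Cr": [1, 1]}
second = {"Y": [0, 4], "Cb": [0, 0], "Cr": [1, 1]}
distance = color_layout_distance(first, second)
```

Other descriptors use different distance definitions, so a matching tool would report one such value per descriptor rather than a single combined score.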


If you use the VD applications, please cite our MTAP 2009 paper shown below.




E. Spyrou, G. Tolias, P. Mylonas, Y. Avrithis. Concept detection and keyframe extraction using a visual thesaurus. In Multimedia Tools and Applications, vol. 41, no. 3, pp. 337-373, February 2009.
Giorgos Tolias