Image and Video Analysis

Research

State-of-the-art data mining and image retrieval in community photo collections typically focus on popular subsets, e.g. images containing landmarks or associated with Wikipedia articles. We propose an image clustering scheme that, seen as vector quantization, compresses a large corpus of images by grouping visually consistent ones while providing a guaranteed distortion bound. This allows us, for instance, to represent the visual content of all the thousands of images depicting the Parthenon in just a few dozen scene maps while still being able to retrieve any single, isolated, non-landmark image, such as a house or graffiti on a wall.
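
The page does not spell out the clustering algorithm, so the following is only a minimal sketch of how a distortion bound can be guaranteed, using greedy "leader" (threshold) clustering as a stand-in technique. It assumes each image has already been reduced to a fixed-length feature vector compared with Euclidean distance; `threshold_cluster`, `max_distortion` and the threshold `tau` are hypothetical names. Because a vector joins an existing cluster only if that cluster's representative lies within `tau`, and otherwise becomes a representative itself, every image ends up at distance at most `tau` from its codeword.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def threshold_cluster(features: List[Vector], tau: float) -> Tuple[List[Vector], List[int]]:
    """Greedy leader clustering (hypothetical stand-in): each vector ends up
    within distance tau of the representative of its cluster."""
    representatives: List[Vector] = []
    assignment: List[int] = []
    for f in features:
        # Locate the nearest existing representative, if there is one.
        best, best_dist = -1, math.inf
        for i, r in enumerate(representatives):
            d = math.dist(f, r)
            if d < best_dist:
                best, best_dist = i, d
        if best_dist <= tau:
            # Close enough: quantize f to that representative.
            assignment.append(best)
        else:
            # No representative within tau: f becomes a new codeword.
            representatives.append(f)
            assignment.append(len(representatives) - 1)
    return representatives, assignment


def max_distortion(features, representatives, assignment) -> float:
    """Largest distance between a vector and its assigned representative."""
    return max(math.dist(f, representatives[a]) for f, a in zip(features, assignment))


if __name__ == "__main__":
    feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (10.0, 0.0)]
    reps, assign = threshold_cluster(feats, tau=0.5)
    print(assign)  # -> [0, 0, 1, 1, 2]
    print(max_distortion(feats, reps, assign) <= 0.5)  # -> True
```

Lowering `tau` tightens the bound at the cost of more representatives, which mirrors the compression-versus-distortion trade-off of vector quantization.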

The use of visual context information is motivated by the fact that not all human acts are relevant in all situations, and the same holds for image analysis problems. Since visual context is a difficult notion to grasp and capture, in our research we restrict it to the notion of ontological context, which is defined as part of a "fuzzified" version of traditional ontologies. Typical problems to be addressed include how to meaningfully readjust the membership degrees of image regions and how to use visual context to steer the overall results of knowledge-assisted image analysis towards higher performance.
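
As an illustration of membership readjustment only, the sketch below assumes each region carries fuzzy membership degrees for a set of concepts and that the ontological context supplies a relevance degree in [0, 1] for each concept. The combination rule (a convex blend of the original degree and its min t-norm with the relevance), the weight `alpha`, the neutral default relevance of 1.0, and the names `readjust_memberships`, "beach", "sea", "sky" and "snow" are all hypothetical choices, not the rule used in this research.

```python
from typing import Dict


def readjust_memberships(
    memberships: Dict[str, float],
    relevance: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical context-driven readjustment of a region's fuzzy degrees.

    memberships: degree to which the region belongs to each concept, in [0, 1].
    relevance:   contextual relevance of each concept, in [0, 1].
    alpha:       how strongly context influences the result (0 = ignore context).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    adjusted = {}
    for concept, mu in memberships.items():
        # Assumption: a concept the context says nothing about is treated as fully relevant.
        r = relevance.get(concept, 1.0)
        # Goedel t-norm (min) intersects the region evidence with contextual relevance.
        contextual = min(mu, r)
        # Convex blend keeps the result in [0, 1].
        adjusted[concept] = (1.0 - alpha) * mu + alpha * contextual
    return adjusted


if __name__ == "__main__":
    region = {"sea": 0.8, "sky": 0.6, "snow": 0.7}
    beach_context = {"sea": 1.0, "sky": 0.9, "snow": 0.1}  # hypothetical "beach" context
    print(readjust_memberships(region, beach_context, alpha=0.5))
```

With this rule a contextually irrelevant concept (here "snow" on a beach) can only lose membership, while a fully relevant one keeps its original degree; other fuzzy operators would lead to different behaviour, for example allowing context to raise degrees as well.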

The motivation of this work is to tackle the problem of high-level concept detection in image and video documents using a globally annotated training set. The goal is to determine whether a concept is present in an image, along with a degree of confidence, rather than its actual position. Since the approach begins with a coarse image segmentation, the high-level concepts it is able to handle are best described as "materials" or "scenes". Images are coarsely segmented using a variation of the RSST (Recursive Shortest Spanning Tree) algorithm, and MPEG-7 color and texture features are extracted locally from each region. A hierarchical clustering algorithm is then applied to all regions of a sufficiently large set of images, and a relatively small number of representative regions is selected. These regions are called "region types". This set of region types composes a visual dictionary which facilitates the mapping from low-level to high-level features.
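
The construction of the visual dictionary can be sketched as follows, under clearly simplified assumptions: regions are toy feature vectors rather than MPEG-7 descriptors, the hierarchical clustering is a naive centroid-linkage agglomerative procedure stopped at `k` clusters, and each region type is taken to be the centroid of its cluster. The `model_vector` helper, which describes an image by the distance from each region type to the image's closest region, is one plausible way to use the dictionary for the low- to high-level mapping; `region_types`, `model_vector` and `k` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def region_types(region_features: List[Vector], k: int) -> List[List[float]]:
    """Naive centroid-linkage agglomerative clustering down to k clusters.
    Returns one centroid per cluster, used here as a hypothetical 'region type'."""
    clusters: List[List[Vector]] = [[f] for f in region_features]
    while len(clusters) > k:
        centers = [centroid(c) for c in clusters]
        best, best_d = (0, 1), math.inf
        # Find the two clusters whose centroids are closest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centers[i], centers[j])
                if d < best_d:
                    best_d, best = d, (i, j)
        i, j = best
        # Merge cluster j into cluster i (j > i, so index i stays valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def model_vector(image_regions: List[Vector], dictionary: List[List[float]]) -> List[float]:
    """For each region type, the distance to the closest region of the image
    (small values mean that region type is well represented in the image)."""
    return [min(math.dist(r, t) for r in image_regions) for t in dictionary]


if __name__ == "__main__":
    regions = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
    dictionary = region_types(regions, k=3)
    print(dictionary)  # -> [[0.0, 0.5], [10.0, 10.5], [20.5, 0.0]]
    print(model_vector([(0, 0), (20, 0)], dictionary))
```

The resulting fixed-length vector could then be passed to a per-concept classifier that outputs a confidence degree; the classifier choice is left open here, as the page does not specify one. The quadratic merge loop is only suitable for small examples; a production system would use an optimized hierarchical clustering routine.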