This paper presents a region-based approach to semantic image retrieval. Combining segmentation with the popular Bag-of-Words model, we first construct a visual vocabulary of the most common "region types" from the database images. The visual words are consistent image regions, extracted through a k-means clustering process. The regions are described with color and texture features, and a "model vector" is then formed to capture the association of a given image with the visual words. Unlike other methods, we do not form the model vector from all region types, but rather from a smaller subset. We show that the presented approach can be efficiently applied to image retrieval when the goal is to retrieve semantically similar rather than visually similar images, and that it outperforms the commonly used Bag-of-Words model based on local SIFT descriptors.
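The pipeline in the abstract (cluster region descriptors into a vocabulary, then associate each image with only a subset of visual words) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensionality, vocabulary size, and the `top_k` subset-selection rule are all assumptions introduced for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative stand-in for region descriptors: one row per segmented
# region (e.g., concatenated color + texture features). Sizes are
# assumptions, not the paper's configuration.
rng = np.random.default_rng(0)
all_region_features = rng.random((500, 16))  # regions pooled over the database
n_words = 20                                 # vocabulary size (assumed)

# Build the visual vocabulary: cluster centers act as "region types".
kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=0)
kmeans.fit(all_region_features)

def model_vector(image_regions, kmeans, top_k=5):
    """Associate an image with a subset of visual words.

    Rather than scoring against all region types, keep only the top_k
    strongest associations (the subset idea from the abstract); the
    particular selection rule here is a hypothetical choice.
    """
    dists = kmeans.transform(image_regions)   # (n_regions, n_words) distances
    sims = 1.0 / (1.0 + dists.min(axis=0))    # per-word similarity score
    vec = np.zeros(kmeans.n_clusters)
    top = np.argsort(sims)[-top_k:]           # indices of strongest words
    vec[top] = sims[top]
    return vec

query_regions = rng.random((8, 16))           # regions of one query image
mv = model_vector(query_regions, kmeans)
```

Retrieval would then rank database images by a distance between their model vectors; because each vector is sparse over the vocabulary, comparisons stay cheap even for large vocabularies.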
5th Workshop on Semantic Media Adaptation and Personalization, Limassol, Cyprus, December 2010.