A wide range of properties and assumptions determines the most appropriate spatial matching model for an application such as recognition, detection, registration, or large-scale image retrieval. Most notably, these include discriminative power, geometric invariance, rigidity constraints, mapping constraints, assumptions made about the underlying features or descriptors and, of course, computational complexity.
We present a new approach to image indexing and retrieval, which integrates appearance with global image geometry in the indexing process, while remaining robust to viewpoint change, photometric variations, occlusion, and background clutter. We exploit shape parameters of local features to estimate image alignment via a single correspondence. Then, for each feature, we construct a sparse spatial map of all remaining features, encoding their normalized position and appearance, typically vector-quantized to a visual word. An image is represented by a collection of such feature maps, and RANSAC-like matching is reduced to a number of set intersections. We use min-wise independent permutations and derive a similarity measure for feature map collections. Beyond random selection, we further exploit multiple-view matching for feature selection. This allows us to scale geometry indexing up to 1M images. We then exploit sparseness to build an inverted file whereby the retrieval process is sub-linear in the total number of images, ideally linear in the number of relevant ones.
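The core indexing idea, representing each feature's spatial map as a set of (normalized position, visual word) entries and estimating set overlap with min-wise hashing, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, number of hashes, and toy data are all assumptions made for the example.

```python
import random

def minhash_signature(feature_map, num_hashes=64, seed=0):
    """Min-wise independent permutations, simulated with salted hashes:
    under each 'permutation', keep the minimum hash value of the set."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    # Elements here are tuples of ints, so Python's hash() is deterministic.
    return [min(hash((salt, e)) for e in feature_map) for salt in salts]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of agreeing components estimates |A ∩ B| / |A ∪ B|,
    so spatial-map matching reduces to comparing short signatures."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Toy feature maps: sets of (quantized normalized position, visual word).
map_a = {((1, 2), 17), ((3, 0), 42), ((0, 4), 7), ((2, 2), 99)}
map_b = {((1, 2), 17), ((3, 0), 42), ((5, 5), 13), ((2, 2), 99)}
```

Because only the short signatures need to be stored and compared, an inverted file over signature components keeps retrieval sub-linear in the number of indexed images.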
We further present a very simple model inspired by Hough voting in the transformation space, where votes arise from single feature correspondences. A relaxed matching process allows for multiple matching surfaces or non-rigid objects under a one-to-one mapping, yet is linear in the number of correspondences. We apply it to geometry re-ranking in a search engine, yielding superior performance with the same space requirements and a dramatic speed-up compared to the state of the art.
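A minimal sketch of this idea: each single correspondence casts one vote in a quantized similarity-transform space, groups of votes in the same bin support a common motion, and a greedy pass enforces a one-to-one mapping while still letting several groups (e.g. multiple surfaces) contribute. The bin sizes, greedy tie-breaking, and unweighted scoring are simplifying assumptions for illustration, not the exact model.

```python
from collections import defaultdict

def relaxed_match_score(correspondences, bin_size=(0.2, 10.0, 8.0)):
    """Hough voting with single-correspondence votes.
    Each correspondence (q, d, log_scale, angle_deg, tx, ty) carries a full
    transform hypothesis derived from the two local feature frames."""
    bins = defaultdict(list)
    for q, d, s, a, tx, ty in correspondences:
        key = (round(s / bin_size[0]), round(a / bin_size[1]),
               round(tx / bin_size[2]), round(ty / bin_size[2]))
        bins[key].append((q, d))
    # Greedy one-to-one mapping, processing stronger groups first;
    # every correspondence is touched once, so the pass stays linear.
    used_q, used_d, score = set(), set(), 0
    for votes in sorted(bins.values(), key=len, reverse=True):
        for q, d in votes:
            if q not in used_q and d not in used_d:
                used_q.add(q)
                used_d.add(d)
                score += 1
    return score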
We further extend our relaxed spatial matching to self-matching and symmetry detection. We assume that features participating in symmetric or repeating structures are more likely to be matched between different views of the same object. Information from geometric self-matching, and from matching the image with its mirrored counterpart, is used to select features in single images.
In contrast to the previous methods, discussed or proposed, which all use only visual-word information for feature matching, we further exploit the Hamming Embedding (HE) technique, which additionally uses descriptor information. HE represents each feature by a visual word and a binary signature, which allows more precise feature matching. We develop a novel query expansion strategy that is aligned with the HE representation. In contrast to previous query expansion methods, we improve performance even without geometric matching, while keeping query times low. We finally show that combining our scheme with geometric matching can further boost performance and outperform state-of-the-art methods.
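The HE matching rule can be sketched as follows: a tentative correspondence requires the same visual word and a small Hamming distance between binary signatures, with closer signatures receiving a higher weight. The threshold, Gaussian weighting, and toy 4-bit signatures are illustrative choices, not the exact parameters used.

```python
import math

def he_matches(query_feats, db_feats, threshold=24, sigma=16.0):
    """Each feature is a (visual_word, binary_signature) pair, with the
    signature stored as an int. The visual word acts as a coarse filter;
    the signature refines matching within the word's Voronoi cell."""
    matches = []
    for qword, qsig in query_feats:
        for dword, dsig in db_feats:
            if qword != dword:
                continue  # visual words must agree first
            h = bin(qsig ^ dsig).count("1")  # Hamming distance
            if h <= threshold:
                # Gaussian down-weighting of more distant signatures
                matches.append((qword, h, math.exp(-h * h / sigma ** 2)))
    return matches
```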