Feature map fusion in Visual Attention (VA) models is by definition an uncertain procedure. In fact, one of the major impediments to extending the static VA architecture proposed by Itti et al. to account for motion and other information important to the human visual system is the lack of justification for how the various channels should be integrated. In this paper we propose an innovative committee machine scheme that allows the committee members (maps) to change dynamically and weights each feature map according to the confidence level of its estimation. This machine handles our extension of Itti's model, in which two branches are added to his architecture: a motion channel and a prior-knowledge channel, which accounts for the conscious search humans perform when looking for faces in a scene. The uncertainty about which channels to use in the feature map combination, and with what weights, is handled in an evidence-based manner. The evidence can be either any available prior information about the scene (used for gating in the committee machine) or the confidence level of each separate feature map's estimation. Smooth changes in a particular channel's map suggest giving that channel an increased weight in the committee machine. Experimental results on a face detection case study show that fusing the maps through the proposed committee machine yields significantly better precision and recall than a simple skin-based face detection method.
IJISTA, Special Issue on "Intelligent Image and Video Processing and Applications: The Role of Uncertainty", Volume 1, Issue 3, pp. 346-358, 2006.
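The gated, confidence-weighted fusion described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. Each channel's map is a 2-D grid of values. A channel's confidence comes from how smoothly its map changes between consecutive frames, computed here as 1 / (1 + alpha * mean absolute change), where `alpha` is an illustrative parameter. Prior scene knowledge arrives as a per-channel on/off gate. The functions `temporal_change`, `confidence`, and `fuse`, the channel names, and the weighting formula are all hypothetical and are not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

Map = List[List[float]]  # one 2-D feature map per channel (hypothetical representation)


def temporal_change(prev: Map, curr: Map) -> float:
    """Mean absolute per-pixel difference between two consecutive maps."""
    total, count = 0.0, 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def confidence(prev: Map, curr: Map, alpha: float = 10.0) -> float:
    """Hypothetical confidence in (0, 1]: the smoother the change, the higher."""
    return 1.0 / (1.0 + alpha * temporal_change(prev, curr))


def fuse(
    prev_maps: Dict[str, Map],
    curr_maps: Dict[str, Map],
    gate: Optional[Dict[str, bool]] = None,
    alpha: float = 10.0,
) -> Tuple[Map, Dict[str, float]]:
    """Gated, confidence-weighted linear combination of channel maps."""
    gate = gate or {}
    # Prior scene knowledge switches committee members on or off (default: on).
    raw = {
        name: (1.0 if gate.get(name, True) else 0.0)
        * confidence(prev_maps[name], curr_maps[name], alpha)
        for name in curr_maps
    }
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("every channel was gated out")
    # Renormalize so the active members' weights sum to one.
    weights = {name: w / total for name, w in raw.items()}

    first = next(iter(curr_maps.values()))
    fused = [[0.0] * len(row) for row in first]
    for name, m in curr_maps.items():
        for i, row in enumerate(m):
            for j, v in enumerate(row):
                fused[i][j] += weights[name] * v
    return fused, weights


# Usage: an unchanged color map, an abruptly changing motion map, and a
# face-prior map that a (hypothetical) scene prior switches off.
prev = {
    "color": [[0.2, 0.4], [0.6, 0.8]],
    "motion": [[0.0, 0.0], [0.0, 0.0]],
    "face_prior": [[1.0, 0.0], [0.0, 0.0]],
}
curr = {
    "color": [[0.2, 0.4], [0.6, 0.8]],
    "motion": [[1.0, 1.0], [1.0, 1.0]],
    "face_prior": [[1.0, 0.0], [0.0, 0.0]],
}
fused, weights = fuse(prev, curr, gate={"face_prior": False})
print({n: round(w, 3) for n, w in weights.items()})
# {'color': 0.917, 'motion': 0.083, 'face_prior': 0.0}
```

In this example the unchanged `color` map receives weight 11/12, the abruptly changing `motion` map 1/12, and the gated-out `face_prior` channel 0. Renormalizing after gating keeps the weights summing to one, so committee membership can change from frame to frame without rescaling the fused map.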