Keynote Talks
Research Challenges in Understanding and Indexing Media
It is easy to predict that over the next few years the internet will see an accumulation of increasingly large collections of audio (e.g., iTunes), imagery (e.g., Flickr), video (e.g., YouTube) and sensor information (weather, traffic data) together with rapid and widespread growth and innovation in new information services in the form of 'mashups' (combinations of multiple, separate data sources into one application or display) and social web activities (e.g., blogging, podcasting, media editing). All this is driving a need for improvements in semantic information extraction from structured and unstructured sources and across media, social network and contextual user modeling, multimedia retrieval, summarization of large diverse data sets, and collaborative work environments and interfaces.
This talk will motivate some of the research challenges on the horizon for the next few years (and perhaps beyond), based on my research perspectives. These challenges include video, audio, image and graphics understanding, and algorithms to support cross-source and cross-media mining, retrieval and fusion. The challenges are coupled with increasingly personalized user modeling and adaptive, device-appropriate summarization and presentation design. I will also point out a number of problems and risks in the current research paths.
The result will be noticeable performance enhancements, making everything faster and easier. More importantly, this will result in the increasing pervasiveness of new media-oriented applications and internet services.

Alex Hauptmann is a Senior Systems Scientist in Computer Science at Carnegie Mellon University, and a faculty member in the Language Technologies Institute at CMU. Alex holds BA and MA degrees in Psychology from Johns Hopkins University, a 'Diplom' in Computer Science from the Technische Universität Berlin and a Ph.D. in Computer Science from Carnegie Mellon. His main interest has been multimedia analysis and retrieval. Other research interests include speech recognition and interfaces, translation and natural language in general. Most of his time is spent on the Informedia Digital Video project. This work has also spawned three spin-off companies related to digital video archiving and video question answering. Alex is also pursuing projects on video observations for patient care for the elderly and personal wearable memory devices. His current passion is the pursuit of a large-scale concept ontology for multimedia to help narrow the semantic gap.
Fusion of Information for Content Based Multimedia Database Retrieval
The retrieval of information from multimedia databases is a challenging problem because of the number of different concepts that may be of interest to the user and the multifaceted characteristics of each concept. The concept properties may span different sensing modalities and, within each modality, call for the use of a diverse set of features. Commonly, the retrieval problem is formulated as a detection problem (a two-class pattern recognition problem), whereby the content of interest is looked for in the multimedia material and discriminated from the anti-concept class. The detectors are designed to capture the different manifestations of each concept class (colour, texture, shape, sounds). The design process is often hampered by small sample sets and class imbalance.
The nature of the retrieval problem raises issues in information fusion. Both feature-level and decision-level fusion provide useful mechanisms for tackling different aspects of the concept detector design process. At the feature level, the fusion is often accomplished with multi-kernel machine learning methods. The key question in this approach is how to weight the contributions of the respective kernels. The weight allocation is normally controlled by regularisation. We discuss the effect of different norms on weight assignment. The findings lead to a two-stage machine learning strategy in which the first stage serves simply as a means to eliminate non-informative kernels. In contrast, decision-level fusion is adopted for dealing with the class population imbalance problem. We show that by extreme undersampling of the negative (anti-concept) class we can create a large number of weak classifiers, the fusion of which has the capacity to improve retrieval performance.
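The decision-level fusion idea can be illustrated with a minimal sketch. It is not the speaker's implementation: the nearest-centroid weak classifier, the score averaging rule, the function names and the toy data are all assumptions chosen to keep the example short. Each weak classifier sees all positive examples but only a tiny random subset of the negative (anti-concept) class, and the fused score is the mean of the individual scores.

```python
import random

def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_weak(positives, negatives):
    """Hypothetical weak classifier: a nearest-centroid scorer.
    A positive score means the sample is closer to the positive centroid."""
    c_pos, c_neg = centroid(positives), centroid(negatives)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)

def train_undersampled_ensemble(positives, negatives, n_classifiers=25, seed=0):
    """Train many weak classifiers, each on all positives and a small random
    subset of negatives of the same size as the positive set (assumed ratio)."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    return [train_weak(positives, rng.sample(negatives, k))
            for _ in range(n_classifiers)]

def fused_score(ensemble, x):
    """Decision-level fusion by averaging the weak classifier scores."""
    return sum(clf(x) for clf in ensemble) / len(ensemble)

# Toy, heavily imbalanced data: 5 positives against 200 negatives.
positives = [[2.0 + 0.1 * i, 2.0 - 0.1 * i] for i in range(5)]
negatives = [[0.01 * (i % 20) - 0.1, 0.01 * (i % 13) - 0.06] for i in range(200)]
ensemble = train_undersampled_ensemble(positives, negatives)
```

Averaging is only one possible fusion rule; voting or a trained combiner would fit the same structure.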
The techniques discussed are evaluated on standard benchmark databases, including the PASCAL VOC 08 image data set and the Mediamill Challenge video database, which is based on the NIST TRECVID 2005 benchmark. The performance is measured using average precision, which combines precision and recall into one performance figure. The benefits of various fusion mechanisms are demonstrated.
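As a rough illustration of how average precision folds precision and recall into a single figure, the sketch below computes the plain non-interpolated variant over a ranked list. The function name and the toy scores are assumptions, and the benchmarks named above each define their own exact variant of the measure, which may differ from this one.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision for one concept.

    scores: detector outputs, higher means more likely relevant.
    labels: 1 (or True) for relevant items, 0 (or False) otherwise.
    """
    # Rank items by decreasing detector score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    # Mean of those precisions over all relevant items (0.0 if there are none).
    return precision_sum / hits if hits else 0.0

# Relevant items at ranks 1 and 3: (1/1 + 2/3) / 2 = 5/6
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```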

Josef Kittler is a Distinguished Professor of Machine Intelligence and Director of the Centre for Vision, Speech and Signal Processing at the University of Surrey. He holds a BA degree in Electrical Engineering, a Ph.D. in Pattern Recognition and a ScD, all from the University of Cambridge. He has worked on various theoretical aspects of Pattern Recognition, Image Analysis and Computer Vision, and on many applications including Image Coding, Image and Video Database Retrieval and Surveillance. His major contributions to pattern recognition include the k-nearest neighbour method, feature selection, and multiple expert fusion. In computer vision, his contributions include robust statistical methods for shape analysis and detection, motion estimation and segmentation, and image segmentation by thresholding and edge detection. He has served as a member of the Editorial Board of IEEE Transactions on Pattern Analysis and Machine Intelligence and currently serves on the Editorial Boards of Image and Vision Computing, Pattern Recognition Letters, Pattern Recognition and Artificial Intelligence, and Pattern Analysis and Applications.




