## Semantic Image and Video Segmentation

Using semantics in image and video segmentation

Automatic segmentation of images and videos is a challenging task in computer vision and a crucial step toward image and video understanding. In this research work we propose to include semantic criteria in the segmentation process, in order to capture semantic properties of objects that visual features such as color or texture cannot describe. Traditionally, segmentation and recognition have been treated sequentially: erroneous segmentation inevitably leads to poor recognition, while imperfect global image classification in turn yields deficient segmentation. The limitations of each task thus hinder the other. We propose an algorithm that performs segmentation and recognition of objects simultaneously. It is based on region growing techniques, in which region merging is driven by semantic similarity criteria.

• An image or a video sequence is decomposed into a set of non-overlapping regions or volumes, respectively. Each region/volume is assigned a fuzzy set of semantic labels: $\mathcal{L}_{a}=\sum_{i=1}^{|C|}{c_i/\mu_{a}(c_i)}$, where $\mathcal{L}_{a}$ is the fuzzy set of labels for region/volume $a$, $C$ is the set of all possible concepts, and $\mu_{a}(c_{i})$ is the degree of membership of the concept $c_{i}$ in the fuzzy set $\mathcal{L}_{a}$.
• A semantic similarity measure between two neighbouring regions/volumes $a$ and $b$ is defined, based on their fuzzy sets of semantic labels $\mathcal{L}_{a}$ and $\mathcal{L}_{b}$: $s_{ab}=\max_{c_{i}\in C} T_{norm}\big(\mu_{a}(c_{i}),\mu_{b}(c_{i})\big)$, where $T_{norm}$ can be any fuzzy t-norm, applied per concept to the membership degrees of $c_{i}$ in $\mathcal{L}_{a}$ and $\mathcal{L}_{b}$. Intuitively, the semantic similarity $s_{ab}$ is the highest degree, implied by our knowledge, to which regions/volumes $a$ and $b$ share the same concept.
• An Attributed Relational Graph (ARG) is used to represent an image or video sequence. The graph's vertices represent regions/volumes, with their assigned fuzzy sets as attributes, while the graph's edges carry the semantic similarity between the connected vertices (regions/volumes).
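
The three ingredients above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary-based graph, the choice of `min` as t-norm, the size-weighted update of the merged labels, the `threshold` parameter and all function names are assumptions made for illustration only.

```python
def semantic_similarity(labels_a, labels_b, t_norm=min):
    """s_ab: highest t-norm of the two membership degrees over all concepts."""
    concepts = set(labels_a) | set(labels_b)
    return max(
        (t_norm(labels_a.get(c, 0.0), labels_b.get(c, 0.0)) for c in concepts),
        default=0.0,
    )


def build_arg(labels, edges):
    """ARG sketch: vertices keep their fuzzy label sets, edges get s_ab."""
    weights = {(a, b): semantic_similarity(labels[a], labels[b]) for a, b in edges}
    return {"vertices": labels, "edges": weights}


def merge_step(labels, sizes, edges, threshold=0.5):
    """Merge the most similar pair of neighbours; return False if none qualifies."""
    scored = [
        (semantic_similarity(labels[a], labels[b]), a, b) for a, b in sorted(edges)
    ]
    if not scored:
        return False
    s, a, b = max(scored, key=lambda item: item[0])
    if s < threshold:
        return False
    # Assumption: merged memberships are the size-weighted average of both regions.
    total = sizes[a] + sizes[b]
    labels[a] = {
        c: (sizes[a] * labels[a].get(c, 0.0) + sizes[b] * labels[b].get(c, 0.0)) / total
        for c in set(labels[a]) | set(labels[b])
    }
    sizes[a] = total
    del labels[b], sizes[b]
    # Redirect the edges of b to a and drop the self-loop that results.
    redirected = set()
    for u, v in edges:
        u = a if u == b else u
        v = a if v == b else v
        if u != v:
            redirected.add((min(u, v), max(u, v)))
    edges.clear()
    edges.update(redirected)
    return True


# Hypothetical toy input: three regions with fuzzy labels, sizes and adjacency.
labels = {
    "r1": {"sea": 0.8, "sky": 0.3},
    "r2": {"sea": 0.7, "sand": 0.2},
    "r3": {"sand": 0.9},
}
sizes = {"r1": 100, "r2": 50, "r3": 80}
edges = {("r1", "r2"), ("r2", "r3")}

arg = build_arg(labels, edges)
while merge_step(labels, sizes, edges):
    pass
print(sorted(labels))  # regions that remain after merging
```

In this toy input, `r1` and `r2` share the concept "sea" with degree `min(0.8, 0.7) = 0.7` and are merged, whereas the merged region and `r3` only share "sand" to a low degree, so they stay separate. Any other t-norm (for example the product) can be passed through the `t_norm` argument.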

Indicative results
Results of the application of semantic segmentation in images

The figure below illustrates the recognition results obtained when a standard hierarchical segmentation algorithm, such as RSST (Recursive Shortest Spanning Tree), provides the input (second column), while the third column displays the results of the proposed semantic segmentation algorithm. One can notice the improvement in oversegmented areas such as the wavy sea or the shadowed parts of the sand, while the confidence of recognition also increases.

Results of the application of semantic segmentation in video sequences

In the case of video sequences, our goal is to successfully match spatiotemporally segmented volumes within and between blocks of frames (BoFs), in order to achieve a consistent and semantically meaningful video segmentation. One can see the difference between consecutive BoFs in the second and the third column: in the latter, the consistent color index indicates correct tracking of objects. Segmentation of these objects (person, sea and sky) is illustrated in the last three columns, respectively.

Precision-recall figures of recognition per concept

#### Journals

In IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 3, pp. 298-312, March 2007.

#### Conferences

In Proceedings of the 10th International Workshop on Multimedia Signal Processing (MMSP 2008), Cairns, Australia, October 2008.

#### Journals

In Neural Network World, vol. 19, no. 5, pp. 561-579, October 2009.