SIGMAP 2011 Abstracts


Area 1 - Multimedia and Communications

Short Papers
Paper Nr: 36
Title:

ACCURACY OF MP3 SPEECH RECOGNITION UNDER REAL-WORLD CONDITIONS - Experimental Study

Authors:

Petr Pollak and Martin Behunek

Abstract: This paper presents a study of speech recognition accuracy with respect to different levels of MP3 compression. Special attention is paid to the processing of speech signals of different quality, i.e. with different levels of background noise and channel distortion. The work was motivated by the possible use of ASR for off-line automatic transcription of audio recordings collected by standard widespread MP3 devices. The experiments showed that although the MP3 format is not optimal for speech compression, it does not distort speech significantly, especially at high or moderate bit rates and for high-quality source data. Consequently, the accuracy of connected-digit ASR decreased only very slowly down to a bit rate of 24 kbps. In the best case (PLP parameterization on the close-talk channel), only a 3% decrease in recognition accuracy was observed while the compressed file was approximately 10% of the original size. All results were slightly worse in the presence of additive background noise and channel distortion, but the achieved accuracy was still acceptable in this case, especially for PLP features.

Paper Nr: 46
Title:

TRANSMISSION OF LOW-MOTION JPEG2000 IMAGE SEQUENCES USING CLIENT-DRIVEN CONDITIONAL REPLENISHMENT

Authors:

J. J. Sánchez-Hernández, J. P. García-Ortiz, V. González-Ruiz, I. García and D. Müller

Abstract: This work proposes a strategy for interactively browsing sequences of high-resolution remote JPEG 2000 images. These sequences can be displayed in any order (forward and backward) and following any play/timing pattern. To increase the quality of the reconstructions when the retrieved images are only known at the moment of visualization, we propose and evaluate a novel technique based on conditional replenishment. This solution exploits the SNR/spatial scalability of JPEG 2000 to determine which regions of the next image should be transmitted and which regions should be reused from the previously reconstructed image. Experimental results demonstrate that, even without motion compensation and with a transmission exclusively controlled by the client, the reconstructions are consistently better, both visually and from a rate-distortion point of view, than those that only remove the spatial redundancy (such as Motion JPEG 2000). Other advantages of our approach are that no data overhead is generated, the computational complexity is very small compared to similar techniques, and it can be used with any JPIP server.
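
As a rough illustration of the client-driven decision described above, the following Python sketch ranks blocks of the next image by how much a coarse preview (obtainable thanks to the SNR/spatial scalability) differs from the previous reconstruction, and requests refinement only for the most-changed blocks. The function names, block size and MSE criterion are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def select_blocks_to_replenish(prev_recon, coarse_next, budget, block=64):
    """Schematic client-side conditional-replenishment decision on grayscale
    frames: score each block by how much a coarse preview of the next image
    differs from the previous reconstruction, then keep the worst `budget`
    blocks as the ones worth requesting (e.g. via JPIP)."""
    h, w = prev_recon.shape
    scores = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            a = prev_recon[y:y + block, x:x + block].astype(float)
            b = coarse_next[y:y + block, x:x + block].astype(float)
            scores.append(((y, x), np.mean((a - b) ** 2)))  # per-block MSE
    scores.sort(key=lambda s: s[1], reverse=True)
    return [pos for pos, _ in scores[:budget]]
```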

Paper Nr: 27
Title:

TWO-DIMENSIONAL CODES ON MOBILE DEVICES AND THE DEVELOPMENT OF THE PLATFORM

Authors:

José Manuel Fornés Rumbao and Francisco Rodríguez Rubio

Abstract: In recent years, mobile terminals have undergone accelerated technological development. This evolution has brought numerous advances in presentation and interactivity and has given rise to numerous applications. Along this line, this article shows how to provide mobile terminals with a simple form of interaction with the environment through the technological successor of bar codes: two-dimensional codes. We exploit three basic elements (camera quality, growth in data traffic, and increased bandwidth in mobile phones) to create a platform that provides the user with an easy and useful way of obtaining multimedia information that improves their interaction with the environment. We aim for a complete, end-to-end development of the system, that is, the generation of the two-dimensional code, its interaction with the platform, and the final delivery of the information to the terminal.

Area 2 - Multimedia Signal Processing

Full Papers
Paper Nr: 20
Title:

A NON-UNIFORM REAL-TIME SPEECH TIME-SCALE STRETCHING METHOD

Authors:

Adam Kupryjanow and Andrzej Czyzewski

Abstract: An algorithm for non-uniform real-time speech stretching is presented. It combines the typical SOLA (Synchronous Overlap and Add) algorithm with vowel, consonant and silence detectors. Based on this content information and the estimated rate of speech (ROS), the algorithm adapts the scaling factor value. The feasibility of real-time speech stretching and the resulting voice quality were analysed. Subjective tests were performed in order to compare the quality of the proposed method with the output of the standard SOLA algorithm. The accuracy of the ROS estimation was assessed to demonstrate its robustness.
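
For readers unfamiliar with the baseline that the non-uniform method builds on, here is a minimal uniform SOLA time-stretching sketch in Python. The frame length, hop sizes and search range are illustrative values, and the paper's contribution (adapting the scaling factor per segment from the vowel/consonant/silence class and the ROS estimate) is not reproduced here.

```python
import numpy as np

def sola_stretch(x, alpha, frame_len=1024, analysis_hop=256, search=64):
    """Uniform SOLA time-scale modification of a mono signal x (alpha > 1 slows down)."""
    synthesis_hop = int(round(analysis_hop * alpha))
    overlap = frame_len - synthesis_hop            # nominal overlap between frames
    if overlap <= 0:
        raise ValueError("frame_len must exceed the synthesis hop")
    n_frames = 1 + (len(x) - frame_len) // analysis_hop
    out = np.zeros(synthesis_hop * n_frames + frame_len + search)
    out[:frame_len] = x[:frame_len]                # first frame is copied as-is
    end = frame_len
    for k in range(1, n_frames):
        frame = x[k * analysis_hop : k * analysis_hop + frame_len]
        pos = k * synthesis_hop                    # nominal placement in the output
        # pick the small shift whose overlap region best matches what has
        # already been written (maximum raw cross-correlation)
        best_shift, best_corr = 0, -np.inf
        for s in range(search):
            c = np.dot(out[pos + s : pos + s + overlap], frame[:overlap])
            if c > best_corr:
                best_corr, best_shift = c, s
        pos += best_shift
        fade = np.linspace(0.0, 1.0, overlap)      # linear cross-fade in the overlap
        out[pos : pos + overlap] = (1 - fade) * out[pos : pos + overlap] + fade * frame[:overlap]
        out[pos + overlap : pos + frame_len] = frame[overlap:]
        end = pos + frame_len
    return out[:end]
```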

Paper Nr: 21
Title:

ESTIMATION-DECODING ON LDPC-BASED 2D-BARCODES

Authors:

W. Proß, M. Otesteanu and F. Quint

Abstract: In this paper we propose an extension of the Estimation-Decoding algorithm for the decoding of our Data Matrix Code (DMC), which is based on Low-Density Parity-Check (LDPC) codes and is designed for use in industrial environments. To include possible damages in the channel model, a Markov-modulated Gaussian channel (MMGC) was chosen to represent everything between the embossing of an LDPC-based DMC and the camera-based acquisition. The MMGC is based on a Hidden Markov Model (HMM) that turns into a two-dimensional model when used in the context of DMCs. The proposed ED2D algorithm (Estimation-Decoding in two dimensions) operates on a 2D-LDPC-Markov factor graph that comprises an LDPC code's Tanner graph and a 2D-HMM. For a subsequent comparison between different barcodes in an industrial environment, a simulation of typical damages was implemented. Tests showed superior decoding behavior of our LDPC-based DMC decoded with the ED2D decoder compared to the standard Reed-Solomon-based DMC.

Paper Nr: 23
Title:

HAND IMAGE SEGMENTATION BY MEANS OF GAUSSIAN MULTISCALE AGGREGATION FOR BIOMETRIC APPLICATIONS

Authors:

Alberto de Santos Sierra, Carmen Sánchez Ávila, Javier Guerra Casanova and Gonzalo Bailador del Pozo

Abstract: Applying biometrics to daily scenarios involves demanding requirements in terms of software and hardware. At the same time, current biometric techniques are being adapted to present-day devices, like mobile phones, laptops and the like, which are far from meeting the previously stated requirements. In fact, reconciling both necessities is one of the most difficult problems in biometrics at present. Therefore, this paper presents a segmentation algorithm able to provide suitably precise solutions for hand biometric recognition, considering a wide range of backgrounds such as carpets, glass, grass, mud, pavement, plastic, tiles or wood. Results highlight that segmentation is carried out with high precision (F-measure ≥ 88%), with competitive running times when compared to state-of-the-art segmentation algorithms.
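
For reference, the F-measure quoted above is the harmonic mean of precision and recall; for a segmentation task it is typically computed over the pixels labelled as hand against a ground-truth mask (the pixel-level formulation below is the standard definition, not a detail taken from the paper):

\[
F = \frac{2PR}{P + R}, \qquad
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}
\]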

Paper Nr: 33
Title:

SEGMENTATION OF TOUCHING LANNA CHARACTERS

Authors:

Sakkayaphop Pravesjit and Arit Thammano

Abstract: Character segmentation is an important preprocessing step for character recognition. Incorrectly segmented characters are not likely to be correctly recognized. Touching characters are one of the most difficult segmentation cases that arise when handwritten characters are segmented. Therefore, this paper focuses on the segmentation of touching and overlapping characters. In the proposed character segmentation process, bounding box analysis is initially employed to segment the document image into images of isolated characters and images of touching characters. A thinning algorithm is applied to extract the skeleton of the touching characters. Next, the skeleton of the touching characters is separated into several pieces. Finally, the separated pieces are reassembled to reconstruct two isolated characters. The proposed algorithm achieves an accuracy of 75.3%.

Paper Nr: 64
Title:

WIRELESS IN-VEHICLE COMPLIANT DRIVER ENVIRONMENT RECORDER

Authors:

Oscar S. Siordia, Isaac Martín de Diego, Cristina Conde and Enrique Cabello

Abstract: In this paper, an in-vehicle compliant recording device is presented. The device is divided into independent systems for image and audio data acquisition and storage. The systems, designed to work as in-vehicle compliant devices, use existing in-vehicle wireless architectures for their communication. Several tests of the recording device in a highly realistic truck simulator show the reliability of the developed system in acquiring and storing driver-related data. The acquired data will be used for the development of a valid methodology for the reconstruction and study of traffic accidents.

Short Papers
Paper Nr: 34
Title:

3D VISUALIZATION OF SINGLE IMAGES USING PATCH LEVEL DEPTH

Authors:

Shahrouz Yousefi, Farid Abedan Kondori and Haibo Li

Abstract: In this paper we consider the task of 3D photo visualization using a single monocular image. The main idea is to take single photos captured by devices such as ordinary cameras, mobile phones, tablet PCs, etc. and visualize them in 3D on normal displays. A supervised learning approach is employed to retrieve depth information from single images. The algorithm is based on a hierarchical multi-scale Markov Random Field (MRF), which models depth from multi-scale global and local features and the relations between them in a monocular image. The estimated depth image is then used to assign the specified depth parameters to each pixel in the 3D map. Accordingly, multi-level depth adjustment and coding for color anaglyphs are performed. Our system receives a single 2D image as input and provides an anaglyph-coded 3D image as output. Depending on the coding technology, viewers use inexpensive anaglyph glasses.
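
To make the last step concrete, the sketch below shows one simple way of turning an image plus a per-pixel depth estimate into a red-cyan anaglyph: pixels are shifted horizontally in proportion to their depth to synthesize a stereo pair, and the red channel of the left view is combined with the green/blue channels of the right view. The shifting scheme, parameter values and function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def depth_to_anaglyph(rgb, depth, max_disp=12):
    """Build a red-cyan anaglyph from one RGB image and a depth map in [0, 1]
    (1 = near). Simple forward warping is used, so small holes are left unfilled."""
    h, w, _ = rgb.shape
    disp = (depth * max_disp).astype(int)        # nearer pixels get larger shifts
    left = np.zeros_like(rgb)
    right = np.zeros_like(rgb)
    cols = np.arange(w)
    for y in range(h):
        left[y, np.clip(cols + disp[y], 0, w - 1)] = rgb[y, cols]
        right[y, np.clip(cols - disp[y], 0, w - 1)] = rgb[y, cols]
    anaglyph = right.copy()                      # cyan (G, B) from the right view
    anaglyph[..., 0] = left[..., 0]              # red channel from the left view
    return anaglyph
```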

Paper Nr: 42
Title:

IMAGE DENOISING BASED ON LAPLACE DISTRIBUTION WITH LOCAL PARAMETERS IN LAPPED TRANSFORM DOMAIN

Authors:

Vijay Kumar Nath and Anil Mahanta

Abstract: In this paper, we present a new image denoising method based on statistical modeling of Lapped Transform (LT) coefficients. The lapped transform coefficients are first rearranged into a wavelet-like structure, and the statistics of the rearranged coefficient subbands are then modeled in a similar way to wavelet coefficients. We propose to model the rearranged LT coefficients in a subband using a Laplace probability density function (pdf) with local variance. This simple distribution is well able to model the locality and the heavy-tailed property of lapped transform coefficients. A maximum a posteriori (MAP) estimator using the Laplace pdf with local variance is used to estimate the noise-free lapped transform coefficients. Experimental results show that the proposed low-complexity image denoising method outperforms several wavelet-based image denoising techniques as well as two existing LT-based image denoising schemes. Our main contribution in this paper is to use the local Laplace prior for statistical modeling of LT coefficients and to use a MAP estimation procedure with this prior to restore the noisy image LT coefficients.
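
For context, with an additive Gaussian noise model y = w + n, n ~ N(0, σ_n²), and a zero-mean Laplace prior whose local standard deviation is σ(k), the MAP estimate of a coefficient reduces to the well-known soft-thresholding rule below. This is the standard closed form for a Laplace prior and is given only as background; the paper's exact local-variance estimation details are not reproduced here.

\[
\hat{w}(y) = \operatorname{sign}(y)\,\max\!\left(|y| - \frac{\sqrt{2}\,\sigma_n^{2}}{\sigma(k)},\; 0\right)
\]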

Paper Nr: 43
Title:

IMPROVED INTER MODE DECISION FOR H.264/AVC USING WEIGHTED PREDICTION

Authors:

Amrita Ganguly and Anil Mahanta

Abstract: The H.264/AVC video coding standard outperforms former standards in terms of coding efficiency, but at the expense of higher computational complexity. Of all the encoding elements in H.264, inter prediction is computationally the most intensive and thus adds to the computational burden of the encoder. In this paper, we propose a fast inter prediction algorithm for the JVT video coding standard H.264/AVC. Prior to performing the motion estimation for inter prediction, characteristics such as the stationarity and homogeneity of each macroblock are determined. The macroblock's correlation with neighboring macroblocks in terms of predicted motion vectors and encoding modes is studied. Weights are assigned to these parameters and the final mode is selected based upon these weights. The average video encoding time reduction of the proposed method is 70% compared to the JVT benchmark JM12.4, while maintaining similar PSNR and bit rate. Experimental results for various test sequences at different resolutions are presented to show the effectiveness of the proposed method.

Paper Nr: 45
Title:

IMAGE MATCHING ALGORITHMS IN STEREO VISION USING ADDRESS-EVENT-REPRESENTATION - A Theoretical Study and Evaluation of the Different Algorithms

Authors:

M. Dominguez-Morales, E. Cerezuela-Escudero, A. Jimenez-Fernandez, R. Paz-Vicente, J. L. Font-Calvo, P. Iñigo-Blasco, A. Linares-Barranco and G. Jimenez-Moreno

Abstract: Image processing in digital computer systems usually treats the visual information as a sequence of frames. These frames come from cameras that capture reality for a short period of time; they are renewed and transmitted at a rate of 25-30 fps (typical real-time scenario). Digital video processing has to process each frame in order to obtain a filtered result or detect a feature in the input. In stereo vision, existing algorithms use frames from two digital cameras and process them pixel by pixel until a pattern match is found in a section of both stereo frames. Spike-based processing is a relatively new approach that carries out the processing by manipulating spikes one by one at the time they are transmitted, as the human brain does. The mammalian nervous system is able to solve much more complex problems, such as visual recognition, by manipulating neurons' spikes. The spike-based philosophy for visual information processing based on the neuro-inspired Address-Event-Representation (AER) is nowadays achieving very high performance. In this work we study the existing digital stereo matching algorithms and how they work. After that, we propose an AER stereo matching algorithm using some of the principles employed in digital stereo methods.

Paper Nr: 48
Title:

VISUAL AER-BASED PROCESSING WITH CONVOLUTIONS FOR A PARALLEL SUPERCOMPUTER

Authors:

Rafael J. Montero-Gonzalez, Arturo Morgado-Estevez, Fernando Perez-Peña, Alejandro Linares-Barranco, Angel Jimenez-Fernandez, Bernabe Linares-Barranco and Jose Antonio Perez-Carrasco

Abstract: This paper is based on the simulation of a convolution model for multimedia applications using the neuro-inspired Address-Event-Representation (AER) philosophy. AER is a communication mechanism between chips gathering thousands of spiking neurons. These spiking neurons are able to process visual information in a frame-free style, like the human brain does. All the spiking neurons work in parallel and each of them performs an operation when an input stimulus is received. The result of this operation may or may not be an output event. There exist AER retinas and other sensors, AER processors (convolvers, WTA filters), learning chips and robot actuators. In this paper we present the implementation of an AER convolution processor for the CRS (cluster research support) supercomputer of the University of Cadiz (UCA). This research involves the design of test cases in which the optimal parameters are determined for running the AER convolution on parallel processors. These cases consist of running the convolution on an image divided into different numbers of parts, applying a Sobel filter for edge detection to each part, using the AER-TOOL simulator. Runtimes are compared for all cases and the optimal configuration of the system is discussed. In general, the CRS achieves better performance when the image is subdivided than when the whole image is processed.
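
As a frame-based reference for the operation being distributed over the cluster, the following sketch applies a Sobel edge filter independently to horizontal strips of an image and stitches the results back together. The strip count and function names are illustrative, and the paper itself works on AER spike streams via the AER-TOOL simulator rather than on dense frames.

```python
import numpy as np
from scipy.signal import convolve2d

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(tile):
    # Gradient magnitude from horizontal and vertical Sobel responses.
    gx = convolve2d(tile, SOBEL_X, mode="same", boundary="symm")
    gy = convolve2d(tile, SOBEL_Y, mode="same", boundary="symm")
    return np.hypot(gx, gy)

def process_in_parts(image, n_parts=4):
    # Split into horizontal strips and filter each one (on the cluster these
    # would run in parallel); no halo exchange is done, so a thin seam of
    # border artifacts remains between strips in this simple sketch.
    strips = np.array_split(image, n_parts, axis=0)
    return np.vstack([sobel_magnitude(s) for s in strips])
```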

Paper Nr: 58
Title:

AER SPIKE-PROCESSING FILTER SIMULATOR - Implementation of an AER Simulator based on Cellular Automata

Authors:

Manuel Rivas-Perez, A. Linares-Barranco, A. Jimenez-Fernandez, A. Civit and G. Jimenez

Abstract: Spike-based systems are neuro-inspired circuit implementations traditionally used for sensory systems or sensor signal processing. Address-Event-Representation (AER) is a neuromorphic communication protocol for transferring asynchronous events between VLSI spike-based chips. These neuro-inspired implementations allow the development of complex, multilayer, multichip neuromorphic systems and have been used to design sensor chips, such as retinas and cochleas, processing chips, e.g. filters, and learning chips. Furthermore, Cellular Automata (CA) is a bio-inspired processing model for problem solving. This approach divides the processing among synchronous cells which change their states at the same time in order to reach the solution. This paper presents a software simulator able to gather several spike-based elements into the same workspace in order to test a CA architecture based on AER before a hardware implementation. Furthermore, this simulator produces VHDL for testing the AER-CA design on the FPGA of the USB-AER AER-tool.

Paper Nr: 59
Title:

A GENETIC APPROACH FOR IMPROVING THE SIDE INFORMATION IN WYNER-ZIV VIDEO CODING WITH LONG DURATION GOP

Authors:

Charles Yaacoub, Joumana Farah and Chadi Jabroun

Abstract: This work tackles the problem of side information generation for the case of large-duration GOPs in distributed video coding. Based on a previously developed technique for side-information enhancement, we develop a genetic algorithm particularly designed for large GOPs, taking into account the GOP size, the additional bitrate incurred by encoding hash information, as well as the decoding complexity. The proposed algorithm makes use of different interpolation methods available in the literature in a fusion-based approach. A significant gain in average PSNR, which can reach 2 dB, is observed with respect to the best performing interpolation technique, while the algorithm is run for no more than 18% of the total number of blocks in a given video sequence. Furthermore, although encoding complexity is a main concern in distributed video coding, the proposed solution incurs no additional complexity at the encoder side in the case of hash-based Wyner-Ziv video coding.

Paper Nr: 68
Title:

WHAT ARE GOOD CGS/MGS CONFIGURATIONS FOR H.264 QUALITY SCALABLE CODING?

Authors:

Shih-Hsuan Yang and Wei-Lune Tang

Abstract: Scalable video coding (SVC) encodes image sequences into a single bit stream that can be adapted to various network and terminal capabilities. The H.264/AVC standard includes three kinds of video scalability: spatial scalability, temporal scalability, and quality scalability. Among them, quality scalability refers to image sequences of the same spatio-temporal resolution but with different fidelity levels. Two options for quality scalability are adopted in H.264/AVC, namely CGS (coarse-grain quality scalable coding) and MGS (medium-grain quality scalability), and they may be used in combination. A refinement layer in CGS is obtained by re-quantizing the (residual) texture signal with a smaller quantization step size (QP). Using CGS alone, however, may incur a notable PSNR penalty and high encoding complexity if numerous rate points are required. MGS partitions the transform coefficients of a CGS layer into several MGS sub-layers and distributes them in different NAL units. The use of MGS may increase the adaptation flexibility, improve the coding efficiency, and reduce the coding complexity. In this paper, we investigate the CGS/MGS configurations that lead to good performance. From extensive experiments using the JSVM (Joint Scalable Video Model), however, we find that MGS should be employed carefully. Although MGS always reduces the encoding complexity compared to using CGS alone, its rate-distortion behavior is unstable. While MGS typically provides better or comparable rate-distortion performance for cases with eight rate points or more, some configurations may cause an unexpected PSNR drop with an increased bit rate. This anomaly is currently under investigation.

Paper Nr: 10
Title:

COLOR FACE RECOGNITION - A Multilinear-PCA Approach Combined with Hidden Markov Models

Authors:

Dimitrios S. Alexiadis and Dimitrios Glaroudis

Abstract: Hidden Markov Models (HMMs) have been successfully applied to the face recognition problem. However, existing HMM-based techniques use feature (observation) vectors that are extracted only from the images' luminance component, while it is known that color provides significant information. In contrast to the classical PCA approach, Multilinear PCA (MPCA) seems to be an appropriate scheme for dimensionality reduction and feature extraction from color images, handling the color channels in a natural, “holistic” manner. In this paper, we propose an MPCA-based approach for color face recognition that exploits the strengths of HMMs as classifiers. The proposed methodology was tested on three publicly available color databases and produced high recognition rates, compared to existing HMM-based methodologies.

Paper Nr: 11
Title:

REAL-TIME FACE RECOGNITION WITH GPUs - A DCT-based Face Recognition System using Graphics Processing Unit

Authors:

Dimitrios S. Alexiadis, Anastasia Papastergiou and Athanasios Hatzigaidas

Abstract: In this paper, we present an implementation of a 2-D DCT-based face recognition system which uses a high-performance parallel computing architecture based on Graphics Processing Units (GPUs). Comparisons between the GPU-based and the “gold” CPU-based implementation in terms of execution time have been made. They show that the GPU implementation (NVIDIA GeForce GTS 250) is about 50 times faster than the CPU-based one (Intel Dual Core 1.83 GHz), allowing real-time operation of the developed face recognition system. Additionally, comparisons of the DCT-based approach with the PCA-based face recognition methodology show that the DCT-based approach achieves comparable recognition hit rates.
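
As a CPU-side illustration of the feature-extraction step that the GPU accelerates, the sketch below keeps the low-frequency block of the 2-D DCT of a face image as its feature vector and classifies with a nearest-neighbour rule. The block size, distance measure and function names are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from scipy.fftpack import dct

def dct_features(gray_face, k=8):
    # 2-D DCT via separable 1-D transforms; keep the top-left k x k
    # low-frequency coefficients as a compact face descriptor.
    coeffs = dct(dct(gray_face, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:k, :k].ravel()

def nearest_neighbour(probe_feat, gallery_feats, gallery_labels):
    # Euclidean nearest-neighbour matching against the enrolled gallery.
    d = np.linalg.norm(gallery_feats - probe_feat, axis=1)
    return gallery_labels[int(np.argmin(d))]
```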

Paper Nr: 25
Title:

STEREO VISION MATCHING OVER SINGLE-CHANNEL COLOR-BASED SEGMENTATION

Authors:

Pablo Revuelta Sanz, Belén Ruiz Mezcua, José M. Sánchez Pena and Jean-Phillippe Thiran

Abstract: Stereo vision is one of the most important passive methods for extracting depth maps. Within it, there are several approaches with advantages and disadvantages. Computational load is especially important in both the block matching and graphical cues approaches. In a previous work, we proposed a region-growing segmentation solution to the matching process. In that work, matching was carried out over statistical descriptors of the image regions, commonly referred to as characteristic vectors, whose number is, by definition, lower than the number of possible block matches. This first version was defined for grayscale images. Although efficient, the grayscale algorithm presented some important disadvantages, mostly related to the segmentation process. In this article, we present a pre-processing tool that computes grayscale images which retain the relevant color information, preserving both the advantages of grayscale segmentation and those of color image processing. The results of this improved algorithm are shown and compared to those obtained by the grayscale segmentation and matching algorithm, demonstrating a significant improvement in the computed depth maps.

Paper Nr: 51
Title:

VIDEO SURVEILLANCE IN AN INDUSTRIAL ENVIRONMENT USING AN ADDRESS EVENT VISION SENSOR - A Comparison between Two Different Video Sensors based on a Bioinspired Retina

Authors:

Fernando Perez-Peña, Arturo Morgado-Estevez, Rafael J. Montero-Gonzalez, Alejandro Linares-Barranco and Gabriel Jimenez-Moreno

Abstract: Nowadays we live in a highly industrialized world that is increasingly concerned with surveillance and occupational hazards. The aim of this paper is to provide a video surveillance system for use in ultra-fast industrial environments. We present an exhaustive timing analysis and comparison between two different Address Event Representation (AER) retinas, one with 64x64 pixels and the other with 128x128 pixels, in order to determine their limits. Both are spike-based image sensors that mimic the human retina, designed and manufactured by Delbruck's lab. Two different scenarios are presented in order to measure the maximum frequency of light changes for a pixel sensor and the maximum frequency of requested pixel addresses on the AER output. The results obtained are 100 Hz and 1.88 MHz respectively for the 64x64 retina, and peaks of 1.3 kHz and 8.33 MHz for the 128x128 retina. We have tested the upper spin limit of an ultra-fast industrial machine and found it to be approximately 6000 rpm for the first retina, while no limit was reached at top rpm for the second retina. Tests also showed that in cases with high light contrast no AER data is lost.

Area 3 - Multimedia Systems and Applications

Full Papers
Paper Nr: 16
Title:

THE WINDSURF LIBRARY FOR THE EFFICIENT RETRIEVAL OF MULTIMEDIA HIERARCHICAL DATA

Authors:

Ilaria Bartolini, Marco Patella and Guido Stromei

Abstract: Several modern multimedia applications require the management of complex data that can be defined as hierarchical objects consisting of several component elements. In such scenarios, the concept of similarity between complex objects recursively depends on the similarity between component data, making it difficult to carry out several common tasks, like processing queries and understanding the impact of the different alternatives available for defining similarity between objects. To overcome such limitations, in this paper we present the WINDSURF library for the management of multimedia hierarchical data. The goal of the library is to provide a general framework for assessing the performance of alternative query processing techniques for the efficient retrieval of complex data arising in several multimedia applications, such as image/video retrieval and the comparison of collections of documents. We designed the library to offer generality, flexibility, and extensibility: these are provided by way of a number of different templates that can be appropriately instantiated in order to realize the particular retrieval model needed by the user.

Paper Nr: 29
Title:

LATENT TOPIC VISUAL LANGUAGE MODEL FOR OBJECT CATEGORIZATION

Authors:

Lei Wu, Nenghai Yu, Jing Liu and Mingjing Li

Abstract: This paper presents a latent topic visual language model to handle the variation problem in object categorization. Variations, including different views, styles, poses, etc., greatly affect the spatial arrangement and distribution of visual features, on which previous categorization models largely depend. Taking the object variations as hidden topics within each category, the proposed model explores the relationship between object variations and visual feature arrangement within the traditional visual language modeling process. With this improvement, the accuracy of object categorization is further boosted. Experiments on the Caltech101 dataset show that this model is sound and effective.

Short Papers
Paper Nr: 12
Title:

CONTEXT BASED WATERMARKING OF SECURE JPEG-LS IMAGES

Authors:

A. V. Subramanyam and Sabu Emmanuel

Abstract: JPEG-LS is generally used to compress bio-medical or high dynamic range images. These compressed images sometimes need to be encrypted for confidentiality. In addition, the secured JPEG-LS compressed images may need to be watermarked to detect copyright violation, track the different users handling the image, prove ownership, or for authentication purposes. In the proposed technique, the watermark is embedded in the context of the compressed image while the Golomb-coded bit stream is encrypted. The extraction of the watermark can be done during JPEG-LS decoding. The advantage of this watermarking scheme is that the media need not be decompressed or decrypted for embedding the watermark, thus saving computational cost while preserving the confidentiality of the media.

Paper Nr: 30
Title:

EFFECTIVE INTERFERENCE REDUCTION METHOD FOR SPREAD SPECTRUM FINGERPRINTING

Authors:

Minoru Kuribayashi

Abstract: An iterative detection method specified for the CDMA-based fingerprinting scheme, whose embedding procedure is an additive watermarking method, was proposed at IH2008. Such a detection method is also applicable to the multiplicative watermarking method that modulates a fingerprint using the characteristics of the content. In this study, we examine the interference among fingerprints embedded in a content in the hierarchical version of Cox's scheme, and propose an effective detection method that iteratively detects colluders combined with a removal operation. By introducing two kinds of thresholds, the removal operation is performed adaptively to reduce the interference without causing serious false detections.

Paper Nr: 37
Title:

SENSES - WHAT U SEE? - Vision Screening System Dedicated for iOS Based Devices Development and Screening Results

Authors:

Robert Kosikowski, Lukasz Kosikowski, Piotr Odya and Andrzej Czyzewski

Abstract: This paper describes the design and implementation of a vision screening system for iOS (iPhone/iPad/iPod Operating System) based devices. The aim of the system is to promote and popularize vision tests, especially among children and young people. The examination consists of color vision and contrast differentiation tests. After the examination, the system automatically evaluates the users' answers and generates the results. Test data are anonymously sent to a server, allowing for detailed analysis. The paper contains an analysis of the results for a population of about 3800 people. The presented data show that vision problems affect about half of the users. The analysis was divided into two age groups (pre-school children and older) and two types of eye disorders - visual acuity and color perception, including Daltonism testing. The test for the first age group was adapted to examine people with special educational needs.

Paper Nr: 56
Title:

QUALITY EVALUATION OF NOVEL DTD ALGORITHM BASED ON AUDIO WATERMARKING

Authors:

Andrzej Ciarkowski and Andrzej Czyżewski

Abstract: Echo cancellers typically employ a doubletalk detection (DTD) algorithm in order to keep the adaptive filter from diverging in the presence of near-end speech or other disruptive sounds in the microphone signal. A novel doubletalk detection algorithm based on techniques similar to those used for audio signal watermarking was introduced by the authors. The application of the described DTD algorithm within an acoustic echo cancellation system is presented. The proposed algorithm is compared with the very common but simple Geigel algorithm and with Normalized Cross-Correlation algorithms representing the current state of the art. Both objective (ROC) and subjective (listening tests) performance evaluation methods are employed to obtain exhaustive evaluation results in simulated real-world conditions. The evaluation results are presented and their relevance is discussed. The issue of the algorithms' computational complexity is emphasized and conclusions are drawn.
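
For readers unfamiliar with the simplest of the baselines mentioned above, a Geigel doubletalk detector declares near-end activity whenever the microphone sample is loud relative to the recent far-end peak. The sketch below is a textbook version; the window length and threshold are typical values, not those used in the paper's evaluation.

```python
import numpy as np

def geigel_dtd(mic, far_end, window=128, threshold=0.5):
    """Sample-by-sample Geigel doubletalk detector over two equal-length signals."""
    doubletalk = np.zeros(len(mic), dtype=bool)
    for n in range(len(mic)):
        lo = max(0, n - window + 1)
        recent_far_max = np.max(np.abs(far_end[lo : n + 1]))
        # Declare near-end speech when the microphone sample exceeds a fraction
        # of the recent far-end peak (echo alone should stay below it).
        doubletalk[n] = np.abs(mic[n]) > threshold * recent_far_max
    return doubletalk
```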

Paper Nr: 60
Title:

OPTIMAL COMBINATION OF LOW-LEVEL FEATURES FOR SURVEILLANCE OBJECT RETRIEVAL

Authors:

Virginia Fernandez Arguedas, Krishna Chandramouli, Qianni Zhang and Ebroul Izquierdo

Abstract: In this paper, a classifier based on low-level multi-feature fusion is presented for studying the performance of an object retrieval method on surveillance videos. The proposed retrieval framework exploits recent developments in evolutionary computation based on biologically inspired optimisation techniques. The multi-descriptor space is formed by combining four MPEG-7 visual features. The proposed approach has been evaluated against kernel machines on objects extracted from the AVSS 2007 dataset.

Paper Nr: 73
Title:

MANAGING MULTIPLE MEDIA STREAMS IN HTML5 - The IEEE 1599-2008 Case Study

Authors:

S. Baldan, L. A. Ludovico and D. A. Mauro

Abstract: This paper deals with the problem of managing multiple multimedia streams in a Web environment. The multimedia types to support are pure audio, video with no sound, and audio/video. The data streams refer to the same event or performance; consequently, they are mutually synchronized and should remain so. Moreover, a Web player should be able to play different multimedia streams simultaneously, as well as to switch from one to another in real time. The clarifying example of a music piece encoded in the IEEE 1599 format is presented as a case study.

Paper Nr: 41
Title:

EXPLORING THE DIFFERENCES IN SURFACE ELECTROMYOGRAPHIC SIGNAL BETWEEN MYOFASCIAL-PAIN AND NORMAL GROUPS - Feature Extraction through Wavelet Denoising and Decomposition

Authors:

Ching-Fen Jiang, Nan-Ying Yu and Yu Ching Lin

Abstract: Upper-back myofascial pain is an increasingly significant syndrome associated with frequent computer use. However, the changes in neuromuscular function caused by myofascial pain are still largely unexplored. This study aims to uncover the changes in neuromuscular function on the taut band through signal analysis of surface electromyography. We first developed a fully automatic algorithm to detect the duration of an epoch of muscle contraction. Following that, time-domain and frequency-domain features of the epochs were extracted from 13 patients and compared with measurements from 13 normal subjects. The higher contraction strength with lower median frequency found in the patient group is similar to the changes reported for muscle fatigue. The signal was further analyzed using wavelet energy over 17 decomposition levels. The result shows that the energy measured from the patients exceeds that from the normal group in the low-frequency band, suggesting that an increased synchronization level of motor unit recruitment may cause the drop in the median frequency and the increase in contraction strength.
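
As a small illustration of the kind of per-level wavelet energy feature described above, the sketch below decomposes one sEMG epoch with PyWavelets and reports the relative energy of each detail level. The wavelet family, normalization and function name are illustrative choices, not necessarily those used in the study.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_level_energy(emg_epoch, wavelet="db4", levels=17):
    """Relative energy of each detail level of a multi-level DWT of one epoch
    (the epoch must be long enough to support the requested number of levels)."""
    coeffs = pywt.wavedec(emg_epoch, wavelet, level=levels)
    details = coeffs[1:]                      # coeffs[0] is the approximation
    energy = np.array([np.sum(c ** 2) for c in details])
    return energy / energy.sum()              # normalize to relative energy
```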

Paper Nr: 61
Title:

AUTOMATIC SOUND RESTORATION SYSTEM - Concepts and Design

Authors:

Andrzej Czyzewski, Bozena Kostek and Adam Kupryjanow

Abstract: A concept of a system for automatic audio recording reconstruction is described. It is supported by a video image reconstruction algorithm focused on video instability analysis. Sound restoration is performed with a focus on noise and on wow and flutter analysis. The presented algorithms are designed to be automatic and to reduce the human effort required during the restoration process. A web service designed especially for the automatic restoration process is envisioned as an integration platform for these algorithms and for a repository of recordings.

Paper Nr: 65
Title:

A SPATIAL IMMERSIVE OFFICE ENVIRONMENT FOR COMPUTER-SUPPORTED COLLABORATIVE WORK - Moving Towards the Office of the Future

Authors:

Maarten Dumont, Sammy Rogmans, Steven Maesen, Karel Frederix, Johannes Taelman and Philippe Bekaert

Abstract: In this paper, we present our work in building a prototype office environment for computer-supported collaborative work that spatially and auditorially immerses the participants, as if the augmented and virtually generated environment were a true extension of the physical office. To realize this, we have integrated various hardware, computer vision and graphics technologies, drawn partly from the existing state of the art but mostly from knowledge and expertise in our research center. The fundamental components of such an office of the future, i.e. image-based modeling, rendering and spatial immersiveness, are illustrated together with surface computing and advanced audio processing, to go even beyond the original concept.

Paper Nr: 66
Title:

FOUR-PHASE RE-SPEAKER TRAINING SYSTEM

Authors:

Aleš Pražák, Zdeněk Loose, Josef Psutka, Vlasta Radová and Luděk Müller

Abstract: Since the re-speaker approach to the automatic captioning of TV broadcasts using large vocabulary continuous speech recognition (LVCSR) is on the increase, there is also a growing demand for training systems that allow new speakers to learn the procedure. This paper describes a specially designed re-speaker training system that provides a gradual four-phase tutoring process with quantitative indicators of trainee progress to enable faster (and thus cheaper) training of re-speakers. The performance evaluation of three re-speakers who were trained on the proposed system is also reported.