SIGMAP 2008 Abstracts

Area 1 - Multimedia Communications

Full Papers

Paper Nr:	102
Title:	SUFFIX ARRAYS - A Competitive Choice for Fast Lempel-Ziv Compressions
Authors:	Artur Ferreira, Arlindo Oliveira and Mário Figueiredo
Abstract:	Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used in a variety of applications. The LZ encoder and decoder exhibit a high asymmetry regarding time and memory requirements, with the former being much more demanding. Several techniques have been used to speed up the encoding process; among them is the use of suffix trees. In this paper, we explore the use of a simple data structure, the suffix array, to hold the dictionary of the LZ encoder, and propose an algorithm to search the dictionary. A comparison with the suffix tree based LZ encoder is carried out, showing that the compression ratios are roughly the same. The amount of memory required by the suffix array is fixed and much lower than the variable memory requirements of the suffix tree encoder, which depend on the text to encode. We conclude that suffix arrays are a very interesting option regarding the tradeoff between time, memory, and compression ratio when compared with suffix trees, which makes them preferable in some compression scenarios.
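The idea of searching an LZ dictionary held in a suffix array can be illustrated with a minimal sketch. This is not the search algorithm proposed in the paper, which the abstract does not detail; it is a generic binary search over a naively built suffix array, and the names build_suffix_array, common_prefix_len and longest_match are hypothetical.

```python
def build_suffix_array(text: bytes) -> list:
    """Naive construction: sort suffix start positions by the suffix text.

    Fine for a small dictionary window; real encoders would use a faster builder.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def longest_match(dictionary: bytes, sa: list, lookahead: bytes) -> tuple:
    """Return (position, length) of the longest prefix of lookahead found in dictionary.

    Binary search finds where lookahead would be inserted among the sorted
    suffixes; the longest match is always one of the two neighbours there.
    """
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if dictionary[sa[mid]:] < lookahead:
            lo = mid + 1
        else:
            hi = mid
    best_pos, best_len = 0, 0
    for j in (lo - 1, lo):
        if 0 <= j < len(sa):
            length = common_prefix_len(dictionary[sa[j]:], lookahead)
            if length > best_len:
                best_pos, best_len = sa[j], length
    return best_pos, best_len


dictionary = b"abracadabra"
sa = build_suffix_array(dictionary)
print(longest_match(dictionary, sa, b"bract"))
```

The array stores one integer per dictionary position, which is why its memory footprint is fixed for a given window size regardless of the text.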

Short Papers

Paper Nr:	43
Title:	CONTEXT-AWARE HOARDING OF MULTIMEDIA CONTENT IN A LARGE-SCALE TOUR GUIDE SCENARIO - A Case Study on Scaling Issues of a Multimedia Tour Guide
Authors:	Julius Köpke, R. Tusch, Hermann Hellwagner and L. Böszörmenyi
Abstract:	This paper discusses scaling issues of a mobile multimedia tour guide. Making tourist information available in a substantially large geographical area (e.g. a federal state in Austria) raises new questions compared to providing similar information in a limited area (such as a museum). First, we have to assume a heterogeneous network infrastructure containing high and low bandwidth links and even total network loss. Video streaming is therefore not possible everywhere. Second, the total amount of data grows linearly with the number of Points of Interest (POIs) which are augmented by the tour guide. Therefore, preloading all data onto a device with limited storage is not possible. A possible solution to these problems is hoarding, i.e. preloading an "appropriate" subset of data. The crucial question is how to find the proper subset depending on the actual context. The paper discusses the questions of (1) what kind of context information should be considered and (2) what kind of usage patterns can be assumed. Based on these considerations, hoarding strategies are developed for the tour guide. The strategies are finally evaluated with real-world data from a federal-state-wide tourist-card system.

Paper Nr:	136
Title:	AN ADAPTIVE SPATIAL ERROR CONCEALMENT FOR H.264/AVC VIDEO STREAM
Authors:	Jun Wang, Lei Wang, Takeshi Ikenaga and Satoshi Goto
Abstract:	Transmission of compressed video over error-prone channels may result in packet losses or errors, which can significantly degrade image quality. Therefore, an error concealment scheme is applied at the video receiver side to mask the damaged video. Considering that there are three types of MBs (macroblocks) in a natural video frame, i.e., textural MBs, edged MBs, and smooth MBs, this paper proposes an adaptive spatial error concealment scheme that chooses among three different methods for these three MB types. Two factors are taken into consideration as criteria for choosing the appropriate method. First, the standard deviation of our proposed edge statistical model is exploited. Second, a new feature of the latest video compression standard H.264/AVC, namely the intra prediction mode, is also considered in formulating the criterion. Compared with previous works, which are based only on deterministic measurement, the proposed method achieves the best image recovery. Subjective and objective image quality evaluations in experiments confirm this.

Paper Nr:	17
Title:	QoS IMPROVEMENTS RESULT FROM TCP/RLC AND MAC IN A MOBILE CHANNEL
Authors:	Jahangir Dadkhah Chimeh, Mohammad Hakkak, Hamidreza Bakhshi and Paeiz Azmi
Abstract:	New mobile telecommunication services are based on data networks, especially the Internet. These services include HTTP, Telnet, FTP, Simple Mail Transfer Protocol (SMTP), etc. In addition, a mobile network is a multi-user network. Transmission Control Protocol/Internet Protocol (TCP/IP), which is sensitive to link congestion in wireline data links, is also used in wireless networks. In order to improve system performance, the TCP layer uses flow control and congestion control. Furthermore, Radio Link Control (RLC) has been introduced to compensate for the deficiencies of the TCP layer in wireless environments. MAC and RLC play important roles in improving the quality of service of UMTS. In this paper we verify TCP over an Automatic Repeat reQuest (ARQ) error control mechanism and, finally, the quality of service improvement that results from it in fading channels.

Area 2 - Multimedia Signal Processing

Full Papers

Paper Nr:	16
Title:	FAR-END CROSSTALK IN ITERATIVELY DETECTED MIMO-OFDM TWISTED PAIR TRANSMISSION SYSTEMS
Authors:	Andreas Ahrens and Christoph Lange
Abstract:	Crosstalk between neighbouring wire pairs in multi-pair copper cables is an important disturbance, which essentially limits the transmission quality and the throughput of such cables. For high-rate transmission, often the strong near-end crosstalk (NEXT) disturbance is avoided or suppressed and only the far-end crosstalk (FEXT) remains as crosstalk influence. In this contribution the effect of far-end crosstalk (FEXT) in iteratively detected MIMO-OFDM transmission schemes is studied. EXIT (extrinsic information transfer) charts are used for analyzing and optimizing the convergence behaviour of the iterative demapping and decoding.

Paper Nr:	69
Title:	NOVEL DIGITAL DIFFERENTIATOR AND CORRESPONDING FRACTIONAL ORDER DIFFERENTIATOR MODELS
Authors:	Maneesha Gupta, Pragya Varshney, Gangaikondan Visweswaran and Balbir Kumar
Abstract:	This paper proposes a novel first order digital differentiator. The differentiator is obtained by linear mixing of the Al-Alaoui operator (Al-Alaoui, 1993) and a wide band differentiator (Hsue, 2006). MATLAB simulation results of the proposed differentiator for various sampling frequencies are presented. The magnitude results are in close conformity with the theoretical results for approximately 78% of the full range. The phase of the new differentiator is almost linear, with a maximum phase error of 8.24°. We also propose new operator based fractional order differentiator models. These models are obtained by performing the Taylor series expansion and continued fraction expansion of the proposed operator. Comparisons of the suggested models with the existing models of half differentiators show perceptible improvement in performance of the fractional order circuit. MATLAB simulation results show that the magnitude response of the proposed half differentiator matches the theoretical results of the continuous-time domain half differentiator for almost the whole frequency range, and the phase approximates a constant group delay, which is desirable for many applications. The major purpose of this paper is to emphasize that fractional order control systems are better than conventional order systems, as the system control performance is enhanced.

Paper Nr:	73
Title:	ACCURACY ANALYSES OF PASSIVE TRACKING OF SEVERAL CLICKING SPERM WHALES - A Case of Complex Sources Binding
Authors:	Frédéric Caudal and Hervé Glotin
Abstract:	This paper provides a real-time passive underwater acoustic method to track multiple emitting whales using four or more omni-directional widely-spaced bottom-mounted hydrophones and to evaluate the performance of the system via the Cramér-Rao Lower Bound (CRLB) and Monte Carlo simulations. After a non-parametric Teager-Kaiser-Mallat signal filtering, rough Time Delays Of Arrival are calculated, selected and filtered, and used to estimate the positions of whales for a constant or linear sound speed profile. The complete algorithm is tested on real data from the NUWC and the AUTEC. The CRLB and Monte Carlo simulations are computed and compared with the tracking results. Our model is validated by similar results from the US Navy and University of Hawaii labs in the case of one whale, and by similar whale counts from the Columbia University ROSA lab in the case of multiple whales. At this time, our tracking method is the only one giving typical speed and depth estimations for multiple (5) emitting whales located at 1 to 5 km from the hydrophones.

Paper Nr:	85
Title:	STATIC FEATURES IN ISOLATED VOWEL RECOGNITION AT HIGH PITCH
Authors:	Anibal Ferreira
Abstract:	Vowel recognition is frequently based on Linear Prediction (LP) analysis and formant estimation techniques. However, the performance of these techniques decreases in the case of female or child speech because at high pitch frequencies (F0) the magnitude spectrum is sparsely sampled, making formant estimation unreliable. In this paper we describe the implementation of a perceptually motivated concept of vowel recognition that is based on Perceptual Spectral Clusters (PSC) of harmonic partials. PSC based features were evaluated in automatic recognition tests using the Mahalanobis distance and a database of five natural Portuguese vowel sounds uttered by 44 speakers, 27 of whom are child speakers. LP based features and Mel-Frequency Cepstral Coefficients (MFCC) were also included in the tests as a reference. Results show that while the recognition performance of PSC features falls between that of LP based features and that of MFCC coefficients, the normalization of PSC features by F0 increases the performance and approaches that of MFCC coefficients. PSC features are not only amenable to a psychophysical interpretation (as LP based features are) but also have the potential to compete with global shape features such as MFCCs.

Paper Nr:	114
Title:	CONFIGURABLE VLSI ARCHITECTURE OF A GENERAL PURPOSE LIFTING-BASED WAVELET PROCESSOR
Authors:	Andre Guntoro, Hans-Peter Keil and Manfred Glesner
Abstract:	The richness of wavelet transformation has been known in many fields. There exist different classes of wavelet filters that can be used depending on the application. In this paper, we propose a general purpose lifting-based wavelet processor that can perform various forward and inverse DWTs. Our architecture is based on NxM PEs which can perform either prediction or update on a continuous data stream in every clock cycle. We also consider the normalization step which takes place at the end of the forward DWT or at the beginning of the inverse DWT. To cope with different wavelet filters, we feature a multi-context configuration to select among various DWTs. For the 16-bit implementation, the estimated area of the proposed wavelet processor with 2x8 PEs configuration in a 0.18-µm technology is 1.8 mm square and the estimated frequency is 355 MHz.

Paper Nr:	120
Title:	FACE DETECTION USING DISCRETE GABOR JETS AND COLOR INFORMATION
Authors:	Ulrich Hoffmann, Jacek Naruniec, Ashkan Yazdani and Touradj Ebrahimi
Abstract:	Face detection makes it possible to recognize and detect human faces and provides information about their location in a given image. Many applications such as biometrics, face recognition, and video surveillance employ face detection as one of their main modules. Therefore, improvement in the performance of existing face detection systems and new achievements in this field of research are of significant importance. In this paper a hierarchical classification approach for face detection is presented. In the first step, discrete Gabor jets (DGJ) are used for extracting features related to the brightness information of images and a preliminary classification is made. Afterwards, a skin detection algorithm, based on modeling of colored image patches, is employed as a post-processing step for the results of DGJ-based classification. It is shown that the use of color efficiently reduces the number of false positives while maintaining a high true positive rate. Finally, a comparison is made with the OpenCV implementation of the Viola and Jones face detector and it is concluded that higher correct classification rates can be attained using the proposed face detector.

Paper Nr:	130
Title:	PARTIAL TRACKING IN SINUSOIDAL MODELING - An Adaptive Prediction-based RLS Lattice Solution
Authors:	Leonardo Nunes, Paulo Esquef, Luiz Biscainho and Ricardo Merched
Abstract:	Partial tracking plays an important role in sinusoidal modeling analysis, being the stage in which the model parameters are obtained. This is accomplished by coherently grouping the spectral peaks found in each frame into time-evolving tracks of varying frequency and amplitude. The main difficulties faced by partial tracking algorithms are the analysis of polyphonic signals and the pursuit of tracks exhibiting strong modulations in frequency and amplitude. In these circumstances, linear prediction over the trajectory of a given track has been shown to improve partial tracking performance. This paper proposes an adaptive RLS lattice filter for the purpose of prediction in partial tracking. A new heuristic which certifies the filter convergence is also presented. Computer simulation results are shown to compare the proposed implementation with that of other predictors. The performance of the proposed solution is similar to that of competing methods, albeit with reduced computational complexity as well as improved numerical stability.

Paper Nr:	156
Title:	A ROBUST SPEECH COMMAND RECOGNIZER FOR EMBEDDED APPLICATIONS
Authors:	Alexandre Maciel, Arlindo Veiga, Cláudio Neves, José David Águas Lopes, Carla Lopes, Fernando Perdigão and Luís Sá
Abstract:	This paper describes a command-based robust speech recognition system for the Portuguese language. Due to an efficient noise reduction algorithm the system can be operated in adverse noise environments such as in cars or factories. The recognizer was trained and tested with a speech database with 250 commands spoken by 345 speakers in clean and noisy conditions. The system incorporates a user friendly application programming interface and was optimized for embedded platforms with limited computational resources. Performance tests for the recognizer are presented.

Short Papers

Paper Nr:	25
Title:	MOTION CAPTURE FOR 3D DATABASES - Overview of Methods for Motion Capture in 3D Databases
Authors:	Dalibor Lupinek and Martin Drahansky
Abstract:	Motion capture is a modern method which is commonly used in animation and augmented reality. There exists a large variety of functional systems that are based on different principles. The main aim of this paper is to provide a basic description of motion capture systems that are widely used or promising for the future. In addition, this paper presents an overview of a new system, which is now in development.

Paper Nr:	100
Title:	NOISE REDUCTION BASED ON CROSS TF ε-FILTER
Authors:	Tomomi Matsumoto, Mitsuharu Matsumoto and Shuji Hashimoto
Abstract:	A time-frequency ε-filter (TF ε-filter) is an advanced ε-filter applied to complex spectra along the time axis. It can reduce most kinds of noise while preserving a signal that varies frequently, such as a speech signal. The filter design is simple and it can effectively reduce noise. It is applicable not only to small amplitude stationary noise but also to large amplitude nonstationary noise. However, for noise that varies very frequently along the time axis, the TF ε-filter cannot reduce the noise without distorting the signal. For noise in which neighboring frequency bins have similar powers, such as impulse noise, the noise can be reduced by applying an ε-filter to the complex spectra along the frequency axis rather than the time axis. This paper introduces an advanced noise reduction method, labeled the cross TF ε-filter, that applies an ε-filter to complex spectra not only along the time axis but also along the frequency axis. We conducted experiments using sounds with stationary, nonstationary and natural noise.

Paper Nr:	107
Title:	CLASSIFICATION OF MOTOR IMAGINARY TASKS USING ADAPTIVE RECURSIVE BANDPASS FILTER - Effective Classification for Motor Imaginary BCI
Authors:	Vickneswaran Jeyabalan, Andrews Samraj and Loo Chu Kiong
Abstract:	The noteworthy point in the advancement of Brain Computer Interface (BCI) research is not only to develop new technology but also to adopt the easiest procedures, since the expected beneficiaries are disabled. Locked-in patients possess strong mental abilities in thinking and understanding but are extremely limited in their ability to express their views. Imagination is possible for almost all locked-in patients; hence a BCI which does not rely on finger movements or other muscle activity is definitely an advantage in this arena. The objective of this paper is to identify and classify motor imaginary signals extracted from the left and right cortex of the human brain. This is realised by implementing an adaptive bandpass filter in combination with frequency shifting and segmentation techniques. The signals are captured using electroencephalography (EEG) from the C3, C4, and Cz channels of the scalp electrodes and are pre-processed to expose the motor imaginary signals. The classification results obtained with a simple threshold demonstrate the effectiveness of our proposed technique. The best results were found in the latency range of 3 to 9 seconds of the imagination, which is consistent with existing neuroscience knowledge.

Paper Nr:	119
Title:	TIME DOMAIN ATTACK AND RELEASE MODELING - Applied to Spectral Domain Sound Synthesis
Authors:	Cornelia Kreutzer, Jacqueline Walker and Michael O'Neill
Abstract:	We introduce a time-domain model for the synthesis of attack and release parts of musical sounds. This approach is an extension of a spectral synthesis model we developed: the Reduced Parameter Synthesis Model (RPSM). The attack and release model is independent from a preceding spectral analysis as it is based on the time domain sustain part of the sound. The model has been tested with linear and polynomial shaping functions and produces good results for three different instruments. The time-domain approach overcomes the problem of synthesis artifacts that often occur when using spectral analysis/synthesis methods for sounds with transient events. Moreover, the model can be combined with any synthesis model of the sustain part and offers the possibility to determine the duration of the attack and release parts of the sound.

Paper Nr:	134
Title:	SIGNAL-DEPENDENT ANALYSIS OF SIGNALS SAMPLED BY SEND ON DELTA SAMPLING SCHEME
Authors:	Modris Greitans and Rolands Shavelis
Abstract:	Interest in the application of signal driven sampling schemes is increasing as they offer various advantages over traditional sampling. The paper describes the principles and discusses the properties of sampling, which is based on the send-on-delta concept. In such a way, it is possible to decrease the sampling density, and since the samples are placed non-equidistantly it is possible to suppress the distortion due to frequency aliasing. The non-uniform location of samples requires an advanced processing method. The paper discusses the spectral estimation, which is based on the use of a bank of minimum variance filters. To improve the resolution and accuracy, iterative updating of autocorrelation matrix is used. The results of simulations are presented. The use of an iterative algorithm allows correcting spectral estimation even if the mean sampling density is several times less than the Nyquist criterion. The proposed approach can be of interest for distributed wireless data acquisition in remote sensing systems, because it allows the amount of transmitted data to be decreased considerably.
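The send-on-delta principle described above can be illustrated with a minimal sketch. It is a discrete-time approximation applied to an already sampled sequence, not the authors' implementation, and the function name send_on_delta and the example values are assumptions.

```python
def send_on_delta(times, values, delta):
    """Keep a sample only when it differs from the last kept sample by at least delta.

    Returns a list of (time, value) pairs; the first sample is always kept.
    The kept samples are in general non-uniformly spaced in time.
    """
    kept = [(times[0], values[0])]
    for t, v in zip(times[1:], values[1:]):
        if abs(v - kept[-1][1]) >= delta:
            kept.append((t, v))
    return kept


# Hypothetical example: 7 uniformly spaced input samples, threshold delta = 3.
times = [0, 1, 2, 3, 4, 5, 6]
values = [0, 1, 3, 3, 9, 10, 4]
print(send_on_delta(times, values, 3))
```

Slowly varying stretches of the signal produce few samples, which is the source of the reduced mean sampling density mentioned in the abstract.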

Paper Nr:	137
Title:	CONSTANT BITRATE CONTROL FOR A DISTRIBUTED VIDEO CODING SYSTEM
Authors:	Mariusz Jakubowski, João Ascenso and Grzegorz Pastuszak
Abstract:	In some distributed video coding (DVC) systems, the total bitrate depends mainly on the key frames (Intra coded) quality and on the side information accuracy. In this paper, a rate control (RC) mechanism is proposed to achieve and maintain a certain target bitrate for the overall Intra and WZ bitstream, mainly by adjusting online the Intra frames quality through the quantization parameter (QP). In order to obtain a similar decoded quality of Intra and WZ frames, the relevant parameters: QP for the key frames and the quantization index (QIndex) for WZ frames are controlled jointly. The major novelty of this work is a statistical model that expresses the relationship between QIndex and WZ frames bitrate. The proposed rate control solution is integrated into the VISNET2 WZ codec and the experimental results demonstrate the efficiency of the proposed algorithm to reach and maintain the target bitrate.

Paper Nr:	138
Title:	BIRTH-DEATH FREQUENCIES VARIANCE OF SINUSOIDAL MODEL - A New Feature for Audio Classification
Authors:	Shahrokh Ghaemmaghami and Jalil Shirazi
Abstract:	In this paper, a new feature set for audio classification is presented and evaluated based on sinusoidal modeling of audio signals. Variance of the birth-death frequencies in sinusoidal model of signal, as a measure of harmony, is used and compared to typical features as the input into an audio classifier. The performance of this sinusoidal model feature is evaluated through classification of audio to speech and music using both the GMM and the SVM classifiers. Classification results show that the proposed feature is quite successful in speech/music classification. Experimental comparisons with popular features for audio classification, such as HZCRR and LSTER, are presented and discussed. By using a set of three features, we achieved 96.83% accuracy, in one-sec segment based audio classification.

Paper Nr:	159
Title:	AN IMPROVED STEGANOGRAPHIC METHOD
Authors:	Hyoung-Joong Kim and Md. Amiruzzaman
Abstract:	An improved steganographic method is proposed in this paper. Two distinct methods are combined in an optimized way to achieve a possibly high data hiding capacity. The proposed method shifts the last nonzero AC coefficients in each JPEG block and changes the magnitude of the first nonzero AC coefficients.

Paper Nr:	162
Title:	BIMODAL QUANTIZATION OF WIDEBAND SPEECH SPECTRAL INFORMATION
Authors:	Driss Guerchi
Abstract:	In this work we introduce an efficient method to reduce the coding rate of the spectral information in an algebraic code-excited linear prediction (ACELP) wideband codec. The Bimodal Vector Quantization (BMVQ) exploits the interframe correlation in spectral information to reduce the coding rate while maintaining high coded speech quality. In the BMVQ training phase, two codebooks are separately designed for voiced and unvoiced speech. For each speech frame, the optimal codebook for the search procedure is selected according to the interframe correlation of the spectral information. The BMVQ was successfully implemented in an ACELP wideband coder. The objective and subjective performance were found to be comparable to that of the combination of the split vector quantization and multistage vector quantization at 2.3 kbit/s.

Paper Nr:	178
Title:	IMPROVEMENT OF THE SIMPLIFIED FTF-TYPE ALGORITHM
Authors:	Madjid Arezki, Ahmed Benallal, Abderezak Guessoum and Daoued Berkani
Abstract:	In this paper, we propose a new algorithm M-SMFTF which reduces the complexity of the simplified FTF-type (SMFTF) algorithm by using a new recursive method to compute the likelihood variable. The computational complexity was reduced from 7L to 6L, where L is the finite impulse response filter length. Furthermore, this computational complexity can be significantly reduced to (2L+4P) when used with a reduced P-size forward predictor. Finally, some simulation results are presented and our algorithm shows an improvement in convergence over the normalized least mean square (NLMS).

Paper Nr:	192
Title:	HMM INVERSION WITH FULL AND DIAGONAL COVARIANCE MATRICES FOR AUDIO-TO-VISUAL CONVERSION
Authors:	Lucas Daniel Terissi and Juan Carlos Gómez
Abstract:	A speech driven MPEG-4 compliant facial animation system is proposed in this paper. The main feature of the system is the audio-to-visual conversion based on the inversion of an Audio-Visual Hidden Markov Model. The Hidden Markov Model Inversion algorithm is derived for the general case of considering full covariance matrices for the audio-visual observations. A performance comparison with the more common case of considering diagonal covariance matrices is carried out. Experimental results show that the use of full covariance matrices is preferable since it leads to an accurate estimation of the visual parameters, yielding the same performance as in the case of using diagonal covariance matrices, but with a less complex model.

Paper Nr:	23
Title:	LANGUAGE MODEL BASED ON POS TAGGER
Authors:	Bartosz Ziolko, Suresh Manandhar, Richard C. Wilson and Mariusz Ziolko
Abstract:	Language models are necessary for any large vocabulary speech recogniser. There are two main types of information which can be used to support modelling a language: syntactic and semantic. One of the ways to apply syntactic modelling is to use POS taggers. Morphological information can be statistically analysed to provide probability of a sequence of words using their POS tags. The results for Polish language modelling are presented.

Paper Nr:	26
Title:	APPROXIMATION OF 5-LIMIT JUST INTONATION - Computer MIDI Modeling in Negative Systems of Equal Divisions of the Octave
Authors:	Mykhaylo Khramov
Abstract:	This article concerns music processing via the MIDI protocol in computer modeling of fixed scales with non-traditional equal temperaments. It addresses negative temperaments, which are based on closed series of fifths that are compressed relative to conventional tuning. Such systems can approximate just intonation more closely. They produce a sensation of being out of tune when listening to music performed from scores with incorrectly used accidentals, a distinction that is inaccessible in a conventional temperament system. An example subroutine is given for automatic Pitch Bend changes in the MIDI protocol for modeling a negative system of equal divisions of the octave.
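How a Pitch Bend value can retune a MIDI note to a step of an equal division of the octave is sketched below. This is not the subprogram from the article; it assumes a pitch bend range of ±200 cents (a common default that a given synthesizer may not use), and the names edo_step_to_midi and pitch_bend_message are hypothetical. The example uses 19 equal divisions, whose fifth (11 steps, about 694.7 cents) is narrower than the 700-cent fifth of 12 equal divisions.

```python
def edo_step_to_midi(step, divisions, base_note=60, bend_range_cents=200.0):
    """Map a scale step of an equal division of the octave to (MIDI note, 14-bit bend).

    bend_range_cents is an assumed synthesizer setting: the detuning in cents
    reached at full pitch bend deflection. 8192 is the centre (no bend).
    """
    cents = 1200.0 * step / divisions          # pitch above the base note
    semitones = round(cents / 100.0)           # nearest 12-tone note
    deviation = cents - 100.0 * semitones      # residual detuning in cents
    bend = 8192 + round(deviation / bend_range_cents * 8192)
    bend = max(0, min(16383, bend))            # clamp to the 14-bit range
    return base_note + semitones, bend


def pitch_bend_message(channel, bend):
    """Build the 3-byte MIDI Pitch Bend message: status, LSB (7 bits), MSB (7 bits)."""
    return bytes([0xE0 | (channel & 0x0F), bend & 0x7F, (bend >> 7) & 0x7F])


note, bend = edo_step_to_midi(11, 19)          # the fifth of 19 equal divisions
print(note, bend, pitch_bend_message(0, bend).hex())
```

Because Pitch Bend applies to a whole channel, a polyphonic rendering would need one channel per simultaneously sounding retuned note, or a per-note retuning mechanism, which this sketch does not cover.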

Paper Nr:	40
Title:	EMOTION ASSESSMENT TOOL FOR HUMAN-MACHINE INTERFACES - Using EEG Data and Multimedia Stimuli Towards Emotion Classification
Authors:	Jorge Teixeira, Vasco Vinhas, Luís Paulo Reis and Eugénio Oliveira
Abstract:	The identification and assessment of human emotional states is one of the primary objectives of scientific research in disparate areas such as artificial intelligence, medicine or psychology. The main objective of this project is the automatic assessment of a subject’s basic emotional states by using electroencephalography as a source for biometric data acquisition. This evaluation is based on predefined mechanisms of emotional induction, as well as specific methods and tools capable of data analysis and processing. From the experimental results attained in several experimental sessions and through the support tools developed, the most pertinent conclusion extracted from this work refers to the capability of effectively performing automatic classification of the subject’s predominant emotional state. The emotional conditions were induced through the presentation of specific visual multimedia contents. The success rate of this tool, compared against the self-assessment interviews carried out immediately after the experimental session, was approximately 75%. It was also experimentally concluded that female subjects are emotionally more demonstrative than male subjects.

Paper Nr:	71
Title:	NEW TIME-FREQUENCY VOWEL QUANTIZATION ENHANCED BY SUBBAND HIERARCHY
Authors:	Fraihat Salam and Hervé Glotin
Abstract:	Speech dynamics may not be well addressed by conventional speech processing. We analyse here a new quantization paradigm for vowel coding. It is based on simple Allen temporal interval algebra applied to subband voicing levels, yielding a compressed speech representation of only 21 integers for a speech window up to 32 ms long. Experiments show that we take advantage of the ranking of the average values of the voicing interval across the various subbands. These new features are evaluated for vowel recognition (1 hour, 6 vowels) on a reference multispeaker radio broadcast news corpus used during the ESTER evaluation campaign. We work on the subset of the most frequent French vowels. We get a 62% class error rate when adding the ranking information to the Allen relations, instead of 70% using Allen relations alone, and 57% using the set of the raw 48 floats. We then discuss the advantage of using more subbands, and we finally propose a strategy to tackle the combinatorial complexity of Allen relations.

Paper Nr:	77
Title:	DECORRELATION TECHNIQUES IN IMAGE RESTORATION
Authors:	Catalina Cocianu, Luminita State, Panayiotis Vlamos and Doru Constantin
Abstract:	Restoration can be viewed as a process that attempts to reconstruct or recover a degraded image by using some a priori knowledge about the degradation phenomenon. The multiresolution support provides a suitable framework for noise filtering and image restoration by noise suppression. We present the algorithms GMNR, a generalization of the MNR algorithm based on the multiresolution support set for noise removal in case of arbitrary mean, and NFPCA. A comparative analysis of the performance of the algorithms GMNR and NFPCA is experimentally performed against the standard AMVR and MMSE.

Paper Nr:	106
Title:	PREDICTING BLOCKING EFFECTS IN THE SPATIAL DOMAIN USING A LEARNING APPROACH
Authors:	Aladine Chetouani, Ghiles Mostafaoui and Azeddine Beghdadi
Abstract:	A new method for predicting blocking effect in the spatial domain is proposed. This method aims at estimating the appearance of blocking artefacts in the original image prior to compression for a given bit rate and a given compression technique. The basic idea is to use a training process in order to compute a visibility measure. A weighting function of the blocking effects is then derived from this training process performed on a database. The proposed method is objectively and subjectively evaluated on various actual images. The obtained results confirm the efficiency of the proposed method in predicting blocking effect.

Paper Nr:	125
Title:	GLOTTAL SOURCE ESTIMATION ROBUSTNESS - A Comparison of Sensitivity of Voice Source Estimation Techniques
Authors:	Thomas Drugman, Thomas Dubuisson, Alexis Moinet, Nicolas D’Alessandro and Thierry Dutoit
Abstract:	This paper addresses the problem of estimating the voice source directly from speech waveforms. A novel principle based on Anticausality Dominated Regions (ACDR) is used to estimate the glottal open phase. This technique is compared to two other well-known state-of-the-art methods, namely the Zeros of the Z-Transform (ZZT) and the Iterative Adaptive Inverse Filtering (IAIF) algorithms. Decomposition quality is assessed on synthetic signals through two objective measures: the spectral distortion and a glottal formant determination rate. Technique robustness is tested by analyzing the influence of noise and Glottal Closure Instant (GCI) location errors. In addition, the impacts of the fundamental frequency and the first formant on the performance are evaluated. Our proposed approach shows significant improvement in robustness, which could be of great interest when decomposing real speech.
Download

Paper Nr:	144
Title:	MODELING OF REAL TIME VIDEO COMPRESSION SYSTEM - Three-dimensional Discrete Cosine Transform
Authors:	Tomas Fryza
Abstract:	One of the methods used for the compression of video signals is the Three-Dimensional Discrete Cosine Transform (3-D DCT). The aim of this block-based method is to combine intraframe and interframe coding into a single transform coding, so that no motion compensation or motion prediction has to be implemented. The paper deals with practical ways of computing the 3-D DCT. It is shown that the transform coding can be used for encoding video sequences in real time.
Download

Paper Nr:	146
Title:	H.264/SVC ROI ENCODING WITH SPATIAL SCALABILITY
Authors:	Lino Ferreira, Luís Cruz and Pedro Assunção
Abstract:	This paper proposes two H.264/AVC compliant methods for encoding Regions-of-Interest (ROI) with spatial scalability and evaluates their respective rate-distortion-complexity performance. The base layer is kept unchanged and provides lower resolution images with roughly constant quality, without identification of the ROI. In the proposed methods there is no need to encode contour information because the ROI is implicitly defined in the upper layer of the spatial resolution in a transparent way by using different encoding parameters for the ROI and its complementary region. It is shown that spatial scalability in ROI can be efficiently used to enhance specific regions of an image sequence in both spatial resolution and quality with low coding complexity. The proposed encoding scheme is suitable for remote surveillance, medical applications and entertainment, where a higher resolution and higher quality ROI is a useful functionality for object/face recognition, selective encryption, detail analysis, etc.
Download

Paper Nr:	177
Title:	BIOMETRIC ACREDITATION ENTITIES - An Approach for Web Acreditation Services
Authors:	Belen Ruiz-Mezcua, Luis Puente, Diego Carrero and María Jesús Poza
Abstract:	Identity verification is nowadays a crucial task for security applications. In the near future, organizations dedicated to storing individual biometric information will emerge in order to determine individual identity. Biometric authentication is currently information intensive. The volume and diversity of new data sources challenge current database technologies. Biometric identity heterogeneity arises when different data sources interoperate. New promising application fields such as the Semantic Web and Semantic Web Services can leverage the potential of biometric identity, even though heterogeneity continues to rise. Semantic Web Services provide a platform to integrate the lattice of biometric identity data widely distributed both across the Internet and within individual organizations. In this paper, we present a framework for solving biometric identity heterogeneity based on Semantic Web Services. We use a multimodal fusion recognition scenario as a test-bed for evaluation.
Download

Area 3 - Multimedia Systems and Applications

Full Papers

Paper Nr:	19
Title:	ROTATION INVARIANT FEATURE EXTRACTION FOR WATERMARKING
Authors:	M. Scagliola and Pietro Guccione
Abstract:	Many watermarks for still images are robust against common signal processing techniques, mainly JPEG compression, noise addition and low-pass filtering, while they are sensitive to geometrical manipulations that yield desynchronization errors. In this paper, robustness against some geometric transformations is achieved using a feature extraction method based on the Radon transform, whose aim is to identify a unique (and robust) feature from the image spectrum. The embedding, which exploits the extracted feature, is based on a multiplicative rule technique and is applied to a suitable subset of the image Fourier transform. The properties of the extracted feature allow the detector and the embedded watermark to be resynchronized even if the image undergoes geometric manipulations (in particular rotations) as well as other processing, so that correct watermark retrieval is guaranteed. Experimental results, obtained on many standard images, confirm the effectiveness of the feature extraction method and the robustness of the watermark against both processing and geometric transformations.
Download

Paper Nr:	34
Title:	A PROTOTYPE FOR PRACTICAL EYE-GAZE CORRECTED VIDEO CHAT ON GRAPHICS HARDWARE
Authors:	Maarten Dumont, Steven Maesen, Sammy Rogmans and Philippe Bekaert
Abstract:	We present a fully functional prototype to convincingly restore eye contact between two video chat participants, with a minimal amount of constraints. The proposed six-fold camera setup is easily integrated into the monitor frame, and is used to interpolate an image as if its virtual camera captured the image through a transparent screen. The peer user has a large freedom of movement, resulting in system specifications that enable genuine practical usage. Our software framework harnesses the powerful computational resources inside graphics hardware to achieve real-time performance of up to 30 frames per second for 800 × 600 resolution images. Furthermore, an optimal set of fine-tuned parameters is presented that optimizes the end-to-end performance of the application while still achieving high subjective visual quality.
Download

Paper Nr:	35
Title:	ESTIMATING H.264/AVC VIDEO PSNR WITHOUT REFERENCE - Using the Artificial Neural Network Approach
Authors:	Martin Slanina and Václav Říčný
Abstract:	This paper presents a method capable of estimating peak signal-to-noise ratios (PSNR) of digital video sequences compressed using the H.264/AVC algorithm. The idea is to replace a full reference metric, the PSNR (whose evaluation requires the original as well as the processed video data), with a no reference metric operating on the encoded bit stream only. As we are working just with the encoded bit stream, we can save a significant amount of the computation needed to decode the video pixel values. In this paper, we describe the network inputs and network configurations suitable for estimating PSNR in intra and inter predicted pictures. Finally, we make a simple evaluation of the proposed algorithm, using the correlation coefficient between the real and estimated PSNRs as the measure of optimality.
Download
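
For orientation, the full reference metric that the abstract above sets out to estimate can be written in a few lines. This is a minimal sketch of the standard PSNR definition only, not the authors' neural-network estimator; the function name `psnr` and the default peak value of 255 (8-bit samples) are assumptions made for illustration.

```python
import math


def psnr(reference, processed, peak=255.0):
    """Full reference PSNR in dB between two equally sized sample sequences.

    `reference` and `processed` are flat sequences of pixel values.
    `peak` is the maximum possible sample value (255 assumed for 8-bit video).
    """
    if len(reference) != len(processed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between original and processed samples.
    mse = sum((r - p) ** 2 for r, p in zip(reference, processed)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)
```

Note that both the original and the decoded samples are needed here, which is exactly the dependency a no reference estimator avoids.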

Paper Nr:	37
Title:	METHOD OF INTER-WORKING BETWEEN IMS AND NON-IMS (GOOGLE TALK) NETWORKS FOR MULTIMEDIA SERVICES
Authors:	Zhongwen Zhu and Richard Brunner
Abstract:	With the evolution of third-generation networks, more and more multimedia services are developed and deployed. Any new service to be deployed in an IMS network is required to inter-work with existing Internet communities or legacy terminal users in order to appeal to the end users, who are the main drivers for the service to succeed. The challenge for inter-working between IMS (IP Multimedia Subsystem) and non-IMS networks is “how to handle the recipient’s address”. This is because each network has its own routable address schema. For instance, the address for a Google Talk user is xmpp:xyz@google.com, which is un-routable in an IMS network. A new Inter-working (IW) solution between IMS and non-IMS networks is proposed for multimedia services that include Instant Messaging, Chat, File transfer, etc. It is an end-to-end solution built on IMS infrastructure. The Public Service Identity (PSI) defined in the 3GPP (3rd Generation Partnership Project) standard is used to allow terminal clients to allocate this IW service. When sending the SIP (Session Initiation Protocol) request out for multimedia services, the terminal includes the recipient’s address in the payload instead of the “Request-URI” header. In the network, the proposed solution provides the mapping rules between different networks in MM-IW. The detailed technical description and the corresponding use cases are presented, a comparison with other alternatives is made, and the benefits of the proposed solution are highlighted.
Download

Paper Nr:	124
Title:	REVERBERATION ASSESSMENT IN AUDIOBAND SPEECH SIGNALS FOR TELEPRESENCE SYSTEMS
Authors:	Amaro Lima, Fabio Freeland, Paulo Esquef, Luiz Biscainho, Bruno Bispo, Rafael de Jesus, Sergio Netto, Ron Schafer, Amir Said, Bowon Lee and Ton Kalker
Abstract:	Modern telepresence systems constitute a new challenge for quality assessment of multimedia signals. This paper focuses on the evaluation of the reverberation impairment for audioband speech signals. A review of the reverberation effect is presented, with emphasis on the mathematical modeling of its components, including early reflections and late reverberation. A subjective test for evaluating the human perception of the reverberation phenomenon is completely described, from its conception to the final results. Analyses are provided comparing the average subjective grades to current quality-evaluation standards for speech and audio signals. It is observed that the PESQ and PEAQ objective algorithms constitute interesting starting points for developing an objective method for measuring the reverberation effect on speech signals.
Download

Paper Nr:	141
Title:	STAFF LINE DETECTION AND REMOVAL WITH STABLE PATHS
Authors:	Artur Capela, Ana Rebelo, Jaime S. Cardoso and Carlos Guedes
Abstract:	Many music works produced in the past are currently available only as original manuscripts or as photocopies. Preserving them entails their digitalization and consequent accessibility in a machine-readable format, which encourages browsing, retrieval, search and analysis while providing generalized access to the digital material. Carrying out this task manually is very time consuming and error prone. While optical music recognition (OMR) systems usually perform well on printed scores, the processing of handwritten music by computers remains below expectations. One of the fundamental stages in carrying out this task is the detection and subsequent removal of staff lines. In this paper we integrate a general-purpose, knowledge-free method for the automatic detection of staff lines based on stable paths into a recently developed staff line removal toolkit. Lines affected by curvature, discontinuities, and inclination are robustly detected. We have also developed a staff removal algorithm adapting an existing line removal approach to use the stable path algorithm at the detection stage. Experimental results show that the proposed technique outperforms well-established algorithms. The developed algorithm will now be integrated in a web based system providing seamless access to browsing, retrieval, search and analysis of submitted scores.
Download

Paper Nr:	154
Title:	A RANDOM CONSTRAINED MOVIE VERSUS A RANDOM UNCONSTRAINED MOVIE APPLIED TO THE FUNCTIONAL VERIFICATION OF AN MPEG-4 DECODER DESIGN
Authors:	George S. Silveira, Karina R. G. da Silva and Elmar U. K. Melcher
Abstract:	The advent of new VLSI technology and SoC design methodologies has brought about an explosive growth in the complexity of modern electronic circuits. One big problem in hardware design verification is finding good stimuli for functional verification. An MPEG-4 decoder design requires movies for its functional verification. A real movie applied alone is not enough to test all functionalities, so a random movie is used as a stimulus to implement functional verification and reach coverage. This paper presents a comparison between a random constrained movie generator called RandMovie and the use of a Random Unconstrained Movie (RUM). It shows the benefits of using a random constrained movie in order to reach the specified functional coverage. With such a movie generator one is capable of generating good random constrained movies, increasing coverage and simulating all specified functionalities. A case study for an MPEG-4 decoder design has been used to demonstrate the effectiveness of this approach.
Download

Short Papers

Paper Nr:	31
Title:	DEVELOPMENT OF AN EYE GAZE INTERFACE SYSTEM AND IMPROVEMENT OF CURSOR CONTROL FUNCTION
Authors:	Tetsuya Yonezawa, Kohichi Ogata, Masashi Nishimura and Kohei Matsumoto
Abstract:	This paper introduces an eye gaze interface system for controlling a mouse cursor on the computer display. The system consists of a small video camera to capture an eye image and a computer to detect the eye gaze from the image and to calculate the position of the cursor to be displayed depending on the detected eye gaze. In order to develop an easy-to-use system, consideration of involuntary and voluntary eye blink is necessary for practical use. Improvement of the stability of eye gaze-controlled cursor movement is also important. In this paper, smooth cursor control using a moving average filter and detection of involuntary and voluntary eye blink are described. The experiments show the usefulness of the proposed methods for quick and stable mouse cursor control. In the experiment of cursor pointing accuracy, distances between the target and the cursor point are about 30 pixels in horizontal direction and 20 pixels in vertical direction.
Download
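
The moving average filter mentioned in the abstract above can be illustrated with a short sketch. The class name `CursorSmoother` and the window length are hypothetical choices for illustration; the abstract does not state the window size or any other detail of the authors' filter.

```python
from collections import deque


class CursorSmoother:
    """Smooths raw gaze-derived cursor positions with a moving average.

    `window` (an assumed parameter) is the number of most recent samples
    that are averaged; older samples are discarded automatically.
    """

    def __init__(self, window=5):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._points = deque(maxlen=window)

    def update(self, x, y):
        """Add a raw (x, y) sample and return the smoothed position."""
        self._points.append((x, y))
        n = len(self._points)
        mean_x = sum(p[0] for p in self._points) / n
        mean_y = sum(p[1] for p in self._points) / n
        return (mean_x, mean_y)
```

A longer window gives a steadier cursor at the cost of more lag behind the actual gaze point, which is the usual trade-off with this kind of filter.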

Paper Nr:	45
Title:	A NEW VIDEO QUALITY PREDICTOR BASED ON DECODER PARAMETER EXTRACTION
Authors:	Andreas Rossholm and Benny Lövström
Abstract:	In the mobile communication area there is a demand for reference free perceptual quality measurements in video applications. In addition low complexity measurements are required. This paper proposes a method for prediction of a number of well known quality metrics, where the inputs to the predictors are readily available parameters at the decoder side of the communication channel. After an investigation of the dependencies between these parameters and between each parameter and the quality metrics, a set of parameters is chosen for the predictor. This predictor shows good results, especially for the PSNR and the PEVQ metrics.
Download

Paper Nr:	68
Title:	A SMART SURVEILLANCE SYSTEM FOR HUMAN FALL-DOWN DETECTION USING DUAL HETEROGENEOUS CAMERAS
Authors:	Shaou-Gang Miaou, Cheng-Yu Chien, Fu-Chiau Shih and Chia-Yuan Huang
Abstract:	We propose a new surveillance system that uses both omni-directional (OD) and Pan/Tilt/Zoom (PTZ) cameras with heterogeneous characteristics and a relatively simple image processing algorithm to achieve the goal of real time surveillance. The system is demonstrated for detecting human fall-down events. An OD camera has a 360° viewing angle. It is used here to replace multiple traditional cameras with limited viewing angles in order to reduce the system cost. A PTZ camera is also used in the system to track the target of interest and verify the occurrence of the event. Various unique features obtained from OD images are used for fall-down detection and a multi-classifier approach is used for better recognition performance. Experimental results show that the system is quite robust to sudden changes of walking paths and different directions of falling. During the tracking process, a moving target is captured and its representative coordinates are obtained based on the processing of continuous OD images. The coordinates of the target in the OD camera space are converted to the corresponding three dimensional (3D) coordinates in real-world space. This derived information serves as guidance for the automatic control of the PTZ camera to track the moving target as closely as it can. By combining the advantages of two heterogeneous types of cameras, our experimental results show that the proposed system can track the moving target well without the need for a complicated method, showing the feasibility and potential of the system.
Download

Paper Nr:	82
Title:	ANONYMOUS BUYER-SELLER WATERMARKING PROTOCOL WITH ADDITIVE HOMOMORPHISM
Authors:	Mina Deng, Li Weng and Bart Preneel
Abstract:	Buyer-seller watermarking protocols integrate multimedia watermarking and fingerprinting with cryptography, for copyright protection, piracy tracing, and privacy protection. We propose an efficient buyer-seller watermarking protocol based on dynamic group signatures and additive homomorphism, to provide all the required security properties, namely traceability, anonymity, unlinkability, dispute resolution, non-framing, and non-repudiation. Another distinct feature is the improvement of the protocol’s utility: the double watermark insertion mechanism is avoided, the final quality of the distributed content is improved, and the communication expansion ratio and computation complexity are reduced compared with conventional schemes.
Download

Paper Nr:	110
Title:	ADAPTIVE REAL-TIME WATERMARKING USING BLOCK CLASSIFICATION FOR H.264 COMPRESSED DOMAIN
Authors:	Yin Zhang, Zengxiang Lu and Haiming Lu
Abstract:	Focusing on the problem that a watermark can cause visible image distortions in some plain areas, an adaptive watermarking algorithm for H.264 is proposed. To embed the watermark quickly, we operate directly in the DCT domain. For high imperceptibility, we classify the blocks based on the Human Visual System (HVS). However, most current classification methods are not suitable for the H.264 compressed domain because the residual DCT coefficients cannot reflect the texture activity accurately due to the use of intra-prediction. A new block classification method is applied, in which we impose restrictions during the encoding process so that the 4×4 blocks can be classified into plain, edge and texture blocks according to the intra-prediction mode and the quantized integer DCT coefficients in the compressed domain. It is effective in block classification and achieves good adaptive performance for watermarking. Drift compensation is also accomplished in our watermarking algorithm. The experimental results demonstrate that our watermarking method can achieve both large capacity and good image imperceptibility. Additionally, the method is simple and appropriate for real-time applications.
Download

Paper Nr:	126
Title:	BUILDING MODULAR SURVEILLANCE SYSTEMS BASED ON MULTIPLE SOURCES OF INFORMATION - Architecture and Requirements
Authors:	Daniel Durães, Luis F. Teixeira and Luis Corte-Real
Abstract:	Intelligent surveillance is becoming increasingly important for the enhanced protection of facilities such as airports and power stations from various types of threats. We propose a surveillance system architecture based on multiple sources of information to apply on large scale surveillance networks. The main contribution of this paper is the definition of the requirements for a flexible and scalable architecture that supports intelligent surveillance using, alongside video, different sources of information, such as audio or other sensors.
Download

Paper Nr:	135
Title:	AUTOMATIC SYSTEM FOR THE RECOGNITION OF AMOUNTS IN HANDWRITTEN CHEQUES
Authors:	Filipe Coelho, Luis F. Teixeira and Jaime S. Cardoso
Abstract:	Until the rise of electronic means for direct debit, bank cheques were used as the best form of payment, balancing security and ease of use. Their acceptance and generalized use are the result of international agreements that define rules for filling in and using them. The fast processing of payments and transactions through safer electronic methods has reduced their usage over the last years. But despite this progressive reduction, bank cheques still are and will continue to be used; therefore, there is a need to optimize processing mechanisms. The existing automatic cheque processing systems are proprietary and not adapted to the Portuguese language, which is crucial for cheque analysis and recognition. A prototype of an automatic system for the recognition of the amount in Portuguese bank cheques has been implemented and is being used as a test platform for improved intelligent character recognition algorithms.
Download

Paper Nr:	161
Title:	2D HAND GESTURE RECOGNITION METHODS FOR INTERACTIVE BOARD GAME APPLICATIONS
Authors:	Athanasios Kalpakas, Konstantinos Stampoulis, Nikolaos Zikos and Stefanos Zaharos
Abstract:	The purpose of the current project is to demonstrate a complete interactive application capable of recognizing 2D hand gestures in order to interact with computer-based board games without the use of special input devices, such as a pointer, mouse or keyboard. A web camera is placed at the top of the platform; it captures the player’s hand gestures in real time and then recognizes the position of his/her fingertip on the board. The user is able to choose a piece, select a destination spot and move a piece simply by placing and moving his/her index finger on the board. An interactive, compact platform was therefore developed, comprising a light-wood construction, a printed chess board and a conventional webcam, in order to test the effectiveness of the system. The suggested interactive system is fully compatible with the latest software technologies and uses a custom GUI, a real-time 2D hand gesture recognizer and earcons.
Download

Paper Nr:	166
Title:	ON THE NEED FOR INCENTIVES TO SUPPORT PERSONALIZATION SYSTEMS - Turning Users into Active Providers of Contents and Metadata
Authors:	Martin Lopez-Nores, José J. Pazos-Arias, Jorge Garcia-Duque, Yolanda Blanco-Fernandez, Alberto Gil-Solla and Manuel Ramos-Cabrer
Abstract:	Research in personalization systems has made enormous progress in the last few years. However, the phenomenon of information overload is taking the state of the art to a dead end, due to the lack of metadata to describe the growing number of available contents. In this position paper, we take a look at the problem and suggest a research roadmap to find a way out, working on the idea of providing incentives to the end users to become active providers of contents and metadata.
Download

Paper Nr:	179
Title:	A NEW ARCHITECTURE FOR A MULTIPLATFORM AUGMENTED REALITY SYSTEM
Authors:	Andriamasinoro Rahajaniaina and Jean-Pierre Jessel
Abstract:	In this paper we describe a new architecture for a multiplatform augmented reality (AR) hardware device, which works in a dynamic workspace environment with 3D virtual models. The users can interact with the virtual model using a mouse, keyboard and stylus as interaction tools. The work plan is formed by ARToolkitPlus’ fiducial multimarker. For adding virtual objects, we propose a virtual menu inspired by the metaphor of forward and next buttons. The work plan is augmented by the virtual workspace. The user can choose his/her virtual workspace dynamically using this virtual menu and can choose a virtual object suited to the current workspace.
Download

Paper Nr:	188
Title:	ON NLMS ESTIMATION FOR VOIP PLAYOUT DELAY ALGORITHMS - Improving Delay Spike Detection
Authors:	Karen Miranda and Victor M. Ramos
Abstract:	Voice over IP (VoIP) applications are now very popular and widely used on the Internet. Such applications use receiver playout buffers to smooth delay variations so as to reconstruct the periodic form of the transmitted packets. Packets arriving after their scheduled playout time are considered late and are not played out. Playout delay control algorithms often operate by updating the playout delay between periods of silence. A recent class of playout control algorithms has received particular attention; this class of algorithms uses autoregressive measures on the network delay so as to estimate future packet delay values and adjust the playout delay accordingly. In this work, we compare two previously proposed algorithms that use such an autoregressive approach; both playout algorithms use a normalized least-mean square (NLMS) adaptive predictor. The difference between the two algorithms is that the second one is an extension of the first that adds delay spike detection. We demonstrate, by using Internet audio packet traces, that, contrary to what was claimed, the algorithm that uses spike detection does not outperform the first one. Finally, we propose an algorithm based on the original NLMS algorithm with delay spike detection that outperforms the previous two NLMS playout algorithms.
Download
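
The NLMS adaptive predictor that the abstract above refers to can be sketched as follows. This is a generic textbook NLMS one-step-ahead predictor, not the specific playout algorithms compared in the paper, and it contains no spike detection; the function name `nlms_predict`, the filter order, the step size `mu` and the regularizer `eps` are all assumed values chosen for illustration.

```python
def nlms_predict(delays, order=4, mu=0.5, eps=1e-6):
    """One-step-ahead NLMS prediction of network delay.

    Returns a list of predictions for delays[order:], each made from the
    `order` preceding samples before the true value is seen.
    `mu` is the (assumed) step size and `eps` avoids division by zero.
    """
    weights = [0.0] * order
    predictions = []
    for n in range(order, len(delays)):
        # Input vector: the `order` most recent delays, newest first.
        x = delays[n - order:n][::-1]
        # Predict the next delay as a weighted sum of past delays.
        y = sum(w * xi for w, xi in zip(weights, x))
        predictions.append(y)
        # Prediction error once the true delay is observed.
        e = delays[n] - y
        # Normalized update: step scaled by the input energy.
        norm = eps + sum(xi * xi for xi in x)
        weights = [w + mu * e * xi / norm for w, xi in zip(weights, x)]
    return predictions
```

A playout algorithm would typically add a safety margin to such a prediction before scheduling playout; how that margin is chosen is outside the scope of this sketch.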

Paper Nr:	32
Title:	SECURE AND ROBUST COPYRIGHT PROTECTION FOR H.264/AVC BASED ON SELECTED BLOCKS DCT
Authors:	Ait Saadi Karima, Bouridane Ahmed and H. Meraoubi
Abstract:	This paper proposes a new block based DCT selection and a robust video watermarking algorithm to hide copyright information in the compressed domain of the emerging video coding standard H.264/AVC. The watermark is first quantized and securely inserted. To achieve invisibility and robustness, the high entropy DCT 4x4 blocks within the macroblocks are selected to minimise the distortion caused by the embedded watermark and then scrambled using Linear Congruential Generator (LCG) technique. This approach leads to a good robustness by maintaining good visual quality of the watermarked sequences. The experimental results demonstrate the effectiveness of the algorithm against some attacks such as re-compression by the H.264 codec, transcoding and scaling.
Download
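
As background for the abstract above, a Linear Congruential Generator produces a key-dependent pseudo-random sequence from the recurrence state = (a * state + c) mod m. The sketch below uses one to derive a reproducible block ordering. How the authors actually apply the LCG is not specified in the abstract, so the shuffle-based use, the function names and the constants (a commonly quoted 32-bit parameter set) are assumptions for illustration only.

```python
def lcg(seed, a=1664525, c=1013904223, m=2 ** 32):
    """Yield an endless LCG sequence: state = (a * state + c) mod m.

    The constants are an assumed, widely used 32-bit parameter set.
    """
    state = seed % m
    while True:
        state = (a * state + c) % m
        yield state


def scramble_order(n_blocks, key):
    """Return a key-dependent permutation of block indices 0..n_blocks-1.

    Uses a Fisher-Yates shuffle driven by the LCG, so the same key always
    reproduces the same ordering (hypothetical usage, not the paper's scheme).
    """
    gen = lcg(key)
    order = list(range(n_blocks))
    for i in range(n_blocks - 1, 0, -1):
        j = next(gen) % (i + 1)
        order[i], order[j] = order[j], order[i]
    return order
```

Because the ordering depends only on the key, a detector holding the same key can regenerate it. An LCG is not cryptographically strong, so this illustrates the mechanism rather than a security guarantee.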

Paper Nr:	38
Title:	CAPTURING THE HUMAN ACTION SEMANTICS USING A QUERY-BY-EXAMPLE
Authors:	Anna Montesanto, Paola Baldassarri, A. F. Dragoni, G. Vallesi and Paolo Puliti
Abstract:	The paper describes a method for extracting human action semantics in videos using queries-by-example. Here we consider the indexing and matching problems of content-based human motion data retrieval. The query formulation is based on trajectories that may be easily built or extracted, even by a novice user, by following relevant points in a video. The trajectories obtained in this way carry a high degree of action semantics. The semantic schema is built by splitting a trajectory into time-ordered sub-sequences that contain the features of the extracted points. This kind of semantic representation reduces the dimensionality of the search space and, being human-oriented, allows a selective recognition of actions that are very similar to one another. A neural network system analyzes the video semantic similarity, using a two-layer architecture of multilayer perceptrons, which is able to learn the semantic schema of the actions and to recognize them.
Download

Paper Nr:	39
Title:	ARTIFICIAL NEURAL NETWORKS BASED SYMBOLIC GESTURE INTERFACE
Authors:	C. Iacopino, Anna Montesanto, Paola Baldassarri, G. Vallesi and Paolo Puliti
Abstract:	The purpose of the developed system is the realization of a gesture recognizer applied to a user interface. We aimed for software that is fast and easy for the user, without sacrificing reliability, and that uses equipment available to the common user: a PC and a webcam. The gesture detection is based on well-known artificial vision techniques, such as the tracking algorithm by Lucas and Kanade. The paths, suitably selected, are recognized by a double-layered architecture of multilayer perceptrons. The realized system is efficient and has good robustness, provided that attention is paid to adequate learning of the gesture vocabulary by both the user and the system.
Download

Paper Nr:	78
Title:	A COMPARATIVE USABILITY EVALUATION OF TWO AUGMENTED REALITY LEARNING SCENARIOS
Authors:	Alexandru Balog and Costin Pribeanu
Abstract:	Augmented Reality (AR) systems are featuring novel interaction techniques which are mainly driven by the possibilities to manipulate specific real objects. The interaction components have to be tested with users as early as possible in the development cycle in order to avoid usability problems. This paper reports on a comparative analysis of the usability evaluation results for two AR-based learning scenarios. The purpose of the evaluation was twofold: (a) getting an early feedback from users on the first version of the software, and (b) comparing the usability of two learning scenarios developed onto the same AR platform. The comparison has been performed between both quantitative and qualitative measures collected during a summer school.
Download

Paper Nr:	101
Title:	MULTI-ATTRIBUTE DECISION MAKING FOR AFFECTIVE BI-MODAL INTERACTION IN MOBILE DEVICES
Authors:	Efthimios Alepis, Maria Virvou and Katerina Kabassi
Abstract:	This paper presents how multi-attribute decision making is used for affective interaction in mobile devices. The system bases its inferences about users’ emotions on user input evidence from the keyboard and the microphone of the mobile device. The actual combination of evidence from these two modes of interaction has been performed based on an innovative inference mechanism for emotions and a multi-attribute decision making theory. The mechanism that integrates the inferences from the two modes has been based on the results of two empirical studies, with the participation of human experts and possible users of the system.
Download

Paper Nr:	132
Title:	A CONFIGURABLE LINUX FILE SYSTEM FOR MULTIMEDIA DATA
Authors:	Nicola Corriero, Vittoria Cozza, Eustrat Zhupa and Vito De Tullio
Abstract:	In MusicMeshFS the tree structure of the virtual Linux filesystem, extended and made configurable by the MusicMeshFS language, is adapted for storing and efficiently retrieving multimedia data. When MusicMeshFS is installed inside an embedded system equipped with a Wi-Fi card, multimedia data sharing over an ad hoc mesh network can be achieved for free.
Download

Paper Nr:	143
Title:	PERFORMANCE CONSIDERATIONS ON ADMISSION CONTROL FOR MULTIMEDIA SERVICES
Authors:	Brikena Statovci-Halimi and Harmen R. van As
Abstract:	Admission control represents a convenient mechanism to provide high-quality communication by ensuring resource availability. This paper gives an overview of different measurement-based admission control algorithms suitable for multimedia service environments. A new estimator used for the measurement process is introduced, which dynamically changes the time window used for measurements. The performance metrics of interest in the performance analysis are average utilization, packet loss and percentage of admitted flows.
Download

Paper Nr:	160
Title:	HUMAN SKIN COLOR DETECTION AND APPLICATION TO ADULT IMAGE DETECTION
Authors:	Ryszard S. Choras
Abstract:	In this paper, we aim at the detection of adult images. The detection methods mainly focus on the detection/identification of skin regions. Skin detection is of paramount importance in the detection of adult images. Our algorithm is designed to detect human skin color in the YCbCr color space. The proposed system finds skin regions and then generates the skin likelihood image. Since the skin likelihood image contains shape information as well as skin color information, we use the skin likelihood image to classify the adult images.
Download
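
To make the color-space step in the abstract above concrete, the sketch below converts RGB pixels to YCbCr and marks those whose chrominance falls inside a fixed box. It is a simplified hard-threshold illustration, not the authors' skin likelihood model: the conversion uses the full-range BT.601 (JPEG-style) equations, while the Cb and Cr limits are commonly quoted values assumed here for illustration, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr (BT.601, JPEG-style)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (y, cb, cr)


def is_skin(r, g, b, cb_range=(77.0, 127.0), cr_range=(133.0, 173.0)):
    """Return True if the pixel's chrominance lies in the assumed skin box.

    Luminance (Y) is ignored so the test is less sensitive to brightness.
    The Cb/Cr limits are illustrative assumptions, not the paper's values.
    """
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def skin_mask(pixels):
    """Map a list of (r, g, b) tuples to a list of booleans."""
    return [is_skin(r, g, b) for (r, g, b) in pixels]
```

A likelihood image, as described in the abstract, would replace the boolean decision with a per-pixel score; the binary mask here only shows where the chrominance test sits in such a pipeline.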

Paper Nr:	170
Title:	SOFTWARE LIFE-CYCLE FOR AN ADAPTIVE GEOGRAPHICAL INFORMATION SYSTEM
Authors:	Katerina Kabassi, Maria Virvou, Eleni Charou and Aristotelis Martinis
Abstract:	This paper presents the software life-cycle for the development of a knowledge-based GIS. The life-cycle framework used is called MBIUI and provides the experimental studies that are required for designing, implementing and testing a decision making theory in a graphical user interface. The decision making theory has been adapted in the user interface and is used to evaluate different environmental data in terms of criteria that concern the user’s needs and skills, and to select the data that seem most suitable for a user.
Download

Paper Nr:	191
Title:	SUBJECTIVE VERIFICATION OF PERCEPTUAL METRICS FOR IMAGE WATERMARKING FIDELITY
Authors:	Franco Alberto Del Colle and Juan Carlos Gómez
Abstract:	In this paper, the performance of several state-of-the-art watermark perceptual transparency metrics is evaluated through subjective assessment. Simulation results show that a metric based on S-CIELAB distortion maps is better correlated with the subjective tests than other objective metrics available in the literature. The paper focuses on Image Adaptive Watermarking methods in the Discrete Wavelet Transform domain since they yield better results regarding robustness and transparency than other watermarking schemes.
Download