SIGMAP 2007 Abstracts


Full Papers
Paper Nr: 111
Title:

A ROBUST NON-LINEAR FACE DETECTOR

Authors:

Antonio Rama, Francesc Tarres and Jacek Naruniec

Abstract: A novel face detector using the non-linear Fuzzy Integral operator is presented in this paper. The main advantage of this method is that it has a much lower false detection rate while using the same optimal set of features as the state-of-the-art Adaboost face detector. Furthermore, this novel face detector seems to have a better generalization capability than the Adaboost method. Preliminary results show a face detection rate higher than 92% with a false detection rate lower than 2% when using a four-stage cascade scheme.

Paper Nr: 140
Title:

SPEECH SEGMENTATION IN NOISY STREET ENVIRONMENT

Authors:

Jaroslaw Baszun

Abstract: Two voice activity detectors for speaker verification systems are compared in this paper. The first is a single-microphone system based on properties of the modulation spectrum of human speech, i.e. the distribution of power in the modulation frequency domain. It relies on the fact that the power of the modulation components of speech is concentrated in the range from 1 to 16 Hz and depends on the rate at which a person utters syllables. The second is a two-microphone system with an algorithm based on coherence computation. Experiments showed the superiority of the two-microphone system when voiced sounds are present in the background.

Paper Nr: 142
Title:

ARCHITECTURE OF INFORMATION SYSTEM FOR INTELLIGENT CASH MACHINE

Authors:

Krystian Ignasiak, Marcin Morgoś and Surachai Ongkittikul

Abstract: The paper summarizes the idea of the intelligent cash machine simulator that can be a very flexible environment to test new algorithms in image processing. The use cases for such a cash machine are discussed. Some data workflow is presented, and finally, an architecture is proposed for integration of different software modules without imposing constraints on software platforms and tools. The proposal is based on XML as a common language to exchange data.

Paper Nr: 164
Title:

SEMANTIC MEDIA ANALYSIS FOR PARALLEL HIDING OF DATA IN VIDEO AND AUDIO TRACK

Authors:

Stanislaw Badura and Slawomir Rymaszewski

Abstract: This paper deals with the role of steganography in a system of intelligent cash machines. Hiding methods designed specifically for this application are presented. In the system, typical content protection is extended by hiding additional multistream secret information. Video files with concealed information are stored in the system by the monitoring subsystem (Control AV recording and storage), and a third party has no possibility to delete, add or change video archives. Two different steganography methods were combined to produce a more advanced approach.

Paper Nr: 166
Title:

FACE VERIFICATION IN UNCONTROLLED LIGHT CONDITIONS OF STREET

Authors:

Mariusz Leszczynski and Wladyslaw Skarbek

Abstract: The impact of light conditions on face verification is considered for three linear discriminant feature extraction schemes. Two verification scenarios, the single-image query and the multi-image query, are compared. The extraction algorithms are based on compositions of feature projections onto global, intra-class and inter-class error subspaces: Linear Discriminant Analysis (LDA), Dual Linear Discriminant Analysis (DLDA), and their combination LDA+DLDA. The metric for evaluation of the verification error is the Mahalanobis distance between normalized feature vectors. The normalization of feature vectors is justified by the upper bound given by the Fisher separation index for feature vectors. Experiments conducted on facial databases with complex backgrounds show the high performance of the DLDA and DLDA+LDA verifiers, with an Equal Error Rate (EER) of less than one percent. When controlled light conditions are replaced by uncontrolled ones, the results degrade by a factor of two.

Paper Nr: 175
Title:

FACE DETECTION AND TRACKING IN DYNAMIC BACKGROUND OF STREET

Authors:

Jacek Naruniec, Wladyslaw Skarbek and Antonio Rama

Abstract: The paper presents a novel face detection and tracking algorithm which could be part of human-machine interaction in applications such as an intelligent cash machine. The facial feature extraction algorithm is based on a discrete approximation of the Gabor Transform, called Discrete Gabor Jets (DGJ), evaluated at edge points. DGJ is computed using an integral image for fast summations in arbitrary windows and by FFT operations on short contrast signals. Contrasting is performed along radial directions, while frequency analysis is performed along angular directions. Fourier coefficients for a small number of rings form a feature vector, which is then reduced to a few LDA components and compared to the reference facial feature vector. Detected eye and nose corners are chosen to fit the reference face by spatial relationships. Tracking is based on the same rule, but the corners are searched for only within the neighborhood of already detected facial features. Optionally, for face normalization, eye centers are found as the midpoints between the outer and inner eye corners. A comparison of manual and automatic eye center detection shows a still significant advantage of the manual approach, measured in terms of face recognition accuracy by Linear Discriminant Analysis (LDA) and Dual Linear Discriminant Analysis (DLDA) algorithms.
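The integral image mentioned in the abstract can be illustrated with a minimal sketch. This is a generic summed-area table for axis-aligned rectangular windows, not the authors' DGJ implementation; the function names `integral_image` and `window_sum` are hypothetical.

```python
def integral_image(img):
    """Return the summed-area table of a 2-D list, padded with a zero row and column."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above-left = column above + running row sum.
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def window_sum(sat, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 using four lookups."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
sat = integral_image(img)
print(window_sum(sat, 1, 1, 3, 3))  # sums 5 + 6 + 8 + 9
```

Once the table is built, the cost of any window sum is independent of the window size, which is what makes repeated summations over many windows cheap.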

Paper Nr: 188
Title:

REMOTE RENDERING OF COMPUTER GAMES

Authors:

Peter Eisert and Philipp Fechteler

Abstract: In this paper, we present two techniques for streaming the output of computer games to an end device for remote gaming in a local area network. We exploit the streaming methods in the European project Games@Large, which aims at creating a networked game platform for home and hotel environments. A local PC-based server executes a computer game and streams the graphical and audio output to local devices in the rooms, such that the users can play everywhere in the network. Depending on the target resolution of the end device, different types of streaming are addressed. For small displays, the graphical output is captured and encoded as a video stream. For high-resolution devices, the graphics commands of the game are captured, encoded, and streamed to the client. Since games require significant feedback from the user, special care has to be taken to meet the constraint of very low delay.

Paper Nr: 189
Title:

A HUMAN ACTION CLASSIFIER FROM 4-D DATA (3-D+TIME) - Based on an Invariant Body Shape Descriptor and Hidden Markov Models

Authors:

Massimiliano Pierobon, Marco Marcon, Augusto Sarti and Stefano Tubaro

Abstract: Many human action definitions have been provided in the field of human-computer interaction studies. These distinctions could be considered merely semantic, as human actions are all carried out by performing sequences of body postures. In this paper we propose a human action classifier based on volumetric reconstructed sequences (4-D data) acquired from a multi-viewpoint camera system. In order to design the most general action classifier possible, we concentrate on extracting only posture-dependent information from volumetric frames and on performing action distinction only on the basis of the sequence of body postures carried out in the scene. An Invariant Shape Descriptor (ISD) is used in order to properly describe the body shape and its dynamic changes during the execution of an action. The ISD data is then analyzed in order to extract suitable features able to meaningfully represent a human action independently of body position, orientation, size and proportions. The action classification is performed using a supervised recognizer based on Hidden Markov Model (HMM) theory. Experimental results, evaluated using an extensive action sequence dataset and applying different training conditions to the HMM-based classifier, confirm the reliability of the proposed approach.

Area 1 - Multimedia Communications

Full Papers
Paper Nr: 108
Title:

IMS SECURED CONTENT DELIVERY OVER PEER-TO-PEER NETWORKS

Authors:

Jens Fiedler, Thomas Magedanz and Alejandro Menendez

Abstract: Effective content distribution that is safe against denial-of-service attacks is one of the greatest challenges for content and service providers. Peer-to-peer technologies are known to be unaffected by such attacks, but lack any control by content owners or copyright holders. The work presented in this paper combines the effective and reliable content availability known from P2P with the capabilities of IMS, which is used for access control, charging and service discovery. Commercial use cases are discussed for content consumption and provisioning.

Short Papers
Paper Nr: 42
Title:

A METHODOLOGY FOR THE DEPLOYMENT OF LIVE AUDIO AND VIDEO SERVICES

Authors:

David Melendi, Xabiel Garcia, Manuel Vilas, Roberto García and Victor Garcia

Abstract: Since the development of the first live audio and video services in the 90s, the deployment of these services has always been a challenging issue. Not only is it necessary to deal with the problems of the delivery of continuous information and the high consumption of resources, but also with those imposed by the nature of these services. Service managers do not get a second chance to broadcast live contents so it is important to ensure that everything works as planned. Most service managers only work based on their own experience, but they rarely follow any standardized method. With the aim of improving the current situation, the authors have designed a methodology for the deployment of live audio and video which is presented in this paper. The methodology tries to cover almost all the issues that may arise while putting one of these services into operation and proposes mechanisms to deal with those issues from a management perspective. It has been successfully used by the authors in the deployment of several live services for different companies.

Paper Nr: 71
Title:

PERFORMANCE OF AUDIO/VIDEO SERVICES ON CONSTRAINED VARIABLE USER ACCESS LINES

Authors:

Manuel Vilas, X. G. Pañeda, David Melendi, Roberto García and Victor Garcia

Abstract: Nowadays, it is more and more common for the same access line to be shared among different services and even among different users. This change in home users’ behaviour, which has given rise to resource consumption close to the maximum available in user access lines, is mainly due to the increase in subscriber access capabilities that has taken place in the last few years. At the same time, contracts signed by customers and network operators only provide guarantees for a reduced percentage of the maximum download/upload capacity of the line. In this paper, a study of the effects on streaming services caused by variations in the access line and by the traffic of other services is carried out. One of the main conclusions of the paper is that the delivery rate of UDP streaming sessions is mainly guided by the quality of the contents and does not consider the congestion in the network. For this reason, a method for delivery rate estimation for UDP streaming sessions is presented.

Paper Nr: 145
Title:

LOW COMPLEXITY, LOW DELAY AND SCALABLE AUDIO CODING SCHEME BASED ON A NOVEL STATISTICAL PERCEPTUAL QUANTIZATION PROCEDURE

Authors:

César Alonso Abad, Miguel Ángel Martín Fernández and Carlos Alberola López

Abstract: In this paper we present Fast Perceptual Quantization (FPQ), a novel procedure to quantize and code audio signals. It employs the same psychoacoustic principles used in the popular MPEG/Audio coders, but substantially reduces the complexity and computational needs of the encoding process. FPQ is based on defining a hierarchy of privileged quantization values so that the masking threshold calculated through a psychoacoustic model is leveraged to quantize the real values to the privileged ones when possible. The computational cost of this process is very low compared to MP3’s or AAC’s quantization/coding loops. Experimental results show that it is possible to achieve nearly transparent coding using as few as approximately 100 quantization values. This leads to very efficient bit compaction using Huffman or arithmetic coding so that nearly state-of-the-art performance can be achieved in terms of quality/bit-rate trade-off. Since the quantization and coding (bit compaction) procedures are completely independent here, efficient scalable decoding can be achieved either by parsing and entropy re-encoding the original quantized values or by coding the bit-planes independently and sorting them in order of perceptual significance. Very low delay can also be achieved, which makes the proposed coding scheme suitable for real-time applications.

Area 2 - Multimedia Signal Processing

Full Papers
Paper Nr: 11
Title:

DIFFUSE MATRIX - An Optimized Data Structure for the Storage and Processing of Hyperspectral Images

Authors:

José Manuel Chaves-González, Miguel A. Vega-Rodríguez, Pablo J. Martínez-Cobo, Juan A. Gómez-Pulido and Juan Manuel Sánchez-Pérez

Abstract: This paper proposes a new format for storing and processing hyperspectral images captured by the AVIRIS spectrometer (Airborne Visible/InfraRed Imaging Spectrometer). Obtaining such images is difficult because the sensor is carried in an aircraft that experiences turbulence while the images are being taken, so a geo-rectification process is necessary to correct the information of the different bands. The format proposed in this paper, DMF (Diffuse Matrix Format), allows more efficient storage because, for each position (X,Y) of the scanned ground, a list with the original information received by the sensor is saved. The list format saves space and time because no redundant information is stored. To show the possibilities of this new format, an application that performs some thresholding and filter operations has been built. This program first creates the diffuse matrix in memory from the file that stores the image information, and then executes some filter operations over the diffuse matrix to check it. In this way, we show that diffuse matrix processing is fast and simple, and that the disk space used for its storage is considerably smaller than the space used by typical formats.

Paper Nr: 34
Title:

FACIAL EXPRESSION SYNTHESIS AND RECOGNITION WITH INTENSITY ALIGNMENT

Authors:

Hao Wang

Abstract: This paper proposes a novel approach for facial expression synthesis that can generate arbitrary expressions for a new person with natural expression details. This approach is based on local geometry preserving between the input face image and the target expression image. In order to generate expressions with arbitrary intensity for a new person with unknown expression, this paper also develops an expression recognition scheme based on Supervised Locality Preserving Projections (SLPP), which aligns different subjects and different intensities on one generalized expression manifold. Experimental results clearly demonstrate the efficiency of the proposed algorithm.

Paper Nr: 55
Title:

A NEURAL NETWORK-BASED SYSTEM FOR FACE DETECTION IN LOW QUALITY WEB CAMERA IMAGES

Authors:

Ioanna-Ourania Stathopoulou and George A. Tsihrintzis

Abstract: The rapid and successful detection and localization of human faces in images is a prerequisite to a fully automated face image analysis system. In this paper, we present a neural network–based face detection system which arises from the outcome of a comparative study of two neural network models of different architecture and complexity. The fundamental difference in the construction of the two models lies in approaching the face detection problem either by seeking a general solution based on the full-face image or by composing the solution through the resolution of specific portions/characteristics of the face. The proposed system is based on the brightness contrasts between specific regions of the human face. We show that the second approach, even though more complicated, exhibits better performance in terms of detection and false-positive rates. We tested our system with low quality face images acquired with web cameras. The image test set includes both front and side view images of faces forming either a neutral or one of the “smile”, “surprise”, “disgust”, “scream”, “bored-sleepy”, “angry”, and “sad” expressions. The system achieved high face detection rates, regardless of facial expression or face view.

Paper Nr: 79
Title:

MACROBLOCK SKIPPING ALGORITHMS FOR HIGH DEFINITION H.264/AVC VIDEO CODING IN THE BASELINE PROFILE

Authors:

Susanna Spinsante, Ennio Gambi and Damiano Falcone

Abstract: This paper discusses different macroblock skipping algorithms to be applied in the H.264/AVC Baseline profile, in order to facilitate the adoption of High Definition video coding in real-time applications. Moving from Standard to High Definition video coding, there is six times as much data to process: this motivates the search for suitable Mode Decision strategies that reduce complexity while preserving an acceptable video quality for the final user. The proposed schemes significantly speed up the Mode Decision procedure by forcing the selection of the SKIP mode over each frame, without significantly affecting the final quality.
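The "six times as much data" figure can be checked with a short calculation. The abstract does not state which frame sizes are compared; the sketch below assumes 720x480 for Standard Definition and 1920x1080 for High Definition, and the helper `macroblocks` is a hypothetical name.

```python
def macroblocks(width, height, mb_size=16):
    """Number of 16x16 macroblocks needed to cover a frame (dimensions rounded up)."""
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols * rows


sd = (720, 480)     # assumed SD frame size (not stated in the abstract)
hd = (1920, 1080)   # assumed HD frame size (not stated in the abstract)

pixel_ratio = (hd[0] * hd[1]) / (sd[0] * sd[1])
mb_ratio = macroblocks(*hd) / macroblocks(*sd)
print(pixel_ratio, macroblocks(*sd), macroblocks(*hd), round(mb_ratio, 2))
```

Under these assumed sizes the pixel count grows by a factor of six, and the number of macroblocks for which a mode decision must be made grows by roughly the same factor.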

Paper Nr: 85
Title:

SMOOTHED REFERENCE PREDICTION FOR IMPROVING SINGLE-LOOP DECODING PERFORMANCE OF H.264/AVC SCALABLE EXTENSION

Authors:

So Young Kim and Woo Jin Han

Abstract: It is well known that the multi-layer extension of H.264/AVC shows good spatial scalability performance, mainly due to its efficient inter-layer prediction techniques. Single-loop decoding is a technique that reduces decoder-side computational complexity by performing only one motion compensation to decode multi-layer data, but its limited use of inter-layer prediction sometimes degrades performance, especially for fast-motion sequences. In this paper, a smoothed reference prediction technique is proposed to improve single-loop decoding performance by replacing base-layer information with current-layer information and a simple block-based smoothing function. Experimental results show that the proposed method can improve the coding efficiency while retaining all the benefits of the single-loop decoding mode. In addition, the proposed method was adopted into the Working Draft of the scalable extension of the H.264/AVC standard.

Paper Nr: 88
Title:

HIGHER-ORDER STATISTICS INTERPRETATION. APPLICATION TO POWER-QUALITY CHARACTERIZATION

Authors:

Juan-Jose Gonzalez De La Rosa, África Luque, Carlos G. Puntonet, J. M. Górriz and Antonio Moreno-Munoz

Abstract: In this paper we present a practical review of higher-order statistics interpretation. Specifically, we focus on an unbiased estimate of the 4th-order time-domain cumulants. Some synthetic signals involving classical noise processes are characterized using this unbiased estimate, with the goal of checking its performance and providing the scientific community with another result dealing with the interpretation of this signal processing tool. A real-life practical example is presented in the field of electrical power quality event analysis. The work also aims to present a set of general advice for saving memory and gaining speed in a real signal processing framework dealing with non-stationary processes.
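As a rough illustration of what a 4th-order cumulant measures, the sketch below computes the zero-lag value from plain sample moments. This is the simple (biased) moment-based form, not the unbiased estimator the abstract refers to, and the function name `fourth_order_cumulant` is hypothetical.

```python
def fourth_order_cumulant(x):
    """Zero-lag 4th-order cumulant from sample moments: m4 - 3 * m2**2.

    Simple moment-based estimate, shown for illustration only; it is not
    the unbiased estimator discussed in the abstract.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    m2 = sum(v ** 2 for v in centered) / n   # second central moment
    m4 = sum(v ** 4 for v in centered) / n   # fourth central moment
    return m4 - 3.0 * m2 ** 2


square_wave = [-1.0, 1.0, -1.0, 1.0]
print(fourth_order_cumulant(square_wave))
```

For a Gaussian process this quantity is zero in expectation, so values far from zero indicate non-Gaussian behaviour: negative for flat-topped signals such as the square wave above, positive for impulsive ones.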

Paper Nr: 95
Title:

USING 3D FEATURES TO EVALUATE CORK QUALITY

Authors:

Beatriz Paniagua-Paniagua, Miguel A. Vega-Rodríguez, Hiroshi Nagahashi, Juan A. Gómez-Pulido and Juan Manuel Sánchez-Pérez

Abstract: In this paper we study different 3D features of cork material in order to solve a classification problem existing in the cork industry: cork stopper/disk quality classification. The Cork Quality Standard sets seven different cork quality classes for cork stopper classification. These classes are based on a complex combination of cork stopper defects. In previous studies we only analysed those features that could be detected or acquired with a 2D camera. In this study we work in a 3D environment in order to extract those features that could not be extracted with a 2D approach. We conclude that the most important 3D cork quality detection feature takes into account dark and deep cork areas (usually, these areas indicate deep and important defects). Furthermore, the 3D features considerably improve on the results obtained by similar features with a 2D approach, because the 3D approach includes more information. This allows us to extract more complex features and to improve the classification results.

Paper Nr: 116
Title:

A NEW ADAPTIVE CLASSIFICATION SCHEME BASED ON SKELETON INFORMATION

Authors:

Catalina Cocianu, Luminita State, Ion Rosca and Panayiotis Vlamos

Abstract: Large multivariate data sets can prove difficult to comprehend and hardly allow the observer to figure out the pattern structures, relationships and trends existing in the samples, which justifies the effort of finding suitable methods for extracting relevant information from data. In our approach, we consider a probabilistic class model where each class h ∈ H is represented by a probability density function defined on R^n, where n is the dimension of the input data and H stands for a given finite set of classes. The classes are learned by the algorithm using the information contained in samples randomly generated from them. The learning process is based on the set of class skeletons, where the class skeleton is represented by the principal axes estimated from data. Basically, for each new sample, the recognition algorithm classifies it in the class whose skeleton is the “nearest” to this example. For each new sample allotted to a class, the class characteristics are re-computed using a first-order approximation technique. Experimentally derived conclusions concerning the performance of the newly proposed method are reported in the final section of the paper.

Paper Nr: 127
Title:

LOCAL DISSONANCE MINIMIZATION IN REALTIME

Authors:

Julián Villegas and Michael Cohen

Abstract: This article discusses the challenges of applying the tonotopic consonance theory to minimize the dissonance of concurrent sounds in real-time. It reviews previous solutions, proposes an alternative model, and presents a prototype programmed in Pd that aims to surmount the difficulties of prior solutions.

Paper Nr: 133
Title:

RATE CONTROL FOR MULTI-SEQUENCE H.264/AVC COMPRESSION

Authors:

Andrzej Pietrasiewicz and Grzegorz Pastuszak

Abstract: Multi-sequence video coding allows the bit budget to be distributed among sequences. This paper presents a method for selecting a common quantization parameter, which is applied concurrently to each sequence. The approach takes into account ρ-domain rate-distortion models kept independently for each video sequence and builds a common model. The output buffer is verified jointly for all the sequences and drives a joint bit allocation process. The method has been verified in simulation to demonstrate its usefulness in video encoding.
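The idea of choosing one quantization parameter for several sequences from per-sequence ρ-domain models can be sketched as follows. The sketch assumes the linear ρ-domain rate model (bits proportional to the number of non-zero quantized coefficients), approximates the H.264 step size as 0.625 · 2^(QP/6), treats a coefficient as zero when its magnitude is below the step size, and ignores the output buffer. All names (`qstep`, `rho`, `common_qp`, `theta`) are hypothetical, and this is not the authors' algorithm.

```python
def qstep(qp):
    """Approximate H.264 quantization step size (doubles every 6 QP)."""
    return 0.625 * 2 ** (qp / 6)


def rho(coeffs, qp):
    """Fraction of coefficients quantized to zero (simplified dead zone)."""
    step = qstep(qp)
    return sum(1 for c in coeffs if abs(c) < step) / len(coeffs)


def predicted_bits(theta, coeffs, qp):
    """Linear rho-domain model: theta bits per non-zero coefficient (assumption)."""
    return theta * (1.0 - rho(coeffs, qp)) * len(coeffs)


def common_qp(sequences, bit_budget, qp_range=range(0, 52)):
    """Smallest QP whose summed predicted rate fits the joint bit budget."""
    for qp in qp_range:
        total = sum(predicted_bits(theta, coeffs, qp) for theta, coeffs in sequences)
        if total <= bit_budget:
            return qp
    return qp_range[-1]  # budget cannot be met; fall back to the coarsest QP


# Two toy "sequences": (theta, transform coefficients). Values are made up.
sequences = [(6.0, [0.5, 3.0, 12.0, 40.0]),
             (5.0, [1.0, 2.0, 20.0, 100.0])]
print(common_qp(sequences, bit_budget=25.0))
```

Summing the per-sequence predictions is one simple way to form a joint model; because every sequence shares the same QP, sequences with more high-energy coefficients automatically receive a larger share of the budget.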

Paper Nr: 159
Title:

UNSUPERVISED NON PARAMETRIC DATA CLUSTERING BY MEANS OF BAYESIAN INFERENCE AND INFORMATION THEORY

Authors:

Gilles Bougeniere, Claude Cariou, Kacem Chehdi and Alan Gay

Abstract: In this communication, we propose a novel approach to perform the unsupervised and non parametric clustering of n-D data within a Bayesian framework. The iterative approach developed is derived from the Classification Expectation-Maximization (CEM) algorithm, in which the parametric modelling of the mixture density is replaced by a non parametric modelling using local kernels, and the posterior probabilities account for the coherence of current clusters through the measure of class-conditional entropies. Applications of this method to synthetic and real data including multispectral images are presented. The classification results are compared with other recent unsupervised approaches, and we show that our method reaches a more reliable estimation of the number of clusters while providing slightly better rates of correct classification on average.

Paper Nr: 160
Title:

PHONETIC-BASED MAPPINGS IN VOICE-DRIVEN SOUND SYNTHESIS

Authors:

Jordi Janer and Esteban Maestre

Abstract: In voice-driven sound synthesis applications, phonetics convey musical information that might be related to the sound of an imitated musical instrument. Our initial hypothesis is that phonetics are user- and instrument-dependent, but they remain constant for a single subject and instrument. Hence, a user-adapted system is proposed, where mappings depend on how subjects perform musical articulations given a set of examples. The system will consist of, first, a voice imitation segmentation module that automatically determines note-to-note transitions. Second, a classifier determines the type of musical articulation for each transition from a set of phonetic features. To validate our hypothesis, we run an experiment where a number of subjects imitated real instrument recordings with the voice. Instrument recordings consisted of short phrases of sax and violin performed in three grades of musical articulation labeled as staccato, normal and legato. The results of a supervised training classifier (user-dependent) are compared to a classifier based on heuristic rules (user-independent). Finally, with the previous results we improve the quality of a sample-concatenation synthesizer by selecting the most appropriate samples.

Short Papers
Paper Nr: 16
Title:

FUSION PREDICTORS FOR DISCRETE-TIME LINEAR SYSTEMS WITH MULTISENSOR ENVIRONMENT

Authors:

Haryong Song and Shin Vladimir

Abstract: New fusion predictors for linear dynamic systems with different types of observations are proposed. The fusion predictors are formed by summing the local Kalman filters/predictors with matrix weights depending only on time instants. The relationship between them and the optimal predictor is discussed. High accuracy and computational efficiency of the fusion predictors are demonstrated on the first-order Markov process and the damped harmonic oscillator motion in a multisensor environment.
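The idea of forming a fused estimate as a weighted sum of local estimates can be illustrated in the scalar case. The sketch below uses inverse-variance weights, which assume the local estimation errors are uncorrelated; the abstract's matrix weights are not specified here, so this is only an illustration and the name `fuse` is hypothetical.

```python
def fuse(estimates, variances):
    """Fuse scalar local estimates with weights that sum to one.

    Assumes uncorrelated local errors, so each weight is proportional to
    the inverse of the corresponding error variance.
    """
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    weights = [w / total for w in inv]
    fused = sum(w * e for w, e in zip(weights, estimates))
    fused_var = 1.0 / total
    return fused, fused_var, weights


# Two local predictors report 10.0 (variance 1.0) and 14.0 (variance 3.0).
fused, fused_var, weights = fuse([10.0, 14.0], [1.0, 3.0])
print(fused, fused_var, weights)
```

The fused variance is smaller than either local variance, which is the motivation for combining local predictors rather than picking the best single one.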

Paper Nr: 27
Title:

SMALL TRICKS TO ENHANCE THE ACCURACY OF LICENSE PLATE CHARACTER RECOGNITION

Authors:

Balazs Enyedi, Lajos Konyha, Kalman Fazekas and Jan Turan

Abstract: License plate recognition solutions to date are numerous and quite diverse. It is a complex problem field that can clearly be separated into two areas: localizing the actual license plate number and recognizing individual characters. The current professional literature devotes relatively little attention to the individual steps of character recognition, which is exacerbated by the fact that the vast majority of solutions suffer severe data losses due to the careless discarding of information that could significantly enhance the accuracy of the end result, that is, improve recognition reliability. Certain letters and numbers are very easy to mistake for one another, and some solutions focus too heavily on attempting to differentiate between them, complicating the recognition algorithm and possibly unnecessarily increasing its computation requirements. Instead, retaining certain information can result in much faster and more accurate recognition algorithms. This paper describes tricks to enhance accuracy and presents the points of potentially significant data loss during the recognition process. The solutions described here are applicable along with any recognition algorithm, enhancing its accuracy and reliability.

Paper Nr: 137
Title:

UNSUPERVISED ALGORITHMS FOR SEGMENTATION AND CLUSTERING APPLIED TO SOCCER PLAYERS CLASSIFICATION

Authors:

Paolo Spagnolo, P. L. Mazzeo, Marco Leo and Tiziana D'Orazio

Abstract: In this work we consider the problem of soccer player detection and classification. The approach we propose starts from the monocular images acquired by a still camera. First, players are detected by means of background subtraction. An algorithm based on the energy content of pixels has been implemented in order to detect moving objects. The use of energy information, combined with a temporal sliding window procedure, makes the approach substantially independent of motion hypotheses. Players are then assigned to the corresponding team by means of an unsupervised clustering algorithm that works on colour histograms in RGB space. It is composed of two distinct modules: first, a modified version of the BSAS clustering algorithm builds the clusters for each class of objects. Then, at runtime, each player is classified by evaluating its distance, in the feature space, from the classes previously detected. The algorithms have been tested on different real soccer matches of the Italian Serie A.
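The clustering step can be illustrated with the textbook Basic Sequential Algorithmic Scheme (BSAS). The abstract uses a modified version operating on colour histograms whose details are not given, so the sketch below is the plain scheme applied to made-up mean RGB triples; the function name `bsas` and the parameter values are assumptions.

```python
import math


def bsas(samples, threshold, max_clusters):
    """Plain BSAS: one pass over the samples, creating clusters on the fly."""
    centroids, counts, labels = [], [], []
    for s in samples:
        nearest, nearest_dist = None, None
        if centroids:
            dists = [math.dist(s, c) for c in centroids]
            nearest = min(range(len(dists)), key=dists.__getitem__)
            nearest_dist = dists[nearest]
        if nearest is None or (nearest_dist > threshold and len(centroids) < max_clusters):
            # Too far from every existing cluster: start a new one.
            centroids.append(list(s))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Assign to the nearest cluster and update its running mean.
            counts[nearest] += 1
            centroids[nearest] = [c + (v - c) / counts[nearest]
                                  for c, v in zip(centroids[nearest], s)]
            labels.append(nearest)
    return centroids, labels


# Hypothetical mean RGB colours of detected players (two reddish, two bluish, one reddish).
players = [(200, 20, 20), (210, 25, 18), (20, 20, 200), (25, 30, 190), (205, 22, 25)]
centroids, labels = bsas(players, threshold=50.0, max_clusters=4)
print(labels)
```

Because BSAS processes samples in a single pass, its result depends on the presentation order and on the threshold, which is one reason a modified variant may be preferred in practice.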

Paper Nr: 148
Title:

LIVE TV SUBTITLING - Fast 2-pass LVCSR System for Online Subtitling

Authors:

Ales Prazak, Ludek Muller, J. V. Psutka and Josef Psutka

Abstract: The paper describes a fast 2-pass large vocabulary continuous speech recognition (LVCSR) system for automatic online subtitling of live TV programs. The proposed system implementation can be used for direct recognition of the TV program audio channel or for recognition of a shadow speaker who re-speaks the original audio channel. The first part of this paper focuses on the preparation of an adaptive language model for TV programs, where person names are specific to each subtitling session and have to be added to the recognition vocabulary. The second part outlines the design of the recognition system for automatic online subtitling with a vocabulary of up to 150 000 words in real time. The recognition system is based on Hidden Markov Models, lexical trees, and bigram and quadgram language models in the first and second pass, respectively. Finally, experimental results from our project with the Czech Television are reported and discussed.

Paper Nr: 165
Title:

IMPROVEMENT OF H.264 SKIP MODE

Authors:

Kyohyuk Lee, Woojin Han and Tammy Lee

Abstract: H.264 (MPEG-4 AVC) is the state-of-the-art international video coding standard, which shows better coding efficiency than previous standards. This contribution concerns the improvement of the motion derivation process of the H.264 SKIP mode. H.264 exploits temporal or spatial motion field correlation to derive the current motion field. Temporal or spatial direct mode macroblocks for B slices and skip mode macroblocks for P slices are adopted to exploit motion field correlation. In general, the H.264 SKIP mode macroblock has a great impact on coding efficiency because about 30 to 70% of macroblocks are set to skip mode. A SKIP mode macroblock derives one motion vector for the whole 16x16 macroblock region from spatial correlation. In this contribution, we refine the SKIP mode motion field further instead of setting one motion vector for the 16x16 macroblock region: we split the 16x16 macroblock into four 8x8 sub-partitions and set the SKIP mode motion field of each sub-partition separately. Experimental results showed an average bit rate reduction of 2.05% and up to 18.63%, with especially higher coding efficiency in low bit rate conditions.

Paper Nr: 53
Title:

FAST SOUND FILE CHARACTERISATION METHOD

Authors:

Lucille Tanquerel and Luigi Lancieri

Abstract: This article describes a fast technique for characterizing sound documents based on a statistical measure of the variation of the signal. We showed that a very limited sampling was sufficient to obtain reasonable performance of the characteristic while being 100 times faster to calculate than a complete sampling. During preliminary tests, we carried out a first validation of our approach by highlighting a correlation of 0.7 between the human perception of rhythm and our characteristic, as well as a recognition error lower than 5%. In this new series of tests, we show that our approach makes it possible to associate a cut file with its missing half, with an error rate of approximately 30%.
Download

Paper Nr: 57
Title:

COMPARISON OF BACKGROUND SUBTRACTION METHODS FOR A MULTIMEDIA LEARNING SPACE

Authors:

Fida El Baf, Thierry Bouwmans and B. Vachon

Abstract: This article first presents a multimedia application called Aqu@theque. The project consists in building a multimedia system dedicated to aquariums that provides ludo-pedagogical information in an interactive learning area. The reliability of this application depends on the segmentation and recognition steps. We then focus on the segmentation step, which uses the background subtraction principle. Our motivation is to compare different background subtraction methods used to detect fish in video sequences and to improve the performance of this application. In this context, we present a new classification of the critical situations that occur in videos and violate the assumptions made by background subtraction methods. This classification can be used in any application that relies on background subtraction, such as video surveillance, motion capture or video games.
Download

Paper Nr: 65
Title:

EFFICIENT MOTION COMPENSATION ARCHITECTURE WITH RATE-DISTORTION OPTIMIZATION FOR H.264/AVC

Authors:

Song Tian and Shimamoto Takashi

Abstract: In this paper, a novel motion compensation architecture is proposed to support Rate-Distortion Optimization (RDO) in H.264/AVC. First, the scope of motion compensation in this work is defined to include not only half- and quarter-pixel motion compensation but also the deblocking filter and rate-distortion optimization. Then, based on this new concept of motion compensation, an efficient architecture for an H.264/AVC codec is constructed. The proposed architecture can select the best mode for INTRA macroblocks using the Lagrange function by calculating the distortion and the generated bits. It can also calculate the Lagrange function for INTER macroblocks by receiving the motion vector information and the interpolation data from the ME (Motion Estimation) module, forming a complete rate-distortion optimization architecture. A pipelined processing structure is designed for sub-block mode selection to achieve real-time processing for inputs up to HDTV resolution. Implementation results show that the proposed architecture can be realized with only 42,280 gates and 48,320 bits of SRAM.
Download

Paper Nr: 76
Title:

EFFICIENT DIGITAL FREQUENCY DOWN CONVERTER STRUCTURE USING CIC FILTERS AND INTERPOLATED FOURTH-ORDER POLYNOMIALS

Authors:

Youngbeom Jang, Do-Han Kim and Won-Sang Lee

Abstract: In this paper, we propose an efficient digital frequency down converter (DFDC) structure using CIC (Cascaded Integrator-Comb) decimation filters and interpolated fourth-order polynomials (IFOP). A typical DFDC with a high decimation factor consists of a CIC filter and a halfband filter. By inserting the proposed IFOP between the CIC and halfband filters, it is shown that the passband droop and aliasing band attenuation characteristics are simultaneously improved. Since the IFOP requires only three multiplications, the proposed DFDC can be used in the intermediate frequency blocks of high-speed communication systems.
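As a rough illustration of the CIC stage mentioned in this abstract, the sketch below implements a generic multiplier-free CIC decimator (cascaded integrators, rate reduction, cascaded combs with a differential delay of one). It is not the authors' DFDC: the function name `cic_decimate` and its parameters are hypothetical, and the IFOP and halfband stages are omitted because the abstract does not give their coefficients.

```python
def cic_decimate(samples, rate, stages):
    """Generic CIC decimator sketch (assumed differential delay of 1).

    samples: iterable of integers at the input rate
    rate:    decimation factor R
    stages:  number of integrator/comb pairs N
    """
    integrators = [0] * stages   # running sums, updated at the input rate
    comb_delays = [0] * stages   # previous comb inputs, updated at the output rate
    output = []
    for n, x in enumerate(samples):
        value = x
        # Integrator section: each stage accumulates the previous stage's output.
        for i in range(stages):
            integrators[i] += value
            value = integrators[i]
        # Keep every R-th sample, then run the comb section at the low rate.
        if (n + 1) % rate == 0:
            for i in range(stages):
                value, comb_delays[i] = value - comb_delays[i], value
            output.append(value)
    return output


# A constant input settles to the DC gain R**N (here 4**2 = 16).
print(cic_decimate([1] * 40, rate=4, stages=2))
```

The DC gain of R**N is the reason practical designs scale or truncate the output; the passband droop that the abstract aims to correct comes from the sinc-like frequency response of this structure.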
Download

Paper Nr: 87
Title:

SPECTRUM WEIGHTED HRTF BASED SOUND LOCALIZATION

Authors:

Sergio Cavaliere and Pietro Santangelo

Abstract: In the framework of humanoid robotics, it is of great importance to study and develop computational techniques that enrich robot perception and its interaction with the surrounding environment. The most important cues for the estimation of sound source azimuth are interaural phase differences (IPD), interaural time differences (ITD) and interaural level differences (ILD) between the binaural signals. In this paper we present a method for recognizing the direction of a sound located on the azimuthal plane (i.e. the plane containing the interaural axis). The proposed method is based on a spectrum-weighted comparison between the ILDs and IPDs extracted from microphones located at the ears and a set of stored cues; these cues were previously measured and stored in a database in the form of a Data Lookup Table. While a direct lookup of the stored cues in the table suffers from the presence of both ambient noise and reverberation, as is usual in real environments, the proposed method exploits the overall shape of the actual frequency spectrum of the signal, both its phase and modulus, and dramatically reduces localization errors. We also give experimental evidence that this method greatly improves on the usual HRTF-based identification methods.
Download

Paper Nr: 93
Title:

COLOUR SPACES STUDY FOR SKIN COLOUR DETECTION IN FACE RECOGNITION SYSTEMS

Authors:

José Manuel Chaves-González, Miguel A. Vega-Rodríguez, Juan A. Gómez-Pulido and Juan Manuel Sánchez-Pérez

Abstract: In this paper we present the results of a comparison among different colour spaces, carried out to determine which one is best for human skin colour detection in face detection systems. Our motivation is that there is no consensus about which colour space is the best choice for finding skin colour in an image. This matters because most face detectors use skin colour to detect the face in a picture or a video. We studied 10 different colour spaces (RGB, CMY, YUV, YIQ, YCbCr, YPbPr, YCgCr, YDbDr, HSV –or HSI– and CIE-XYZ). To make the comparisons we used truth images of 15 different people, comparing at pixel level the number of correct detections (false negatives and false positives) for each colour space.
Download

Paper Nr: 126
Title:

BOUNDARY POINT DETECTION FOR ULTRASOUND IMAGE SEGMENTATION USING GUMBEL DISTRIBUTIONS

Authors:

Brian Booth and Xiaobo Li

Abstract: Due to high noise, low contrast, and other imaging artifacts, region boundaries in ultrasound images often do not conform to the assumptions of many image processing algorithms. Specifically, the beliefs that region boundaries have a high gradient magnitude or a high intensity can break down in this context. In this paper, we present an alternative way of detecting likely boundary points in ultrasound images by decomposing the image into one-dimensional intensity scans. These intensity scans, mimicking traditional A-Mode ultrasound, are modeled using Gumbel distributions. Results show that the relationship between the modes of these distributions and region boundaries is relatively strong.
Download

Paper Nr: 135
Title:

FEATURES EXTRACTION FOR MUSIC NOTES RECOGNITION USING HIDDEN MARKOV MODELS

Authors:

Fco. Javier Salcedo Campos, Jesús E. Díaz-Verdejo and José Carlos Segura

Abstract: In recent years Hidden Markov Models (HMMs) have been successfully applied to human speech recognition. The present article proves that this technique is also valid to detect musical characteristics, for example: musical notes. However, any recognition system needs to get a suitable set of parameters, that is, a reduced set of magnitudes that represent the outstanding aspects to classify an entity. This paper shows how a suitable parameterisation and adequate HMMs topology make a robust recognition system of musical notes. At the same time, the way to extract parameters can be used in other recognition technologies applied to music.
Download

Paper Nr: 139
Title:

AN IMPROVED SUPER RESOLUTION RECONSTRUCTION ALGORITHM FOR VIDEO SEQUENCE

Authors:

Hyo-Moon Cho and Sang Bock Cho

Abstract: In this paper, we introduce an input image selection method to improve the quality of the reconstructed high-resolution (HR) image. To obtain an ideal super-resolution (SR) reconstruction, all input images must be well registered. In practice, however, registration is not ideal. For this reason, the number of input images with low registration error matters more than the total number of input images for obtaining a good-quality HR image. The suitability of an input image can be evaluated using statistical and restricted registration properties. We therefore propose an automatic input image evaluation method, together with its architecture, as a pre-processing step for SR reconstruction. In video sequences, all input images in a specified region may be used in SR reconstruction as the low-resolution (LR) input image and/or the reference image. The evaluation is based on a threshold, which is calculated from the maximum motion compensation error (MMCE) of the reference image. If the motion compensation error (MCE) of an LR input image is in the range 0 < MCE < MMCE, the image is selected for SR reconstruction; otherwise it is discarded. The optimal reference LR (ORLR) image is determined by comparing the number of selected LR input (SLRI) images for each reference LR input (RLRI) image. Finally, we generate an HR image from the optimal reference LR image and the selected LR images using Hardie’s interpolation method. The proposed algorithm is expected to improve the quality of SR without any user intervention.
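The selection rule in this abstract can be sketched as follows. This is only an illustration under assumptions: the abstract does not say how MCE or MMCE are computed, so they are taken as given numbers, and the names `select_inputs` and `choose_reference` as well as the sample values are hypothetical. The sketch also assumes that the optimal reference is the candidate with the most selected inputs.

```python
def select_inputs(mce_values, mmce):
    """Return indices of LR inputs whose error satisfies 0 < MCE < MMCE."""
    return [i for i, mce in enumerate(mce_values) if 0 < mce < mmce]


def choose_reference(mce_table, mmce_per_ref):
    """Pick the candidate reference with the most selected LR inputs.

    mce_table:    {reference index: list of MCE values, one per LR input}
    mmce_per_ref: {reference index: MMCE threshold for that reference}
    """
    best_ref, best_selected = None, []
    for ref, mce_values in mce_table.items():
        selected = select_inputs(mce_values, mmce_per_ref[ref])
        if best_ref is None or len(selected) > len(best_selected):
            best_ref, best_selected = ref, selected
    return best_ref, best_selected


# Made-up errors for four frames and two candidate references.
# An MCE of 0.0 marks the reference compared with itself, which the
# strict lower bound excludes.
mce_table = {0: [0.0, 1.2, 3.5, 0.8], 1: [1.1, 0.0, 0.9, 0.7]}
mmce_per_ref = {0: 2.0, 1: 2.0}
print(choose_reference(mce_table, mmce_per_ref))  # (1, [0, 2, 3])
```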
Download

Paper Nr: 150
Title:

SEARCHING FOR A ROBUST MFCC-BASED PARAMETERIZATION FOR ASR APPLICATION

Authors:

J.v. Psutka, Luboš Šmídl and Ales Prazak

Abstract: The paper is concerned with searching for robust settings of an MFCC-based parameterization with regard to the number of band-pass filters and the number of computed coefficients. Settings that are theoretically recommended for telephone and microphone speech are compared with a large number of experimental results, and a new technique for determining robust areas of {<# of band-pass filters>×<# of coefficients>} is designed.
Download

Paper Nr: 151
Title:

IMAGE RESTORATION - A New Explicit Approach in Filtering and Restoration of Digital Images

Authors:

Pejman Rahmani, Benoit Vozel and Kacem Chehdi

Abstract: Image restoration in the presence of noise is well known to be an ill-posed inverse problem. Deconvolution of blurry and noisy digital images is a very active research area in image processing. This paper introduces a novel approach composed of two optimized sequential stages of image processing: denoising followed by deconvolution. In the first stage, the denoising filter and the number of iterations are chosen to obtain the best value of the usual criteria and a good recovery of the blurry image. We assume that the statistics of the noise have been estimated beforehand. In the second stage, a deconvolution method is applied to an almost noise-free version of the blurry image. Compared with classical deconvolution methods, the proposed method appears to give a significant improvement in numerical experiments. The preliminary results of the new cascade approach are very encouraging as well.
Download

Paper Nr: 156
Title:

ADAPTIVE AND COOPERATIVE SEGMENTATION SYSTEM FOR MONO- AND MULTI-COMPONENT IMAGES

Authors:

Madjid Moghrani, Claude Cariou and Kacem Chehdi

Abstract: We present a cooperative and adaptive system for multi-component image segmentation, in which segmentation methods used are based upon the classification of pixels represented by statistical features chosen with respect to the nature of the regions to segment. One originality of this system is its adaptive characteristic: it allows taking into account the local context in the image to automatically adapt the segmentation process to the nature of specific regions which can be uniform or textured. The method used for the detection of the regions’ nature is based on a classification of pixels with respect to the uniformity index of Haralick. Then a cooperative approach is set up for the textured areas which can combine results incoming from different classification methods and choose the best result at the pixel level using an assessment index. In order to validate the system and show the relevance of the adaptive procedure used, experimental results are presented for the segmentation of synthetic and real multi-component CASI images.
Download

Paper Nr: 181
Title:

A THREE-LAYER SYSTEM FOR IMAGE RETRIEVAL

Authors:

Daidi Zhong and Defee Irek

Abstract: Visual patterns are composed of basic features forming well-defined structures and/or statistical distributions. Often, both are present simultaneously in visual images. This complicates the description and representation of visual patterns. In this paper we propose a hierarchical retrieval system, based on subimages and combinations of feature histograms, to efficiently combine structural and statistical information for retrieval tasks. We illustrate the results on a face database retrieval problem. It is shown that proper selection of subimages and feature vectors can significantly improve performance with minimal complexity.
Download

Area 3 - Multimedia Systems and Applications

Full Papers
Paper Nr: 20
Title:

SPONTANEOUS AND PERSONALIZED ADVERTISING THROUGH MPEG-7 MARKUP AND SEMANTIC REASONING - Exploring New Ways for Publicity and Marketing over Interactive Digital TV

Authors:

Martin Lopez-Nores, José J. Pazos-Arias, Jorge Garcia-Duque, Yolanda Blanco-Fernandez, Marta Rey-López and Esther Casquero-Villacorta

Abstract: Publicity is one of the sustaining pillars of the television industry. In an increasingly competitive market, the involved agents are striving to exploit all the possibilities to get revenues from advertising, but their techniques lack targeting and are usually at odds with the comfort of the TV viewers. In response to those problems, this paper introduces a new advertising model that aims at harnessing the interactive capabilities of the modern TV receivers (either domestic or mobile ones). The approach is based on automatically identifying products which are semantically related to the things on screen that catch the viewer’s attention, and then assembling interactive services that provide him/her with personalized commercial functionalities.
Download

Paper Nr: 121
Title:

KNOWLEDGE ENGINEERING FOR AFFECTIVE BI-MODAL HUMAN-COMPUTER INTERACTION

Authors:

Efthimios Alepis, Maria Virvou and Katerina Kabassi

Abstract: This paper presents knowledge engineering for a system that incorporates user stereotypes as well as a multi-criteria decision making theory for affective interaction. The system bases its inferences about students’ emotions on user input evidence from the keyboard and the microphone. Evidence from these two modes is combined by a user modelling component underlying the user interface. The user modelling component reasons about users’ actions and voice input and makes inferences in relation to their possible emotional states. The mechanism that integrates the inferences from the two modes is based on the results of two empirical studies that were conducted in the context of requirements analysis of the system. The evaluation of the developed system showed significant improvements in the recognition of the emotional states of users.
Download

Short Papers
Paper Nr: 43
Title:

4I (FOR EYE) MULTIMEDIA - Intelligent Semantically Enhanced and Context-aware Multimedia Browsing

Authors:

Oleksiy Khriyenko

Abstract: The next generation of integration systems will utilize different methods and techniques to achieve the vision of ubiquitous knowledge: Semantic Web and Web Services, Agent Technologies and Mobility. Unlimited interoperability and collaboration are important in almost all areas of people's lives. Development of a Global Understanding eNvironment (GUN) (Kaykova et al., 2005), which would support interoperation between all resources and exchange of shared information, is a very promising and challenging task. As usual, a graphical user interface is one of the important parts of the process. Following the new technological trends, it is time to start a stage of semantic-based, context-dependent multidimensional resource visualization and semantic-metadata-based browsing across resources. With the growing ubiquity of digital media content, whose management requires suitable annotation and systems able to use that annotation, the ability to combine continuous media data with its own multimedia-specific content description in one source brings the idea of a true multimedia semantic web one step closer. Thus, 4I (FOR EYE) technology (Khriyenko, 2007) is a suitable basis for developing intelligent, semantically enhanced and context-aware browsing across multimedia content.
Download

Paper Nr: 49
Title:

ENHANCING LSB STEGANOGRAPHY AGAINST STEGANALYSIS ATTACKS USING COMBINATIONAL LSBS

Authors:

Yahya Belghuzooz and Ali Al-Qayedi

Abstract: This paper describes an enhanced approach for hiding secret messages in the spatial domain of digital cover images such that the resulting stego-images are robust to steganalysis attacks. Firstly, different methods of hiding in the Least Significant Bits (LSBs) are comparatively discussed including the Sequential and the Random algorithms. Then our approach is illustrated which uses a combination of LSBs to store large amounts of secret information while maintaining robustness against detection by steganalysis attacks. The results achieved are commensurate to those obtained using widely available stego tools.
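For readers unfamiliar with the baseline that this abstract builds on, the following is a minimal sketch of plain sequential LSB hiding, one of the methods the paper compares. It is not the authors' combinational scheme, whose details the abstract does not give. The cover is modelled as a flat byte sequence of pixel values, and the function names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(cover, message):
    """Write the message bits, MSB first, into the LSBs of successive cover bytes."""
    bits = [(byte >> (7 - k)) & 1 for byte in message for k in range(8)]
    if len(bits) > len(cover):
        raise ValueError("message too long for this cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit   # clear the LSB, then set it to the bit
    return bytes(stego)


def extract_lsb(stego, n_bytes):
    """Read n_bytes back from the LSBs of the first 8 * n_bytes stego bytes."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for k in range(8):
            value = (value << 1) | (stego[8 * b + k] & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(64))            # stand-in for 64 grey-level pixel values
stego = embed_lsb(cover, b"Hi")
print(extract_lsb(stego, 2))        # b'Hi'
```

Each cover byte changes by at most one grey level, which is why such embedding is visually imperceptible; the fixed, sequential placement is what statistical steganalysis tends to exploit.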
Download

Paper Nr: 54
Title:

SPATIALIZED AUDIO CONFERENCES - IMS Integration and Traffic Modelling

Authors:

Christopher J. Reynolds, Martin J Reed and Peter J Hughes

Abstract: Existing monophonic multiparty VoIP conferencing applications are currently limited to supporting a single conversation floor, with limited numbers of simultaneous speakers. We discuss the additional requirements and benefits of delivering a spatially enhanced audio application via Head Related Transfer Function (HRTF) filtering, which may support many conversation floors. Several network delivery architectures are presented, including integration to the Next Generation Network (NGN) IP Multimedia Subsystem (IMS). The delivery architectures are compared using traffic models, and implications for the scope of such an application are discussed.
Download

Paper Nr: 61
Title:

USING IMAGE TO FOSTER BUSINESS TO CONSUMER ONLINE TRUST

Authors:

Khalid Al-Diri, Dave Hobbs and Rami Qahwaji

Abstract: Much of the latest research on business to consumer (B2C) e-commerce has focused on ways of building trust through cues that encourage consumers to purchase online, since online commerce lacks the face-to-face interpersonal exchanges that enhance trust behaviour in conventional commerce. To bridge this human interaction gap, an extensive laboratory-based experiment was conducted to assess the trust of consumers using four online vendors’ websites. This paper addresses the issues and findings of a study that uses Western and Saudi images as well as video clips to mimic customer support in order to increase behavioural purchasing trust in the online vendor. The findings clearly show that images have an important role to play in increasing the trust of online consumers, with Saudi images playing a pivotal role in increasing this kind of trust.
Download

Paper Nr: 69
Title:

CHANGE DETECTION AND BACKGROUND UPDATE THROUGH STATISTIC SEGMENTATION FOR TRAFFIC MONITORING

Authors:

Theodoros Alexandropoulos, Vassili Loumos and Eleftherios Kayafas

Abstract: Recent advances in computer imaging have led to the emergence of video-based surveillance as a monitoring solution in Intelligent Transportation Systems (ITS). The deployment of CCTV infrastructure in highway scenes facilitates the evaluation of traffic conditions. However, the majority of video-based ITS are restricted to manual assessment and lack the ability to support automatic event notification. This is because the effective operation of intelligent traffic management relies strongly on the performance of an image processing front end, which performs change detection and background update. Each of these tasks needs to cope with specific challenges. Change detection must effectively isolate content changes from noise-level fluctuations, while background update needs to adapt to time-varying lighting without incorporating stationary occlusions into the background. This paper presents the operating principle of a video-based ITS front end. A block-based statistic segmentation method for feature extraction in highway scenes is analyzed. The presented segmentation algorithm focuses on the estimation of the noise model. The extracted noise model is used in change detection to separate content changes from noise fluctuations. Additionally, a statistic background estimation method, which adapts to gradual illumination variations, is presented.
Download

Paper Nr: 78
Title:

IMPROVEMENT OF VOIP QUALITY BY PACKET DROPPING IN ADSL ROUTERS

Authors:

Qin Dai, Matthias Baumann and Ralf Lehnert

Abstract: Packet dropping is known as a simple mechanism to control TCP traffic. In this paper, TCP packet dropping is introduced in the egress router of an ADSL downlink. The aim is to improve the quality of VoIP connections that compete with TCP applications in the downlink direction. The ADSL downlink buffer is assumed to operate as a simple FCFS queue. Different simulations have been conducted that evaluate the mechanism in two scenarios. Firstly, the long-term impact of the mechanism on both VoIP and TCP applications is investigated. Secondly, with more realistic network settings, the effectiveness of the mechanism for a short real speech sample is evaluated. The speech’s PESQ estimate is used to assess the service quality. The results indicate that in both cases packet dropping can improve the VoIP quality. However, the required high dropping ratio can result in TCP traffic bursts and therefore unstable VoIP quality as well as bad TCP performance.
Download

Paper Nr: 86
Title:

TRAJECTORY OF SINGULAR ENERGIES FOR IMAGE REPLICA DETECTION

Authors:

Karol Wnukowicz, Wladyslaw Skarbek and Grzegorz Galinski

Abstract: An image replica detection system can be used by the owners of digital multimedia content to protect their rights against unauthorised use of their material. The paper presents a new approach for content-based image replica detection. A concept of singular energy trajectory is introduced and evaluated. It appears that this trajectory is invariant to many image operations. Moreover, the trajectories of original images and distorted copies are highly correlated. These properties make the proposed method a good tool for image replica detection.
Download

Paper Nr: 91
Title:

TOWARDS BUILDING FAIR AND ACCURATE EVALUATION ENVIRONMENTS

Authors:

Dumitru Dan Burdescu and Cristian Mihaescu

Abstract: Each e-Learning platform has implemented means of evaluating learner’s knowledge by a specific grading methodology. This paper proposes a methodology for obtaining knowledge about the testing environment. The obtained knowledge is further used in order to make the testing system more accurate and fair. Integration of knowledge management into an e-Learning system is accomplished through a dedicated software module that analyzes learner’s performed activities, creates a learner’s model and provides a set of recommendations for course managers and learners in order to achieve prior set goals.
Download

Paper Nr: 99
Title:

REVERSIBLE AND SEMI-BLIND RELATIONAL DATABASE WATERMARKING

Authors:

Gaurav Gupta and Josef Pieprzyk

Abstract: In 2002, Agrawal and Kiernan proposed a relational database watermarking scheme that modifies least significant bits (LSBs) of numerical attributes selected using a secret key. The scheme does not address query preservation (some queries give different results when executed on the original and watermarked relation). Additive and secondary watermarking attacks on the watermarked relation are also possible. Such attacks can render the original watermark undetectable. Hence, an attacker who embeds his watermark in a previously watermarked relation can claim ownership of that relation. However, if the scheme is reversible, then a previous watermark, if any, can be detected in the reversed relation. In this paper, we propose an enhanced reversible, semi-blind and query-preserving watermarking scheme. Using this scheme, the correct owner of a relation can be identified even if the relation has been watermarked by multiple parties. If required, the database can also be restored to its original state. This finds applications in high-precision settings such as military operations or scientific experiments.
Download

Paper Nr: 101
Title:

HIGH RATE DATA HIDING IN SPEECH SIGNAL

Authors:

Ehsan Jahangiri and Shahrokh Ghaemmaghami

Abstract: One of the main issues with data hiding algorithms is the capacity of data embedding. Most data hiding methods suffer from low capacity, which can make them inappropriate in certain hiding applications. This paper presents a high-capacity data hiding method that uses encryption and the multi-band speech synthesis paradigm. In this method, an encrypted covert message is embedded in the unvoiced bands of the speech signal, which leads to a high data hiding capacity of tens of kbps in a typical digital voice file transmission scheme. The proposed method offers a new standpoint in the design of data hiding systems with respect to three major, basically conflicting requirements in steganography, i.e. inaudibility, robustness, and data rate. The procedures to implement the method in both basic speech synthesis systems and in the standard mixed-excitation linear prediction (MELP) vocoder are also given in detail.
Download

Paper Nr: 106
Title:

IMPROVING VOD P2P DELIVERY EFFICIENCY OVER INTERNET USING IDLE PEERS

Authors:

Leandro Souza, Xiaoyuan Yang, Ana Ripoll and Fernando Cores

Abstract: This paper presents DynaPeer Chaining, a peer-to-peer Video-on-Demand (VoD) delivery policy designed to deal with the high bandwidth requirements of multimedia content and the additional constraints imposed by the Internet environment: higher delays and jitter, network congestion, non-symmetrical client bandwidth and inadequate support for multicast communications. We consider the scenario where multiple ADSL-based peers stream the same video to multiple receivers. We propose an adaptive scheme that takes advantage of idle peers to improve system efficiency, even when extreme conditions (low request rates or limited peer resources) are considered. We conducted a performance comparison study of our proposal with classic multicast (Patching) and other P2P delivery schemes, such as Pn2Pn and Chaining, improving their performance by 50% and 62%, respectively, even when taking into account Internet constraints.
Download

Paper Nr: 113
Title:

DESIGN AND IMPLEMENTATION OF MULTI-STANDARD AUDIO DECODER

Authors:

Kong Ji, Peilin Liu, Deng Ning, Fu Xuan, Zhang Guocheng, He Bin and Liu Qianru

Abstract: In this paper, a design and implementation for Multi-Standard Audio Decoder is presented. The architecture of the decoder is designed to support MPEG-2/MPEG-4 AAC LC Profile (ISO/IEC 13818-7 2006) (ISO/IEC 14496-3 2006), Dolby AC-3 (ATSC 1995), Ogg Vorbis (Xiph.org Foundation 2004), Windows Media Audio (WMA) (Microsoft 2006) and MPEG-1 Layer 3 (MP3) (ISO-IEC/JTC1 SC29 1991). Based on the analysis of algorithms of these multi-standards, software/hardware co-design method is used to implement the audio decoder in which a module called FILTERBANK is designed as a hardware engine. The FILTERBANK which can support IMDCT (Inverse Modified Discrete Cosine Transform) process of different standards is configured by CPU according to the decoded information. Compared with the solutions of DSP/RISC or ASIC multi-standard decoders, our Multi-Standard decoder has achieved a balance between software’s flexibility and hardware’s high efficiency. Also it meets the requirement of low cost, low power and high audio quality. The implementation results on FPGA are given and the performance of the decoder is evaluated.
Download

Paper Nr: 123
Title:

REALIZATION AND OPTIMIZATION OF H.264 DECODER FOR DUAL-CORE SOC

Authors:

Jia-ming Chen, Chiu-ling Chen, Jian-liang Luo, Po-wen Cheng, Chia-hao Yu, Shau-Yin Tseng and Wei-Kuan Shih

Abstract: This paper presents an H.264/AVC decoder realization on a dual-core SoC (System-on-Chip) platform based on well-designed macroblock-level software partitioning. Furthermore, the procedures executed on each core and the data movement between the two cores are optimized using software and hardware techniques. The evaluation results show that the implementation can decode video with D1 (720×480 pixels) resolution in real time, which provides valuable experience for similar designs.
Download

Paper Nr: 153
Title:

IMPROVEMENTS IN SPEAKER DIARIZATION SYSTEM

Authors:

Rong Fu and Ian Benest

Abstract: This paper describes an automatic speaker diarization system for natural, multi-speaker meeting conversations using one central microphone. It is based on the ICSI-SRI Fall 2004 diarization system (Wooters et al., 2004), but it has a number of significant modifications. The new system is robust to different acoustic environments - it requires neither pre-training models nor development sets to initialize the parameters. It determines the model complexity automatically. It adapts the segment model from a Universal Background Model (UBM), and uses the cross-likelihood ratio (CLR) instead of the Bayesian Information Criterion (BIC) for merging. Finally it uses an intra-cluster/inter-cluster ratio as the stopping criterion. Altogether this reduces the speaker diarization error rate from 25.36% to 21.37% compared to the baseline system (Wooters et al., 2004).
Download

Paper Nr: 155
Title:

BACKWARDS COMPATIBLE, MULTI-LEVEL REGIONS-OF-INTEREST (ROI) IMAGE ENCRYPTION ARCHITECTURE WITH BIOMETRIC AUTHENTICATION

Authors:

Alexander Wong and William Bishop

Abstract: Digital image archival and distribution systems are an indispensable part of the modern digital age. Organizations perceive a need for increased information security. However, conventional image encryption methods are not versatile enough to meet more advanced image security demands. We propose a universal multi-level ROI image encryption architecture that is based on biometric data. The proposed architecture ensures that different users can only view certain parts of an image based on their level of authority. Biometric authentication is used to ensure that only an authorized individual can view the encrypted image content. The architecture is designed such that it can be applied to any existing raster image format while maintaining full backwards compatibility so that images can be viewed using popular image viewers. Experimental results demonstrate the effectiveness of this architecture in providing conditional content access.
Download

Paper Nr: 157
Title:

SVG BASED SECURE UNIVERSAL MULTIMEDIA ACCESS

Authors:

Ahmed Reda Kaced and Jean-Claude Moissinac

Abstract: In this paper, we develop and implement our Secure Universal Multimedia Access system (SUMA) for SVG content in three subtasks. For content adaptation, we rely on XML/RDF, CC/PP and XSLT; for signing and authenticating SVG content, we use the Merkle hash tree technique; and for content delivery, we develop a mechanism for dynamic delivery of multimedia content over wired/wireless networks. We present a signature scheme and an access control system that can be used for controlling access to SVG documents. The first part of this paper briefly describes the access control model on which the system is based. The second part presents the design and implementation of the SUMA adaptation engine. SUMA aims to deliver end-to-end authenticity of original SVG content exchanged in a heterogeneous network while allowing content adaptation by intermediary proxies between the content transmitter and the final users. Adaptation and authentication management are done by the intermediary proxies, transparently to the connected hosts, which are fully abstracted from these processes.
Download
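
Editorial illustration (not taken from the paper): the abstract above names the Merkle hash tree technique for authenticating SVG content. The following is a minimal sketch of a generic Merkle root computation, assuming SHA-256, byte-string fragments, domain-separation prefixes, and duplication of the last node on odd levels. The function name merkle_root and the sample SVG fragments are hypothetical; the actual tree construction used by SUMA is not described in the abstract.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Compute a Merkle root over a list of byte strings.

    Assumptions of this sketch (not from the paper): leaves and inner
    nodes are hashed with different one-byte prefixes, and the last
    node of an odd-sized level is paired with itself.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [sha256(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pair the odd node with itself
        level = [
            sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Hypothetical SVG fragments standing in for parts of a document.
fragments = [
    b'<rect width="10" height="10"/>',
    b'<circle r="5"/>',
    b'<text>label</text>',
]
root = merkle_root(fragments)
print(root.hex())
```

Signing only the root lets a receiver detect a change to any fragment, since altering one leaf changes every hash on its path to the root.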

Paper Nr: 168
Title:

APPEARANCE-BASED HUMAN GALLERY CONSTRUCTION FROM VIDEO

Authors:

Kyongil Yoon, Yaser Yacoob, David Harwood and Larry Davis

Abstract: An approach for constructing a dynamic gallery of people observed in a video stream is described. We consider two scenarios that require determining the number and identity of participants: outdoor surveillance and meeting rooms. In these applications face identification is typically not feasible due to the low resolution across the face. The proposed approach automatically computes an appearance model based on the clothing of people and employs this model in constructing and matching the gallery of participants. The appearance model uses color/path-length profile and a robust distance measure based on Kernel Density Estimation (KDE) and Kullback-Leibler (KL) distance, to evaluate similarity between people and add models to the gallery. A one-to-one constraint is enforced to correctly match instances to models at each frame. In the meeting room scenario we exploit the fact that the relative locations of subjects are likely to remain unchanged for the whole sequence.
Download

Paper Nr: 170
Title:

IMPACTS OF LEVEL-2 CACHE ON PERFORMANCE OF MULTIMEDIA SYSTEMS AND APPLICATIONS

Authors:

Abu Asaduzzaman, Manira Rani and Darryl Koivisto

Abstract: Multimedia systems normally suffer while processing multimedia applications because of their limited resources. The demand for a tremendous amount of processing power raises serious challenges for multimedia systems and applications. Studies show that cache memory has a strong influence on the performance of multimedia systems and applications. In our previous work, we optimized level-1 cache parameters to enhance the performance of portable devices running an MPEG4 decoder. The focus of this paper is to evaluate the impact of level-2 cache on the performance of multimedia systems running MPEG4 and H.264/AVC encoders. We develop a VisualSim model and C++ code to run the simulation. We measure miss rates, CPU utilization, and power consumption while varying the level-2 cache size. Simulation results show that the performance of multimedia systems and applications can be enhanced by optimizing the level-2 cache.
Download

Paper Nr: 50
Title:

BLUEMUSIC: A MULTICHANNEL ARCHITECTURE FOR MUSIC DISTRIBUTION

Authors:

Marco Furini

Abstract: Despite the increasing number of e-music downloads and the widespread use of mobile devices, the mobile music market is slow to take off. Prices seem to be the main obstacle: compared to the wired environment, a song is two or three times more expensive. Furthermore, the mobile network data transfer rate makes download times very long. In this paper we propose Bluemusic, a multichannel architecture that couples the usage of the mobile phone network with the free-of-charge communication technologies provided in cellphones (e.g., Bluetooth or Wi-Fi) to distribute music in the mobile environment. To protect digital content, Bluemusic is provided with a security mechanism that prevents illegal content distribution. An evaluation of our approach shows that Bluemusic can help expand the mobile music market.
Download

Paper Nr: 60
Title:

A NEW HORIZON BECKONS FOR SAUDI ARABIA IN THE TECHNOLOGICAL AGE OF E-COMMERCE & ON-LINE SHOPPING

Authors:

Khalid Al-Diri, Dave Hobbs and Rami Qahwaji

Abstract: Electronic commerce is a worldwide phenomenon. Its diffusion has apparently taken different paths in different nations. This is partially because of significant differences in national infrastructure and in the political and socio-economic environments for e-commerce adoption. The growing use of the Internet in Saudi Arabia provides a developing prospect for e-shopping. Despite the high potential of online shopping in Saudi Arabia, there is still a lack of understanding concerning the subject matter and its potential impact on consumers. This paper is part of a larger study and aims to establish a preliminary assessment, evaluation and understanding of the characteristics of online shopping in Saudi Arabia. Based on a sample of 144 Internet users, it explores their information-seeking patterns as well as their motivations and concerns for online shopping. Consumers in Saudi Arabia still lack trust in vendors' websites when using the Internet as a shopping channel. They are mainly concerned about security and privacy when dealing with online vendors, and also about the Saudi Internet network and English as the dominant Internet language. The main motivators for Saudis to shop online were, in order, convenience, products or services not available offline, and price. We present and discuss our findings, and identify changes that will be required for broader acceptance and diffusion of online shopping in Saudi Arabia.
Download

Paper Nr: 100
Title:

A REAL TIME TRAFFIC ENGINEERING SCHEME FOR BROADBAND CONVERGENCE NETWORK (BCN)

Authors:

Hwa-Jong Kim, Myoung Soon Jeong and Jong-Won Kim

Abstract: Recently, the Broadband Convergence Network (BcN), a Korean version of the Next Generation Network (NGN), was introduced to guarantee pre-defined QoS for high-speed multimedia services. The BcN is considering charging users for premium services. For the BcN to be successfully diffused, however, a practical traffic engineering (TE) tool is required, because the BcN is composed of many kinds of subnetworks and real-time feedback (traffic control) would be mandatory for the premium service. In this paper, a new TE scheme for the BcN, Rule Based Capture (RBC)/User Satisfaction Parameter (USP), is proposed to resolve the latent problems of BcN TE. The USP is designed to be accepted by many BcN subnetworks as a common intermediate description of service quality instead of conventional QoS parameters. The RBC is introduced to address the real-time TE issue posed by the vast accumulation of traffic monitoring data. A pilot RBC/USP is implemented on the Linux platform and its performance is investigated. We found that the average traffic log size is reduced to 0.058% for FTP, and 2.39% for streaming service, by using the RBC/USP.
Download

Paper Nr: 102
Title:

A MULTIMEDIA DATABASE MANAGEMENT SYSTEM FOR MEDICAL DATA

Authors:

Liana Stanescu, Dumitru Burdescu, Marius Brezovan and Cosmin Stoica

Abstract: The paper presents a relational multimedia database management system (MMDBMS) for managing visual and alphanumerical information from the medical domain. The MMDBMS offers numerical and char data types for alphanumerical information, and an Image data type for storing visual information in an original manner. The Image data type stores the image in binary form, together with its type, its dimensions, and color and texture information that is extracted automatically. This information is used in the content-based visual query process. The color information is represented by a color histogram quantized to 166 colors in the HSV color space. The texture information is represented by a vector of 12 values resulting from a method that uses Gabor filters for texture detection. As an element of originality, this DBMS provides a visual interface for building content-based image queries using color and texture characteristics, and a modified Select command. The MMDBMS, implemented using Java technologies, is platform independent and can be easily used by medical personnel.
Download

Paper Nr: 107
Title:

ON ENCRYPTION AND AUTHENTICATION OF THE DC DCT COEFFICIENT

Authors:

Li Weng and Bart Preneel

Abstract: When encryption and authentication techniques are applied to image or video data, sometimes it is advantageous to limit the operation to the DC DCT coefficient of each 8 × 8 block in a picture. In this work, the performance of such an approach is evaluated. This problem is considered as an image quality problem, and the metric structural similarity is used to show that by authenticating the DC coefficient, about 60% of the information can be guaranteed; by encrypting the DC coefficient, about 80% of the information can be hindered.
Download
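
Editorial illustration (not taken from the paper): the abstract above concerns the DC DCT coefficient of each 8 × 8 block. The following is a minimal sketch, assuming the orthonormal 2D DCT-II, of why that coefficient carries the coarse content of a block: for an n × n block it equals n times the block mean, so for 8 × 8 blocks it is 8 times the average sample value. The function name dct2_coefficient and the sample block are hypothetical.

```python
import math


def dct2_coefficient(block, u, v):
    """Return coefficient (u, v) of the orthonormal 2D DCT-II of a square block."""
    n = len(block)
    cu = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    cv = math.sqrt(1.0 / n) if v == 0 else math.sqrt(2.0 / n)
    total = 0.0
    for x in range(n):
        for y in range(n):
            total += (
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
            )
    return cu * cv * total


# Hypothetical 8x8 block holding the sample values 0..63.
block = [[x * 8 + y for y in range(8)] for x in range(8)]
mean = sum(sum(row) for row in block) / 64.0
dc = dct2_coefficient(block, 0, 0)

# Under this normalization the DC term is 8 times the block mean.
print(dc, 8 * mean)
```

Because the DC term is a scaled block average, encrypting or authenticating it alone acts on a low-resolution version of the picture, which is the trade-off the abstract evaluates.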

Paper Nr: 119
Title:

VOICE USER INTERFACE USING VOICEXML - Environment, Architecture and Dialogs Initiative

Authors:

Alexandre Maciel and Edson Carvalho

Abstract: In this work we present a set of Internet applications with a voice user interface built using the VoiceXML language. The architecture, main platforms and dialog initiative modes were studied, and the applicability and limitations were determined.
Download

Paper Nr: 131
Title:

A MULTIMEDIA IMS ENABLED RESIDENTIAL SERVICE GATEWAY

Authors:

Vitor Pinto, Vitor Ribeiro, Iván Vidal, Jaime García, Francisco Valera and Arturo Azcorra

Abstract: Internet access has been, until now, the main driver for the generalization of broadband connections in the residential market. Simple IP-based services like email and web browsing were, for many years, the typical services provided to residential customers. Today the telecommunications market is changing and operators are looking for ways to provide value-added services through those same IP broadband connections. These will, on the one hand, increase their revenues and, on the other hand, provide the customer with a wider range of services that were inaccessible until now. Triple Play is already a reality, and the convergence between mobile and fixed networks is bringing to the home a new range of IP Multimedia Subsystem (IMS) based services, which used to be exclusive to the mobile world. However, to successfully deliver these new services, the interface between the residential and operator networks, usually called the residential gateway (RGW), must be meticulously defined and implemented. This paper focuses on emerging residential services and the implications that these impose on the RGW. The coexistence of IMS-based and non-IMS-based services is also addressed in this paper, with a special emphasis on RGW Quality of Service (QoS) issues.
Download

Paper Nr: 143
Title:

DOCXS - A Distributed Computing Environment for Multimedia Data Processing

Authors:

Tobias Lohe, Michael Fieseler, Steffen Wachenfeld and Xiaoyi Jiang

Abstract: This paper presents DocXS, a distributed computing environment for multimedia data processing, which was developed at the University of Münster, Germany. DocXS is platform independent due to its implementation in Java, is freely available for non-commercial research, and can be installed on standard office computers. The main advantage of DocXS is that it does not require its users to care about code distribution or parallelization. Algorithms can be programmed using an Eclipse-based user interface and the resulting Matlab and Java operators can be visually connected to graphs representing complex data processing workflows. Experiments with DocXS show that it scales very well with only a small overhead.
Download

Paper Nr: 161
Title:

FIDELITY AND ROBUSTNESS ANALYSIS OF IMAGE ADAPTIVE DWT-BASED WATERMARKING SCHEMES

Authors:

Franco Alberto Del Colle and Juan Carlos Gómez

Abstract: An Image Adaptive Watermarking method based on the Discrete Wavelet Transform is presented in this paper. The robustness and fidelity of the proposed method are evaluated and the method is compared to state-of-the-art watermarking techniques available in the literature. For the evaluation of watermark transparency, an image fidelity factor based on a perceptual distortion metric is introduced. This new metric allows a perceptually aware objective quantification of image fidelity.
Download
