SIGMAP 2021 Abstracts


Area 1 - Multimedia and Deep Learning

Full Papers
Paper Nr: 12
Title:

RoBINN: Robust Bird Species Identification using Neural Network

Authors:

Chirag Samal, Prince Yadav, Sakshi Singh, Satyanarayana Vollala and Amrita Mishra

Abstract: Recent developments in machine and deep learning have made it possible to expand the realms of traditional audio pattern recognition to real-time and practical applications. This work proposes a novel framework for robust bird species identification using a neural network (RoBINN) based on their unique vocal signatures. To make the network robust and efficient, data augmentation is performed to create synthetic training samples for bird species with fewer available recordings. Further, inherent properties of audio signals are suitably leveraged via effective speech recognition-based feature engineering techniques to develop an end-to-end convolutional neural network (CNN). Additionally, the proposed model architecture for the CNN framework employs residual learning and an attention mechanism to generate attention-aware features, which enhances the overall accuracy of birdcall identification. The proposed architecture employs an exhaustive dataset with 21375 recordings corresponding to 264 bird species. Experimental results validate the proposed bird species classification technique in terms of accuracy, F1-score, and binary cross-entropy loss.

Short Papers
Paper Nr: 6
Title:

Invasive Measurements Can Provide an Objective Ceiling for Non-invasive Machine Learning Predictions

Authors:

Christopher W. Bartlett, Jamie Bossenbroek, Yukie Ueyama, Patricia E. McCallinhart, Aaron J. Trask and William C. Ray

Abstract: Early stopping is an extremely common tool to minimize overfitting, which would otherwise be a cause of poor generalization of the model to novel data. However, early stopping is a heuristic that, while effective, primarily relies on ad hoc parameters and metrics. Optimizing when to stop remains a challenge. In this paper, we suggest that for some biomedical applications, a natural dichotomy of invasive/non-invasive measurements of a biological system can be exploited to provide objective advice on early stopping. We discuss the conditions where invasive measurements of a biological process should provide better predictions than non-invasive measurements, or at best offer parity. Hence, if data from an invasive measurement is available locally, or from the literature, that information can be leveraged to know with high certainty whether a model of non-invasive data is overfitted. We present paired invasive/non-invasive cardiac and coronary artery measurements from two mouse strains, one of which spontaneously develops type 2 diabetes, posed as a classification problem. Examination of the various stopping rules shows that generalization is reduced with more training epochs and commonly applied stopping rules give widely different generalization error estimates. The use of an empirically derived training ceiling is demonstrated to be helpful as added information to leverage early stopping in order to reduce overfitting.
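As a minimal sketch of how such a ceiling could act as a stopping rule: if an invasive-measurement model sets an upper bound on achievable performance, a non-invasive model whose training score exceeds that bound is presumably overfitting. The function name, the per-epoch score list, and the ceiling value below are hypothetical illustrations, not the authors' implementation.

```python
def first_epoch_above_ceiling(train_scores, ceiling):
    """Return the index of the first epoch whose training score exceeds
    the ceiling, or None if the ceiling is never exceeded.

    train_scores: per-epoch training scores of the non-invasive model
                  (hypothetical input, e.g. accuracy in [0, 1]).
    ceiling:      score reached by a model trained on invasive data
                  (assumed to be an upper bound for the non-invasive model).
    """
    for epoch, score in enumerate(train_scores):
        if score > ceiling:
            return epoch
    return None


# Training would be stopped at (or just before) the returned epoch.
stop_at = first_epoch_above_ceiling([0.60, 0.70, 0.80, 0.90], ceiling=0.75)
```

Under this reading, the ceiling complements the usual validation-based heuristics rather than replacing them.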

Paper Nr: 10
Title:

Novel Pre-processing Stage for Classification of CT Scan Covid-19 Images

Authors:

D. Vijayalakshmi, Malaya K. Nath and Madhusudhan Mishra

Abstract: An accurate evaluation of computed tomography (CT) chest images is crucial in the early-stage detection of Covid-19. The accuracy of a diagnosis is determined by the imaging modality used and the images’ consistency. This paper describes a gradient-based enhancement algorithm (GCE) for CT images that can increase the visibility of the infected region. A multi-scale dependent dark pass filter is used to increase contrast while preserving information and edge details of the infected area. Joint occurrence between the edge details and pixel intensities of the input image is calculated to construct a cumulative distribution function (CDF). To obtain the contrast-improved image, the CDF is mapped to the uniform distribution. The GCE approach is tested on the CT Covid database, and performance metrics like the contrast improvement index (CII), discrete entropy (DE), and Kullback-Leibler distance (KL-Distance) are used to evaluate the results. Compared to other techniques available in the literature, the GCE approach produces the highest CII and DE values and has more uniformity. To check the suitability of the enhancement algorithm in terms of pre-processing, a pre-trained AlexNet is employed for the classification of Covid-19 images. The findings show an improvement of 7% in classification accuracy after enhancing the Covid-19 images using the GCE technique.
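The final step, mapping a CDF to the uniform distribution, is the same operation used in plain histogram equalization. The sketch below shows only that generic step on a flat list of intensities; it builds the CDF from the intensity histogram alone, which is a simplification, since GCE builds it from the joint occurrence of edge details and intensities. The function name and parameters are hypothetical.

```python
def equalize(pixels, levels=256):
    """Plain histogram equalization of integer intensities in [0, levels).

    Simplifying assumption: the CDF comes from the intensity histogram only,
    not from the edge/intensity joint occurrence described in the abstract.
    """
    if not pixels:
        return []
    # Histogram of intensity values.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function, normalized to [0, 1].
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running / len(pixels))
    # Mapping each intensity through the CDF spreads values toward uniform.
    return [round(cdf[p] * (levels - 1)) for p in pixels]


enhanced = equalize([10, 10, 20, 30])
```

A narrow band of input intensities (10 to 30 here) is stretched across the full output range, which is the contrast gain the mapping provides.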

Paper Nr: 14
Title:

Sperm Tracking and Trajectory Analysis in Fluorescence Microscopy Image Sequences

Authors:

Lucía Arboleya, Leonardo L. Santos, Mariano Fernández, Lucía Rosa-Villagrán, Rossana Sapiro and Federico Lecumberry

Abstract: In this work, we analyze the performance of several tracking methods under low temporal sampling acquisition setups in fluorescence microscopy. Machine learning methods were applied to classify and analyze the extracted trajectories of sperm samples and their motion parameters. The results were compared with the most widely used sperm motility classification methods. Analyzed image sequences include real sequences acquired by confocal fluorescence microscopy and synthetic sequences generated by in-house software. The complete framework runs as a standalone application and can be used with minimal training by users with no programming skills.

Area 2 - Multimedia Indexing and Retrieval

Short Papers
Paper Nr: 3
Title:

High-speed Retrieval and Authenticity Judgment using Unclonable Printed Code

Authors:

Kazuaki Sugai, Kitahiro Kaneda and Keiichi Iwamura

Abstract: The distribution of counterfeit products such as food packages, branded product tags, and drug labels, which are easy to imitate, has become a serious economic and safety problem. To address this problem, we propose a method to judge counterfeit products from commonly used inkjet-printed codes. To judge authenticity, we use inkjet-printed matter, because it is difficult to duplicate. In this study, we propose a new authenticity judgment system that combines the locally likely arrangement hashing (LLAH) system, which performs high-speed image retrieval, and Accelerated-KAZE (A-KAZE), which matches the features of inkjet-printed matter with high accuracy to verify accuracy.

Area 3 - Social Multimedia

Full Papers
Paper Nr: 2
Title:

Three-year Trends in YouTube Video Content and Encoding

Authors:

Feng Li, Jae W. Chung and Mark Claypool

Abstract: Despite the dominance of YouTube streaming traffic, there have been few studies focusing on characterizing YouTube videos over time. Given the sheer volume of YouTube videos, we created a custom crawler which took snapshots of popular YouTube channels and ran the crawler daily for the past 3 years. This provides YouTube video trends from 2018–2020 for over 160k videos, considering media type, duration, bit rate, resolution, codec, encoding format, and popularity. Analysis of the data shows YouTube videos have increased frame rates, resolutions and durations over this time, with the biggest clips consuming over 200 Mb/s and being over 3 hours long, accompanied by corresponding changes in encoding rates and codecs. Our analysis and the resulting dataset we make public should be beneficial for traffic shaping or CDN deployment strategies.

Area 4 - Multimedia Signal Processing

Full Papers
Paper Nr: 9
Title:

Assessing the QoME of NMP via Audio Analysis Tools

Authors:

Konstantinos Tsioutas and George Xylomenos

Abstract: Analyzing the Quality of Musicians’ Experience (QoME) in Network Music Performance (NMP) typically involves having musicians perform NMP sessions and then assessing their experience via questionnaires. Such subjective studies produce results with wide variances, making the extraction of solid conclusions difficult. For this reason, we complemented a subjective study on the effects of delay in the QoME of NMP with an analysis of the audio captured during the study using automated tools. Specifically, we used signal processing techniques to analyze the captured audio, in order to detect tempo evolution during each performance and examine its correlation with delay. Our results indicate that musicians in real NMP settings are more tolerant of delay than previously thought, holding a steady tempo even with one-way delays of 40 ms.
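To make "tempo evolution" concrete, here is a minimal sketch that assumes beat onset times have already been extracted from the audio by some detector (that step is not shown and is not taken from the paper). It converts onsets to a local tempo per beat interval and fits a least-squares slope; a slope near zero indicates a steady tempo. Both function names are hypothetical.

```python
def local_tempo_bpm(onsets):
    """Convert beat onset times (seconds) into one tempo estimate (BPM)
    per consecutive pair of beats; non-increasing pairs are skipped."""
    return [60.0 / (b - a) for a, b in zip(onsets, onsets[1:]) if b > a]


def tempo_slope(tempos):
    """Least-squares slope of tempo against beat index (BPM per beat).
    Negative values mean the performance is slowing down."""
    n = len(tempos)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(tempos) / n
    num = sum((i - mean_x) * (t - mean_y) for i, t in enumerate(tempos))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


tempos = local_tempo_bpm([0.0, 0.5, 1.0, 1.5])  # evenly spaced beats
slope = tempo_slope(tempos)
```

Computing one slope per performance would give a value that can then be compared across the delay settings used in a study.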

Paper Nr: 13
Title:

Perceptual Active Equalization of Multi-frequency Noise

Authors:

Juan Estreder, Gema Piñero, Miguel Ferrer, Maria de Diego and Alberto Gonzalez

Abstract: In this paper we propose a novel multi-channel active noise equalizer (ANE) for use when music or speech signals are present inside the same room. Our perceptual ANE (PANE) can benefit from the masking effect of the emitted music by carrying out a perceptual equalization (PEQ) of the undesired ambient noise. Our PEQ strategy automatically adapts the spectral profile of the ambient noise recorded at the error microphones to the masking threshold of the audio signal recorded at that same point. We present a real-time experiment carried out in our laboratory that simulates the position of a driver in a car to test the PANE with different audio signals. The experimental results are compared with two alternative strategies: the full cancellation (FC) profile that corresponds to an active noise cancellation strategy, and the hearing threshold (HT) profile that corresponds to an ANE system whose gains mimic the human audibility threshold. Both FC and HT profiles are independent of the music presented in the room. Results show that the noise power measured at the microphones is higher for the PEQ profile, but always below the masking threshold of the music, becoming almost unnoticeable. However, the emitted anti-noise power in the case of PEQ is 15 dB lower compared to HT and FC profiles for frequencies above 300 Hz. This performance leads to a reduction of noise pollution in the room and a lower power consumption of the system loudspeakers. In addition, the PEQ profile provided by the novel PANE system is a versatile approach that can reduce the perceived noise as much as the user decides, even reaching the same performance as the HT or FC profiles if needed. Therefore, the PANE system is a versatile real-time alternative to the classic active noise cancellation systems for multi-frequency noise.
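The core idea of the PEQ profile can be illustrated with a small sketch: each noise frequency is attenuated only as far as the masking threshold of the audio at that frequency, rather than cancelled completely. The function below is a hypothetical illustration with made-up levels in dB; it ignores the adaptive filtering that a real ANE performs.

```python
def peq_attenuation_db(noise_db, mask_db):
    """Attenuation (dB) needed per frequency so that the residual noise
    sits at the masking threshold. Noise already below the threshold
    needs no attenuation.

    noise_db: measured noise level per frequency (hypothetical values).
    mask_db:  masking threshold of the audio at the same frequencies.
    """
    return [max(0.0, n - m) for n, m in zip(noise_db, mask_db)]


# Two noise tones: the first exceeds the mask by 10 dB, the second is masked.
required = peq_attenuation_db([60.0, 40.0], [50.0, 45.0])
```

Because the second tone is already masked, no anti-noise is spent on it, which is in line with the lower emitted anti-noise power reported for PEQ relative to full cancellation.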

Short Papers
Paper Nr: 4
Title:

Clustering-based Acceleration for High-dimensional Gaussian Filtering

Authors:

Sou Oishi and Norishige Fukushima

Abstract: Edge-preserving filtering is an essential tool for image processing applications and comes in various types. For real-time applications, acceleration of its speed is also essential. To accelerate various types of edge-preserving filtering, we represent them as high-dimensional Gaussian filtering. Then, we accelerate the high-dimensional Gaussian filtering with a clustering-based constant-time algorithm of order O(K), where K is the number of clusters. The clustering-based method was developed for color bilateral filtering; however, this paper uses it for high-dimensional bilateral filtering. Also, in combination with tiling, k-means++, and principal component analysis, we can further improve the filter’s performance. Experimental results show that our method can approximate various edge-preserving filters with clustering-based high-dimensional Gaussian filtering.
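The sketch below illustrates the general clustering idea on a 1D grayscale signal, under stated assumptions: for each of K cluster centers, the range weights are computed against that center, one ordinary spatial Gaussian filter is applied, and each sample then takes the result of its nearest center. The cluster centers are supplied by hand rather than by k-means++, the spatial filter is a naive loop rather than a constant-time one, and all names are hypothetical. It is not the authors' algorithm.

```python
import math


def gaussian_kernel(radius, sigma):
    """Unnormalized spatial Gaussian weights for offsets -radius..radius."""
    return [math.exp(-(d * d) / (2 * sigma * sigma))
            for d in range(-radius, radius + 1)]


def spatial_filter(signal, kernel):
    """Naive 1D convolution with clamped borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def clustered_bilateral_1d(signal, centers, sigma_s, sigma_r, radius):
    """Approximate bilateral filtering with K spatial filter passes.
    Range weights are measured from each cluster center instead of
    from each sample, so the cost grows with K, not with the range kernel."""
    kernel = gaussian_kernel(radius, sigma_s)
    per_cluster = []
    for mu in centers:
        weights = [math.exp(-((v - mu) ** 2) / (2 * sigma_r ** 2))
                   for v in signal]
        num = spatial_filter([w * v for w, v in zip(weights, signal)], kernel)
        den = spatial_filter(weights, kernel)
        per_cluster.append([a / b if b > 0 else mu for a, b in zip(num, den)])
    # Each sample reads the output computed for its nearest cluster center.
    out = []
    for i, v in enumerate(signal):
        k = min(range(len(centers)), key=lambda c: abs(v - centers[c]))
        out.append(per_cluster[k][i])
    return out


step = [0.0] * 5 + [100.0] * 5
smoothed = clustered_bilateral_1d(step, centers=[0.0, 100.0],
                                  sigma_s=2.0, sigma_r=10.0, radius=2)
```

On the step signal the edge between 0 and 100 is preserved, because samples on each side are filtered almost exclusively with neighbors close to their own cluster center.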

Paper Nr: 11
Title:

Fast and Efficient Union of Sparse Orthonormal Transform for Image Compression

Authors:

Gihwan Lee and Yoonsik Choe

Abstract: Sparse coding has been widely used in image processing. Overcomplete sparse coding can represent data with a small number of bases, but requires time-consuming optimization methods. Orthogonal sparse coding is relatively fast and, like analytic transforms, well suited to image compression, with better performance than the existing analytic transforms. Thus, there have been many attempts to design image transforms based on orthogonal sparse coding. In this paper, we introduce an extension of the sparse orthonormal transform (SOT) based on unions of orthonormal bases (UONB) for image compression. Different from UONB, we allocate image patches to one orthonormal dictionary according to their direction. To accelerate the method, we factorize our dictionaries into the discrete cosine transform (DCT) matrix and another orthonormal matrix. In addition, for more effective implementation, the calculation of direction is also conducted in the DCT domain. As expected, our framework fulfills the goal of improving the compression performance of SOT with a fast implementation. Through experiments, we verify that the proposed method produces similar performance to an overcomplete dictionary and outperforms SOT in compression at a faster speed. The proposed methods are two to four times faster than SOT and hundreds of times faster than UONB.

Paper Nr: 1
Title:

Automatic Acoustic Diagnosis of Heartbeats

Authors:

Simone Mastrangelo and Stavros Ntalampiras

Abstract: Automatic identification of heart irregularities based on the respective acoustic emissions is a relevant research field which has received ever-increasing attention in recent years. Devices such as digital stethoscopes and smartphones can record heartbeat sounds and are easily accessible, making this method more appealing. This paper presents different automatic procedures to classify heartbeat sounds coming from such devices into five different labels: normal, murmur, extra heart sound, extrasystole and artifact, so that even people without medical knowledge can detect heart irregularities. The data used in this paper come from two different datasets. The first dataset is collected through an iPhone application whereas the second one is collected from a digital stethoscope. To be able to classify heartbeat sounds, time and frequency domain features are extracted and modeled by different machine learning algorithms, i.e. k-NN, random forest, SVM and ANNs. We report the achieved performances and a thorough comparison.

Area 5 - Multimedia Systems and Applications

Short Papers
Paper Nr: 7
Title:

Towards Automatic Detection and Quantification of Mildew on Grape Leaf Disks

Authors:

Razib Iqbal, Kyle Sargent and Laszlo Kovacs

Abstract: Downy and powdery mildews are the most serious diseases of the grapevine. A sustainable way to control these pathogens is the breeding and deployment of resistant grape cultivars. For breeding efforts to be effective, accurate quantification of the resistance phenotype is essential. In this paper, we present a computer-based image recognition, processing, and analysis technique for enhancing the detection and quantification of Plasmopara viticola and Erysiphe necator, the causal agents of downy and powdery mildew, respectively. We propose a multi-step approach that utilizes background removal and Hue-Saturation-Value (HSV) masking as opposed to multi-faceted color channel breakdowns, photo texture evaluations, or classification-based algorithms for the detection of mildew. Our experimental results show that our method provides reliable results and fast performance.
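A minimal sketch of HSV masking for quantification, assuming background removal has already marked non-leaf pixels as None: each remaining pixel is converted to HSV and counted if it falls inside a target range. The function name and the default thresholds (low saturation, high value, i.e. whitish pixels) are illustrative assumptions, not values from the paper.

```python
import colorsys


def masked_fraction(pixels, h_range=(0.0, 1.0),
                    s_range=(0.0, 0.2), v_range=(0.8, 1.0)):
    """Fraction of foreground pixels whose HSV values fall inside the ranges.

    pixels: iterable of (r, g, b) tuples in 0..255, with None marking
            background pixels removed in an earlier step (assumed).
    Ranges are on the 0..1 scale used by colorsys; defaults are hypothetical.
    """
    foreground = [p for p in pixels if p is not None]
    if not foreground:
        return 0.0
    hits = 0
    for r, g, b in foreground:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if (h_range[0] <= h <= h_range[1]
                and s_range[0] <= s <= s_range[1]
                and v_range[0] <= v <= v_range[1]):
            hits += 1
    return hits / len(foreground)


# One background pixel, two whitish pixels, one green leaf pixel.
coverage = masked_fraction([None, (255, 255, 255), (0, 128, 0), (240, 240, 240)])
```

The returned fraction is one simple way to express how much of a leaf disk a mask covers; appropriate thresholds would have to be tuned on real images.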

Paper Nr: 15
Title:

Hyperspectral Methods in Microscopy Image Analysis: A Survey

Authors:

Shirin Nasr-Esfahani, Venkatesan Muthukumar, Emma E. Regentova, Kazem Taghva and Mohamed B. Trabia

Abstract: Hyperspectral imaging (HSI) has found applications in remote sensing, agriculture, medicine, and biology. HSI acquires a three-dimensional dataset called a hypercube, with two spatial dimensions and one spectral dimension. Hyperspectral microscope imaging (HMI) is an emerging imaging spectroscopy technology, which combines the advantages of HSI with microscopic imaging; HSI provides rapid, nondestructive, and chemical-free data analysis, whereas a microscope can be used to study the microstructure of a sample such as nanoparticles. Integration of HSI and microscopy results in nondestructive evaluation using both spatial and spectral information along with analysis at the molecular or cellular level. The aim of the survey is to give an overview of the recent applications of HMI in the fields of medicine and biology.