“Did you laugh enough today?” – Deep Neural Networks for Mobile and Wearable Laughter Trackers
In this paper, we describe an app for mobile and wearable devices that recognises laughter from speech in real time. The laughter detection is based on a deep neural network architecture, which runs smoothly and robustly, even natively on a smartwatch. Further, this paper presents results demonstrating that our approach achieves state-of-the-art laughter detection performance on the SSPNet Vocalization Corpus (SVC) from the 2013 Interspeech Computational Paralinguistics Challenge Social Signals Sub-Challenge. As this technology is tailored for mobile and wearable devices, it enables and motivates many new use cases, for example, deployment in health care settings such as laughter tracking for psychological coaching, depression monitoring, and therapies.
This paper was accepted as a Show and Tell demonstration at the Interspeech 2017 conference.
Enhancing LSTM RNN-based Speech Overlap Detection by Artificially Mixed Data
This paper presents a new method for Long Short-Term Memory Recurrent Neural Network (LSTM) based speech overlap detection. To this end, speech overlap data is created artificially by mixing large amounts of speech utterances. Our elaborate training strategies and the presented network structures achieve performance surpassing the considered state-of-the-art overlap detectors. In doing so, we target the full ternary task of non-speech, speech, and overlap detection. Furthermore, speaker gender is recognised as well, making this the first successful combination of this kind within one model.
This paper was accepted at the AES Semantic Audio Conference 2017.
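The idea of creating overlap data by mixing utterances can be illustrated with a minimal sketch. It is not the pipeline from the paper: the function name `mix_utterances`, the per-sample voice-activity masks, and the sample-level labels are assumptions made for illustration only.

```python
import numpy as np

# Ternary label set named in the abstract.
NON_SPEECH, SPEECH, OVERLAP = 0, 1, 2


def mix_utterances(a, b, offset, active_a, active_b):
    """Mix utterance b into utterance a, starting `offset` samples in.

    active_a / active_b are boolean per-sample voice-activity masks
    (assumed to be available, e.g. from existing annotations).
    Returns the mixed signal and one ternary label per sample.
    """
    length = max(len(a), offset + len(b))
    mixed = np.zeros(length, dtype=np.float64)
    count = np.zeros(length, dtype=np.int64)  # active speakers per sample

    mixed[:len(a)] += a
    count[:len(a)] += active_a.astype(np.int64)
    mixed[offset:offset + len(b)] += b
    count[offset:offset + len(b)] += active_b.astype(np.int64)

    # 0 speakers -> non-speech, 1 -> speech, 2 -> overlap.
    labels = np.minimum(count, OVERLAP)
    return mixed, labels
```

Because the labels follow directly from the masks of the source utterances, large amounts of labelled overlap data can be generated by varying the utterance pairs and offsets.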
Combined Speech Activity and Speaker Overlap Detection with GPU Accelerated Long Short-Term Memory Recurrent Neural Networks
This thesis discusses the detection of overlap in speech, i.e. the presence of two speakers at the same time in a mono audio signal. The motivation is not only to improve speaker diarization, but also to address problems like emotion recognition and the analysis of conversational dynamics. To achieve that goal, the machine learning technique of Long Short-Term Memory Recurrent Neural Networks is utilized. The complete process is laid out in detail, from comprehensive data collection and generation to model training and optimization. Extensive experiments are carried out that give deep insight into the relations between neural network structures and their respective performance for the recognition of overlap, voice activity, and gender. A comparison with state-of-the-art research clearly underlines the success of this method, which, to the best of the author's knowledge, delivers the best models for overlap recognition within the body of research.
Natural Language Processing Methods for the Future of Law
After an introduction to basic concepts of the computer-aided processing of natural language (Natural Language Processing), two example applications in law are presented. One is research-oriented and automatically assigns statements in statutory texts to categories such as definitions or types of norms; the other is practice-oriented and automatically generates meaningful summaries and indexing of court decisions. Regular expressions play a notable role here, which is attributed to the high, seemingly language-independent regularity of legal language. The significance of this for the future of law is highlighted.
This work was conducted for the seminar “Text Mining and Artificial Intelligence for the Future of Jurisdiction” in 2015 at Technische Universität München and was graded 1.3 (very good in the German grading system).
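As a rough illustration of how regular expressions can assign statements from legal texts to categories, consider the following sketch. The patterns and category names are hypothetical and are not taken from the seminar thesis; a real rule set would be considerably larger and tuned to the legal language at hand.

```python
import re

# Hypothetical patterns, for illustration only; not taken from the thesis.
# Order matters: the first matching category wins.
PATTERNS = [
    ("definition", re.compile(r"\b(means|is defined as|within the meaning of)\b", re.IGNORECASE)),
    ("obligation", re.compile(r"\b(shall|must|is obliged to)\b", re.IGNORECASE)),
    ("permission", re.compile(r"\b(may|is entitled to)\b", re.IGNORECASE)),
]


def classify_statement(sentence):
    """Return the first category whose pattern matches, else 'other'."""
    for label, pattern in PATTERNS:
        if pattern.search(sentence):
            return label
    return "other"
```

For example, `classify_statement("The seller shall deliver the goods.")` falls into the hypothetical `obligation` category because of the keyword "shall".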
Augmenting Emotions from Speech with Generative Music:
Emotion Transformation & Prototype Evaluation
The present essay is an activity report about an interdisciplinary student project concerned with a software prototype that recognizes affect from human speech and transforms it into congruently perceived generative music. Besides motivating relevant use cases and research from domains like medical sciences and human-computer interaction, an emotion recognition and processing pipeline is proposed to map recognized emotions into the Circumplex Model as a pre-stage for music generation. For the latter, existing music algorithms and related research are outlined and an object-oriented implementation is demonstrated, which is capable of composing and playing music dynamically according to the given emotional information. The complete prototype, including emotion recognition from speech plus music generation and all steps in between, was evaluated in a user study. Its results are outlined at the end of this work. They strongly indicate that music created by the utilized approach is perceived as emotionally similar to affective speech. Finally, further work is discussed that is necessary to make the prototype ready for use in real-world scenarios, especially those related to research in medical and psychological domains as well as media industry solutions for supportive musical accompaniment of artistic spoken performances.
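The mapping of recognized emotions into the Circumplex Model (valence and arousal) as a pre-stage for music generation can be sketched as follows. The coordinates, the function names, and the tempo and mode rules are assumptions chosen for illustration; the essay's actual values and rules are not given here.

```python
# Hypothetical (valence, arousal) coordinates in [-1, 1];
# the numeric values are illustrative, not taken from the essay.
CIRCUMPLEX = {
    "happy": (0.8, 0.6),
    "angry": (-0.7, 0.8),
    "sad": (-0.7, -0.5),
    "calm": (0.6, -0.6),
}


def to_circumplex(probabilities):
    """Probability-weighted average of the emotion coordinates."""
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    valence = sum(p * CIRCUMPLEX[e][0] for e, p in probabilities.items()) / total
    arousal = sum(p * CIRCUMPLEX[e][1] for e, p in probabilities.items()) / total
    return valence, arousal


def music_parameters(valence, arousal):
    """Map a circumplex point to simple musical parameters.

    Assumed heuristics: higher arousal gives a faster tempo,
    non-negative valence gives a major mode.
    """
    tempo_bpm = 100 + 40 * arousal  # 60 to 140 BPM for arousal in [-1, 1]
    mode = "major" if valence >= 0 else "minor"
    return tempo_bpm, mode
```

Under these assumptions, a high-arousal, low-valence input such as `{"angry": 1.0}` yields a fast tempo in a minor mode, while a high-arousal, high-valence input yields a fast tempo in a major mode.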
Augmenting Affect from Speech with Generative Music
Abstract: In this work we propose a prototype to improve interpersonal communication of emotions. To this end, music is generated on the fly with the same affect as the speaker's voice. Emotions in speech are detected and translated into music according to music-psychological rules. Existing evaluated modules from affective generative music and speech emotion detection, use cases, emotional models, and planned evaluations are discussed.
The corresponding paper was accepted as a Work-in-Progress poster presentation at the ACM SIGCHI 2015 conference.
Evaluation of Prototype Version 0.1
Demo 1 (high arousal & high valence)
Demo 2 (high arousal & low valence)
FugueGenerator – Collaborative Melody Composition Based on a Generative Approach for Conveying Emotion in Music
Abstract: This paper exemplifies an approach for generative music software. To this end, new operational principles are used, i.e. drawing melody contours and changing their emotional expression by making use of valence and arousal. Known connections between music, emotions, and algorithms from the existing literature are used to deliver a software experience that augments the skills of individuals to make music according to the emotions they want. A user study later on shows the soundness of the implementation in this regard. A comprehensive analysis of in-game statistics makes it possible to measure the music produced by testers, so that connections between valence, arousal, melody properties and contours, and emotions can be depicted. In addition, temporal sequences of reaction patterns between music-making individuals during their creative interaction are made visible. A questionnaire filled out by the testers puts the results on a solid foundation and shows that users appreciate the incorporated methods both for applying emotions musically and for being creative in a free and joyful manner.
The corresponding paper was accepted for poster presentation at the ICMC-SMC 2014 conference.
TreeQuencer: Collaborative Rhythm Sequencing – A Comparative Study
Abstract: In this contribution we show three prototypical applications that allow users to collaboratively create rhythmic structures, with successively more degrees of freedom to generate rhythmic complexity. By means of a user study, we analyze the impact of this on the users’ satisfaction and compare it with data logged during the experiments, which allows us to measure the rhythmic complexity created.