Speech Emotion Recognition Using Convolutional Neural Networks
Keywords:
Speech emotion, Energy, Pitch, Librosa, Sklearn, SoundFile, CNN, Spectrogram, MFCC
Abstract
Speech is one of the most natural and convenient ways by which humans communicate, and understanding speech is among the most intricate processes the human brain performs. Speech Emotion Recognition (SER) aims to recognize human emotion from speech, based on the fact that the voice often reflects underlying emotions through tone and pitch. The libraries used are Librosa for analyzing audio and music, SoundFile for reading and writing sampled audio file formats, and sklearn for building the model. In the current study, the efficacy of Convolutional Neural Networks (CNNs) in recognizing speech emotions is investigated. Spectrograms of the speech signals are used as the input features of the networks, and Mel-Frequency Cepstral Coefficients (MFCCs) are used to extract features from the audio. Our own speech dataset is used to train and evaluate the models. Based on this evaluation, the emotion of a speech sample is classified as happy, sad, angry, neutral, surprised, or disgusted.
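To illustrate the kind of spectrogram input the abstract describes, the following is a minimal sketch of computing a magnitude spectrogram from a raw waveform using only NumPy (a framed, Hann-windowed FFT). The frame length, hop size, and the synthetic 440 Hz test tone are illustrative assumptions, not parameters from the paper; in practice a library such as Librosa would typically supply this step (e.g. its STFT and MFCC routines).

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=128):
    """Magnitude spectrogram: slice the signal into overlapping
    Hann-windowed frames and take the one-sided FFT of each."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

# Illustrative input: 1 s of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(sig)
print(spec.shape)  # (122, 257)
```

A CNN then treats this time-frequency matrix like an image, letting its convolutional filters pick up emotion-related patterns in pitch and energy over time.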
License
Copyright (c) 2021 Aishwarya V, Faseeha Fathima J, Jagadale Rutuja T, Jaganathan K, Sangeetha Priya R
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.