Speech Emotion Recognition Using Convolutional Neural Networks


  • Aishwarya V
  • Faseeha Fathima J
  • Jagadale Rutuja T
  • Jaganathan K
  • Sangeetha Priya R


Speech emotion, Energy, Pitch, Librosa, scikit-learn, SoundFile, CNN, Spectrogram, MFCC.


Speech is one of the most natural and convenient ways in which humans communicate, and understanding speech is among the most intricate processes the human brain performs. Speech Emotion Recognition (SER) aims to recognize human emotion from speech, building on the fact that the voice often reflects underlying emotion through tone and pitch. The libraries used are Librosa for analyzing audio and music, SoundFile for reading and writing sampled sound file formats, and scikit-learn for building the model. In the present study, the efficacy of a Convolutional Neural Network (CNN) in recognizing speech emotions is investigated. Spectrograms of the speech signals serve as the input features of the network, and Mel-Frequency Cepstral Coefficients (MFCCs) are used to extract features from the audio. Our own speech dataset is used to train and evaluate the models. Based on the evaluation, one of six emotions (happy, sad, angry, neutral, surprised, disgust) is detected in the speech.






How to Cite

Aishwarya V, Faseeha Fathima J, Jagadale Rutuja T, Jaganathan K, & Sangeetha Priya R. (2021). Speech Emotion Recognition Using Convolutional Neural Networks. International Journal of Progressive Research in Science and Engineering, 2(3), 25–29. Retrieved from https://journal.ijprse.com/index.php/ijprse/article/view/237