Impact of Dimension Reduction Techniques on the Accuracy of Speech Emotion Recognition
Keywords:
SER, feature selection, feature reduction, LSTM, CNN, DA
Abstract
Dimensionality reduction techniques play an important role in the accuracy of speech emotion recognition (SER). This research focuses on the use of feature selection (FS) and feature reduction (FR) techniques for SER. The proposed approach introduces a ConvLSTM model that combines convolutional neural networks (CNN) and long short-term memory (LSTM) networks to improve the accuracy of SER. The study investigates the effect of four dimensionality reduction (DR) techniques chosen for this experiment. Four acoustic features are extracted: Mel-frequency cepstral coefficients (MFCC), Mel-spectrogram, chromagram, and root mean square (RMS) energy. Data augmentation (DA) techniques are applied to introduce variation into the training data and make the model more robust. The effectiveness of the FS and FR techniques is evaluated on three widely used audio datasets. The experimental results show that the proposed approach outperforms previous studies. The correlation-based feature selection (CFS) technique achieved the highest accuracy rates: 97.22% for RAVDESS, 96.65% for SAVEE, and 97.75% for EMO-DB. Similarly, in 4-fold cross-validation, CFS achieved accuracy rates ranging from 97.11% to 98.39%. Among the FR techniques, Principal Component Analysis (PCA) performed well. The results underscore the importance of choosing appropriate FS and FR techniques based on factors such as dataset type, size, and compatibility with subsequent models.
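To make the idea behind correlation-based feature selection concrete, the following is a minimal sketch and not the implementation used in this study. The abstract does not state which CFS formulation was applied; the sketch assumes the commonly cited merit heuristic, k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)), with absolute Pearson correlations and a greedy forward search. The function names (`pearson`, `cfs_merit`, `cfs_forward`), the `min_gain` tolerance, the binary numeric labels, and the toy data are all illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def cfs_merit(subset, r_cf, r_ff):
    """Merit of a feature subset: rewards correlation with the class,
    penalizes correlation among the selected features."""
    k = len(subset)
    mean_cf = sum(r_cf[i] for i in subset) / k
    if k == 1:
        return mean_cf
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs)
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

def cfs_forward(columns, labels, max_features=None, min_gain=1e-9):
    """Greedy forward search: add the feature that raises the merit most,
    and stop when no candidate improves it by more than min_gain."""
    n = len(columns)
    r_cf = [abs(pearson(c, labels)) for c in columns]
    r_ff = [[abs(pearson(columns[i], columns[j])) for j in range(n)]
            for i in range(n)]
    limit = n if max_features is None else min(max_features, n)
    selected, best = [], 0.0
    while len(selected) < limit:
        candidates = [i for i in range(n) if i not in selected]
        merit, idx = max((cfs_merit(selected + [i], r_cf, r_ff), i)
                         for i in candidates)
        if merit <= best + min_gain:
            break
        selected.append(idx)
        best = merit
    return selected, best

# Toy data (hypothetical): each inner list is one feature column over six clips.
features = [
    [1, 2, 3, 4, 5, 6],      # informative
    [2, 4, 6, 8, 10, 12],    # redundant: a scaled copy of the first column
    [5, 1, 4, 2, 6, 3],      # weakly related to the label
]
labels = [0, 0, 0, 1, 1, 1]  # assumed binary encoding of two emotion classes

selected, merit = cfs_forward(features, labels)
print(selected, round(merit, 3))
```

On this toy input the search keeps only one of the two redundant columns and rejects the weakly related one, which is the behavior the merit is designed to produce. For multi-class emotion labels, Pearson correlation against an integer class index is a crude proxy; an entropy-based measure such as symmetrical uncertainty would be a more typical choice.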