A Multimodal AI-Based Interview Assessment System Using Facial Emotion Recognition and Speech Confidence Analysis
Shatayu Someshwarrao Balapure
Interview performance depends not only on domain knowledge but also on non-verbal communication, emotional control, and vocal confidence, factors often overlooked by conventional mock interview platforms that primarily assess textual correctness. This paper presents an intelligent AI-powered mock interview system designed for holistic evaluation of both technical knowledge and soft skills. The proposed system adopts a multimodal framework that integrates Natural Language Processing (NLP) for semantic answer evaluation, facial emotion recognition for analyzing non-verbal cues, and speech analysis for assessing vocal confidence. Facial emotions are predicted using a Convolutional Neural Network (CNN) trained on the FER-2013 dataset, while speech confidence is evaluated through acoustic feature extraction and classification. A hybrid approach combining semantic similarity and keyword matching is used to assess answer relevance. By jointly analyzing these modalities, the system simulates a realistic interview environment and generates a comprehensive performance report. Experimental evaluation shows that the emotion recognition model achieved an accuracy of 53.98%. The proposed approach enables automated, objective, and holistic interview feedback, making it a practical tool for candidate preparation and behavioral improvement. This work contributes toward the development of intelligent interview training systems that bridge the gap between technical evaluation and soft-skill assessment.
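The hybrid answer-relevance scoring mentioned above can be sketched as a weighted combination of a semantic similarity term and a keyword coverage term. The following is a minimal illustrative sketch, not the paper's implementation: the paper presumably uses learned sentence embeddings for the semantic component, whereas here a dependency-free bag-of-words cosine similarity stands in, and the function names and the 0.6/0.4 weights are assumptions made for illustration.

```python
# Hypothetical sketch of a hybrid relevance score:
#   score = w_sem * semantic_similarity + w_kw * keyword_coverage
# All names and weights here are illustrative, not taken from the paper.
from collections import Counter
import math
import re

def tokenize(text):
    # Lowercase word tokens; a stand-in for real NLP preprocessing.
    return re.findall(r"[a-z]+", text.lower())

def cosine_similarity(a, b):
    # Bag-of-words cosine similarity; a placeholder for embedding similarity.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_coverage(answer, keywords):
    # Fraction of expected keywords present in the candidate's answer.
    tokens = set(tokenize(answer))
    return sum(k.lower() in tokens for k in keywords) / len(keywords)

def hybrid_relevance(answer, reference, keywords, w_sem=0.6, w_kw=0.4):
    return (w_sem * cosine_similarity(answer, reference)
            + w_kw * keyword_coverage(answer, keywords))

reference = "A deadlock occurs when processes wait on each other's resources"
answer = "Deadlock happens when two processes each wait for a resource the other holds"
score = hybrid_relevance(answer, reference, ["deadlock", "resource", "wait"])
print(round(score, 3))
```

The weighting lets the system reward answers that are topically close to the reference even when exact keywords are paraphrased, while still crediting explicit mention of expected terms.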

