Fusing information from multiple sources for Multimodal Emotion Recognition
Published in Manuscript, 2023
This paper proposes a new framework for multimodal emotion recognition, finding that Tensor Fusion is the most effective technique for fusing features extracted from audio (CLAP), video (VideoMAE), and text (MPNet).
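
For readers unfamiliar with the technique, here is a minimal sketch of Tensor Fusion as it is commonly formulated: each modality vector is extended with a constant 1, and the fused representation is the three-way outer product of the extended vectors. The function name `tensor_fusion`, the toy vector sizes, and the values are assumptions made for illustration. The paper's own implementation and the actual CLAP, VideoMAE, and MPNet embedding sizes are not stated in this summary.

```python
from itertools import product


def tensor_fusion(audio, video, text):
    """Flattened three-way outer product of the modality vectors.

    Each vector is first extended with a constant 1, so the result holds
    unimodal, bimodal, and trimodal interaction terms together.
    """
    a = [1.0] + list(audio)
    v = [1.0] + list(video)
    t = [1.0] + list(text)
    # One entry per (audio, video, text) index triple, in row-major order.
    return [x * y * z for x, y, z in product(a, v, t)]


# Toy embeddings with hypothetical sizes; real encoder outputs are far larger.
audio_feat = [0.5, -1.0]
video_feat = [2.0]
text_feat = [0.1, 0.2, 0.3]

fused = tensor_fusion(audio_feat, video_feat, text_feat)
print(len(fused))  # (2 + 1) * (1 + 1) * (3 + 1) = 24
```

The fused size grows as the product of the per-modality sizes, so in practice the encoder outputs would typically be projected to a small dimension before this step.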
