Fusing information from multiple sources for Multimodal Emotion Recognition

Published in Manuscript, 2023

This paper proposes a framework for multimodal emotion recognition and finds that Tensor Fusion is the most effective technique for fusing features extracted from audio (CLAP), video (VideoMAE), and text (MPNet).
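
For readers unfamiliar with tensor fusion, the sketch below illustrates the idea as it is commonly formulated (the Tensor Fusion Network of Zadeh et al., 2017): each modality vector gets a constant 1 appended, and the fused representation is the flattened three-way outer product. This is not the paper's implementation. The function name `tensor_fusion`, the toy vectors, and their dimensions are assumptions chosen for illustration, and real CLAP, VideoMAE, and MPNet embeddings would be far larger.

```python
from itertools import product


def tensor_fusion(audio, video, text):
    """Fuse three 1-D feature vectors via a flattened three-way outer product.

    A constant 1.0 is appended to each vector so the result keeps the
    unimodal and bimodal interaction terms alongside the trimodal ones.
    """
    a = list(audio) + [1.0]
    v = list(video) + [1.0]
    t = list(text) + [1.0]
    # One entry per (audio, video, text) index triple; the text index varies fastest.
    return [x * y * z for x, y, z in product(a, v, t)]


# Hypothetical toy embeddings standing in for the audio, video, and text encoders.
audio_feat = [0.5, -1.0]
video_feat = [2.0, 0.0, 1.0]
text_feat = [3.0]

fused = tensor_fusion(audio_feat, video_feat, text_feat)
print(len(fused))  # (2 + 1) * (3 + 1) * (1 + 1) = 24
```

The fused vector's length is the product of the padded input sizes, so it grows quickly with embedding dimension. In practice the encoder outputs are usually projected to a small size before fusion, and a classifier is trained on the fused vector.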