Fusing information from multiple sources for Multimodal Emotion Recognition

Published in Manuscript, 2023

This paper proposes a new framework for multimodal emotion recognition and finds that Tensor Fusion is the most effective technique for fusing features extracted from audio (CLAP), video (VideoMAE), and text (MPNet).
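
To illustrate what fusing three feature vectors by Tensor Fusion can look like, here is a minimal sketch. It assumes the common formulation in which each modality vector is extended with a constant 1 and the three vectors are combined by an outer product; the summary above does not say whether the paper uses exactly this variant. The function name `tensor_fusion` and the toy vectors are hypothetical stand-ins, not the paper's code or real CLAP, VideoMAE, or MPNet embeddings.

```python
from itertools import product


def tensor_fusion(audio, video, text):
    """Fuse three feature vectors with a flattened three-way outer product.

    A constant 1.0 is appended to each vector first, so the result holds
    unimodal, bimodal, and trimodal interaction terms side by side.
    """
    a = list(audio) + [1.0]
    v = list(video) + [1.0]
    t = list(text) + [1.0]
    # One entry per (i, j, k) index triple, with the text index varying fastest.
    return [x * y * z for x, y, z in product(a, v, t)]


# Toy stand-ins for the audio, video, and text embeddings (assumed values;
# real embeddings have hundreds of dimensions).
audio_feat = [0.5, -1.0]
video_feat = [2.0]
text_feat = [1.0, 0.0, 3.0]

fused = tensor_fusion(audio_feat, video_feat, text_feat)
print(len(fused))  # (2 + 1) * (1 + 1) * (3 + 1) = 24
```

The fused size is the product of the extended dimensions, so with real embeddings each modality is usually projected to a smaller dimension before fusion to keep the result tractable.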

