Multimodal language analysis helps us understand human communication by integrating information from multiple modalities. However, previous studies on identifying language impairment in speech data have simply concatenated features, failing to capture the complex connections between modalities.
Individuals with language disorders often rely on non-verbal communication techniques, especially gestures, as an additional communication tool due to difficulties in word retrieval and frequent language errors. The same word can therefore be interpreted differently depending on the accompanying gestures and the specific symptoms involved. Hence, utilizing both speech (i.e., linguistic and acoustic) and gesture (i.e., visual) information is crucial to understanding the characteristics of language disorders.
Research Goal
We aim to establish healthcare systems grounded in an understanding of the characteristics of language disorders by utilizing both speech (i.e., linguistic and acoustic) and gesture (i.e., visual) information.
Approach
Understanding Co-Speech Gestures for Aphasia Type Detection (Lee et al., 2023):
Recognizing the importance of analyzing co-speech gestures for distinguishing aphasia types, we propose a multimodal graph neural network for aphasia type detection using speech and corresponding gesture patterns. We show that gesture features outperform acoustic features, highlighting the significance of gesture expression in detecting aphasia types.
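The core idea of making textual representations sensitive to gesture information can be illustrated with a minimal cross-modal attention sketch in NumPy. This is a toy example, not our actual graph neural network: the dot-product attention, feature dimensions, and random inputs are all illustrative stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_feats, gesture_feats):
    """Each text token attends over gesture frames, yielding a
    gesture-aware representation per token (dot-product attention)."""
    scores = text_feats @ gesture_feats.T        # (n_tokens, n_frames)
    weights = softmax(scores, axis=-1)           # attention over gesture frames
    attended = weights @ gesture_feats           # gesture summary per token
    # Augment each token with its attended gesture context.
    return np.concatenate([text_feats, attended], axis=-1)

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 16))      # 5 word embeddings (illustrative)
gesture = rng.normal(size=(40, 16))  # 40 pose-based gesture frames (illustrative)
fused = cross_modal_attention(text, gesture)
print(fused.shape)  # (5, 32)
```

In contrast to plain concatenation of utterance-level features, this kind of token-to-frame interaction lets the model weigh which gesture frames matter for each word.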
Detecting Depression on Video Logs Using Audiovisual Features (Min et al., 2023):
We collected vlogs from YouTube and annotated them as depression or non-depression. Based on an analysis of the statistical differences between depression and non-depression vlogs, we built a depression detection model that learns both audio and visual features, achieving high accuracy.
Multilingual Mild Cognitive Impairment Detection with Multimodal Approach (Barrera-Altuna et al., 2024):
I am actively involved in this project on mild cognitive impairment (MCI) detection with domain experts, such as pathologists and healthcare practitioners, at the University of South Florida (USF). To understand the common characteristics of people with MCI who speak different languages, we propose a multilingual MCI detection model using multimodal approaches that analyze both acoustic and linguistic features. It outperforms existing machine learning models by identifying universal MCI indicators across languages.
References
2024
Multilingual Mild Cognitive Impairment Detection with Multimodal Approach
Benjamin Barrera-Altuna, Daeun Lee, Zaima Zarnaz, Jinyoung Han, and Seungbae Kim**
Mild cognitive impairment (MCI) and dementia significantly impact millions worldwide and rank as a major cause of mortality. Since traditional diagnostic methods are often costly and result in delayed diagnoses, many efforts have been made to propose automatic detection approaches. However, most methods focus on monolingual cases, limiting the scalability of their models to individuals speaking different languages. To understand the common characteristics of people with MCI speaking different languages, we propose a multilingual MCI detection model using multimodal approaches that analyze both acoustic and linguistic features. It outperforms existing machine learning models by identifying universal MCI indicators across languages. Particularly, we find that speech duration and pauses are crucial in detecting MCI in multilingual settings. Our findings can potentially facilitate early intervention in cognitive decline across diverse linguistic backgrounds.
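The finding that speech duration and pauses are informative can be illustrated with a crude energy-based pause extractor. This is an assumption-laden toy, not our actual feature pipeline: the frame sizes and relative silence threshold are arbitrary choices.

```python
import numpy as np

def pause_features(waveform, sr=16000, frame_ms=25, hop_ms=10, rel_thresh=0.05):
    """Toy pause statistics: fraction of frames whose RMS energy falls
    below a fraction of the utterance's peak RMS energy."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    rms = np.array([
        np.sqrt(np.mean(waveform[i:i + frame] ** 2))
        for i in range(0, len(waveform) - frame + 1, hop)
    ])
    silent = rms < rel_thresh * rms.max()  # frames treated as pauses
    return {
        "duration_s": len(waveform) / sr,
        "pause_ratio": float(silent.mean()),
    }

# Synthetic example: 1 s of tone followed by 1 s of silence.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
speech = np.sin(2 * np.pi * 220 * t)
feats = pause_features(np.concatenate([speech, np.zeros(sr)]), sr=sr)
print(feats)
```

Such low-level timing statistics are language-agnostic, which is one intuition for why pause-based indicators can transfer across multilingual settings.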
2023
Learning Co-Speech Gesture for Multimodal Aphasia Type Detection
Daeun Lee**, Sejung Son**, Hyolim Jeon, Seungbae Kim, and Jinyoung Han*
Aphasia, a language disorder resulting from brain damage, requires accurate identification of specific aphasia types, such as Broca’s and Wernicke’s aphasia, for effective treatment. However, little attention has been paid to developing methods to detect different types of aphasia. Recognizing the importance of analyzing co-speech gestures for distinguishing aphasia types, we propose a multimodal graph neural network for aphasia type detection using speech and corresponding gesture patterns. By learning the correlation between the speech and gesture modalities for each aphasia type, our model can generate textual representations sensitive to gesture information, leading to accurate aphasia type detection. Extensive experiments demonstrate the superiority of our approach over existing methods, achieving state-of-the-art results (F1 84.2%). We also show that gesture features outperform acoustic features, highlighting the significance of gesture expression in detecting aphasia types. We provide the codes for reproducibility purposes.
Detecting depression on video logs using audiovisual features
Kyungeun Min, Jeewoo Yoon, Migyeong Kang, Daeun Lee, Eunil Park, and 1 more author
Humanities and Social Sciences Communications, Nov 2023
Detecting depression on social media has received significant attention. Developing a depression detection model helps screen depressed individuals who may need proper treatment. While prior work mainly focused on developing depression detection models with social media posts, including text and images, little attention has been paid to how videos on social media can be used to detect depression. To this end, we propose a depression detection model that utilizes both audio and video features extracted from vlogs (video logs) on YouTube. We first collected vlogs from YouTube and annotated them as depression or non-depression. We then analyzed the statistical differences between depression and non-depression vlogs. Based on the lessons learned, we built a depression detection model that learns both audio and visual features, achieving high accuracy. We believe our model helps detect depressed individuals on social media at an early stage so that individuals who may need appropriate treatment can get help.
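Learning from both audio and visual features can be sketched as simple feature-level fusion followed by a linear classifier. This is a toy logistic-regression example on synthetic stand-in features, not the model from the paper; the feature dimensions, labels, and hyperparameters are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-vlog features (a real pipeline would extract these
# with audio/vision toolkits; here they are random stand-ins).
n = 200
audio = rng.normal(size=(n, 8))
visual = rng.normal(size=(n, 8))
# Synthetic labels correlated with one audio and one visual dimension.
labels = (audio[:, 0] + visual[:, 0] > 0).astype(float)

x = np.concatenate([audio, visual], axis=1)  # feature-level fusion
w = np.zeros(x.shape[1])
b = 0.0
for _ in range(500):  # plain logistic regression via gradient descent
    p = 1 / (1 + np.exp(-(x @ w + b)))
    grad = p - labels
    w -= 0.1 * x.T @ grad / n
    b -= 0.1 * grad.mean()

acc = ((p > 0.5) == labels).mean()
print(f"training accuracy: {acc:.2f}")
```

Because the synthetic decision rule is linear in one audio and one visual dimension, neither modality alone suffices, which mirrors the motivation for combining audio and visual cues.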