马杰 (Associate Professor)

Doctoral Supervisor, Master's Supervisor

Affiliation: School of Cyberspace Security

Education: Doctoral graduate

Gender: Male

Degree: Ph.D.

Discipline: Cyberspace Security

   

My paper on robust audio-visual question answering has been accepted to NeurIPS 2024

Published: 2024-09-26

Content:

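  • About the conference: NeurIPS, the Conference on Neural Information Processing Systems, is an international conference on machine learning and computational neuroscience. It is held every December and is organized by the Neural Information Processing Systems Foundation (formerly known as the NIPS Foundation). NeurIPS is a top-tier conference in machine learning. In the international conference rankings of the China Computer Federation (CCF) and the Chinese Association for Artificial Intelligence (CAAI), NeurIPS is a Class A conference in artificial intelligence.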
  • Paper title: Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
  • Abstract: Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackle these challenges, firstly, we propose a novel dataset, MUSIC-AVQA-R, crafted in two steps: rephrasing questions within the test split of a public dataset (MUSIC-AVQA) and subsequently introducing distribution shifts to split questions. The former leads to a large, diverse test space, while the latter results in a comprehensive robustness evaluation on rare, frequent, and overall questions. Secondly, we propose a robust architecture that utilizes a multifaceted cycle collaborative debiasing strategy to overcome bias learning. Experimental results show that this architecture achieves state-of-the-art performance on both datasets, especially obtaining a significant improvement of 9.32% on the proposed dataset. Extensive ablation experiments are conducted on these two datasets to validate the effectiveness of the debiasing strategy. Additionally, we highlight the limited robustness of existing multi-modal QA methods through the evaluation on our dataset.
  • Code: https://github.com/reml-group/MUSIC-AVQA-R