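Course Introduction
Case background:
Processing cross-modal information is one of the core problems for many intelligent systems today, including recommendation, advertising, and retrieval, and it is especially central to cross-modal question answering and dialogue. We will discuss how to apply deep learning models to develop new technical solutions for visual question answering (VQA) systems and visual dialogue systems.
Solution approach:
We will discuss how to design dialogue and question answering systems using multimodal information fusion and graph convolutional models, and consider how to make better use of knowledge graphs and prior relational information.
Results:
The research results have been published in top journals and conferences recognized by international peers. We look forward to seeing them deployed in real industrial applications.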
Jing Yu, Weifeng Zhang, Yuhang Lu, Zengchang Qin, Yue Hu, Jianlong Tan, Qi Wu (2020), Reasoning on the relation: enhancing visual representation for visual question answering and cross-modal retrieval, IEEE Transactions on Multimedia (IF=5.452).
Weifeng Zhang, Jing Yu, Hua Hu, Haiyang Hu, Zengchang Qin (2020), Multimodal feature fusion by relational reasoning and attention for visual question answering, Information Fusion (IF=10.716), Vol. 55, pp. 116-126.
Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang, Xingxing Zhang, Yue Hu, Qi Wu (2020), DualVD: An adaptive dual encoding model for deep visual understanding in visual dialogue, Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-2020).
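Learning objectives
1. Understand the development and current frontiers of visual question answering technology.
2. Learn about the latest research on visual dialogue.
3. Understand the core algorithms used to design intelligent dialogue and question answering systems.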
Target audience
Course content
Case area
Intelligent speech / NLP / recommendation / advertising systems in practice / computer vision