Haiyue Song
I am a technical researcher at the National Institute of Information and Communications Technology (NICT). I obtained my Ph.D. in Intelligence Science and Technology from Kyoto University.
My research focuses on machine translation and subword segmentation.
Email: haiyue.song at nict.go.jp
Education
Kyoto University
Supervised by Professor Sadao Kurohashi and Professor Chenhui Chu at Language Media Processing Lab
Ph.D. in Intelligence Science and Technology, October 2020–March 2024
Master in Intelligence Science and Technology, October 2018–September 2020
Shanghai Jiao Tong University
Supervised by Professor Li Jiang at Advanced Computer Architecture Lab
Bachelor of Computer Science and Technology, September 2014–July 2018
Minor in Japanese, School of Foreign Languages, February 2015–July 2018
Nagoya University
- Exchange student, October 2017–February 2018
Work Experience
National Institute of Information and Communications Technology
Supervised by Masao Utiyama, Hideki Tanaka and Raj Dabre at ASTREC
- Technical researcher, July 2023–present
- Research intern, October 2019–June 2023
JSPS Research Fellowship
- Research Fellowships for Young Scientists DC1, April 2021–June 2023
Kyoto University
- Research assistant at Kyoto University, November 2020–March 2021
LINE
- Internship with the machine learning team, February 2019–March 2019
- Summary Report
Publication
[Google Scholar], [DBLP], [Research Gate], [ACL Profile]
Journal
Haiyue Song, Raj Dabre, Chenhui Chu, Atsushi Fujita, and Sadao Kurohashi. Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts. Journal of Information Processing. 2024 Volume 32 Pages 628–640. (JIP) [paper]
Haiyue Song, Zhuoyuan Mao, Raj Dabre, Chenhui Chu, and Sadao Kurohashi. DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation. Journal of Natural Language Processing. 2024 Volume 31 Issue 1 Pages 155–188. (JNLP) [paper], [bib]
Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, and Eiichiro Sumita. SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (2023.7). (TALLIP) [paper], [bib]
Weiqi Gu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi. Spatial Hierarchical Attention Network Based Video-guided Machine Translation. Journal of Information Processing, Vol.31, (2023.5). (JIP) [paper], [bib]
Li Jiang, Zhuoran Song, Haiyue Song, Chengwen Xu, Qiang Xu, Naifeng Jing, Weifeng Zhang, and Xiaoyao Liang. Energy-Efficient and Quality-Assured Approximate Computing Framework Using a Co-Training Method. ACM Transactions on Design Automation of Electronic Systems (TODAES), pp.59:1-59:25, (2019.11). [paper], [bib]
arXiv
- David Romero … Haiyue Song … et al. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark. [paper], [web], [data]
International Conference
Aditya Joshi, Diptesh Kanojia, Heather Lent, Hour Kaing, and Haiyue Song. (Tutorial) Connecting Ideas in ‘Lower-Resource’ Scenarios: NLP for National Varieties, Creoles, and Other Low-resource Scenarios. Accepted to COLING 2025.
Raj Dabre, Haiyue Song (equal contribution), Miriam Exel, Bianka Buschbeck, Johannes Eschbach-Dymanus, Hideki Tanaka. How Effective is Synthetic Data and Instruction Fine-tuning for Translation with Markup using LLMs? Accepted to The Conference of the Association for Machine Translation in the Americas 2024 (AMTA2024).
Raj Dabre, Haiyue Song (equal contribution). NICT’s Cascaded and End-To-End Speech Translation Systems using Whisper and IndicTrans2 for the Indic Task. Accepted to The International Conference on Spoken Language Translation (IWSLT 2024). Ranked 1st out of 4 teams.
Haiyue Song, Francois Meyer, Raj Dabre, Hideki Tanaka, Chenhui Chu, and Sadao Kurohashi. SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation. Accepted to The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024).
Haiyue Song, Hour Kaing, and Raj Dabre. Linguistically Motivated Neural Machine Translation. (Tutorial) Accepted to The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024). [repo]
Abhisek Chakrabarty, Haiyue Song, Raj Dabre, Hideki Tanaka, Masao Utiyama. Incorporating Hypernym Features for Improving Low-resource Neural Machine Translation. Accepted to the First Workshop on Knowledge-Enhanced Machine Translation (KEMT 2024), co-located with EAMT 2024.
Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka. NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Also won the best paper award at AfricaNLP 2024.
Yahui Fu, Haiyue Song, Tianyu Zhao, Tatsuya Kawahara. Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks. The 14th International Workshop on Spoken Dialogue Systems Technology (IWSDS2024), Sapporo, Japan. [paper], [code]
Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi. GPT-RE: In-context Learning for Relation Extraction using Large Language Models. EMNLP2023. [paper], [bib]
Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi. Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1300–1316, Toronto, Canada. Association for Computational Linguistics. (ACL2023) [paper], [bib]
Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi. Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation. Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation) held in conjunction with EAMT2023. [paper], [bib]
Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi. Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision. In Findings of the Association for Computational Linguistics: EACL2023, pages 2580–2585, Dubrovnik, Croatia. Association for Computational Linguistics. [paper], [bib]
Haiyue Song, Raj Dabre, Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi. BERTSeg: BERT Based Unsupervised Subword Segmentation for Neural Machine Translation. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 85-94, Online. (AACL2022) [paper], [poster], [bib]
Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan, Sadao Kurohashi. When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation? In Findings of the Association for Computational Linguistics: NAACL2022, pages 1766–1775, Seattle, United States. Association for Computational Linguistics. (2022) [paper], [bib]
Weiqi Gu, Haiyue Song, Chenhui Chu and Sadao Kurohashi. Video-guided Machine Translation with Spatial Hierarchical Attention Network. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, Online (ACL2021 Student Workshop). [paper], [bib]
Akiko Aizawa, Frederic Bergeron, Junjie Chen, Fei Cheng, Katsuhiko Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa, Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi, Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka Tanaka, Yu Tanaka, Masashi Toyoda, Nobuhiro Ueda, Honai Ueoka, Masao Utiyama, Ying Zhong (in alphabetical order). A System for Worldwide COVID-19 Information Aggregation. NLP-COVID19@ACL2020 and NLP-COVID19 (part2)@EMNLP2020 [paper], [code], [dataset], [bib]
Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi and Eiichiro Sumita. Pre-training via Leveraging Assisting Languages for Neural Machine Translation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp.279-285, Seattle, Washington, United States, (2020.7). (ACL2020SRW) [paper], [arXiv version paper], [slides], [bib]
Haiyue Song, Raj Dabre, Atsushi Fujita and Sadao Kurohashi. Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation. Proceedings of the 12th International Conference on Language Resources and Evaluation, pp.3640‑3649, Marseille, France, (2020.5). (LREC2020) [code], [paper], [bib]
Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song and Sadao Kurohashi. JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation. Proceedings of the 12th International Conference on Language Resources and Evaluation, pp.3683‑3691, Marseille, France, (2020.5). (LREC2020) [paper], [bib]
Haiyue Song, Chengwen Xu, Qiang Xu, Zhuoran Song, Naifeng Jing, Xiaoyao Liang, Li Jiang. Invocation-driven neural approximate computing with a multiclass-classifier and multiple approximators. In Proceedings of the International Conference on Computer-Aided Design, pp.50, San Diego, CA, USA, (2018.11). (ICCAD2018) [paper], [slides], [bib]
Haiyue Song, Xiang Song, Tianjian Li, Hao Dong, Naifeng Jing, Xiaoyao Liang, Li Jiang. A FPGA Friendly Approximate Computing Framework with Hybrid Neural Networks (Abstract Only). In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp.286, Monterey, CA, USA, (2018.2). (FPGA2018) [abstract], [poster], [bib]
Domestic Conference (non peer-reviewed)
Hour Kaing, Chenchen Ding, Haiyue Song, Jiannan Mao, Hideki Tanaka, and Masao Utiyama. Robust Neural Machine Translation for Abugidas by Glyph Perturbation. The 30th Annual Meeting of the Association for Natural Language Processing, (2024.3).
Haiyue Song, Raj Dabre, Chenhui Chu and Sadao Kurohashi. Large Pre-trained Language Models with Multilingual Prompt for Japanese Natural Language Tasks. The 29th Annual Meeting of the Association for Natural Language Processing, Okinawa, (2023.3).
Haiyue Song, Raj Dabre, Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi. Representative Data Selection for Sequence-to-Sequence Pre-training. The 28th Annual Meeting of the Association for Natural Language Processing, pp.1-5, (2022.3).
Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Sadao Kurohashi. Improving Medical Relation Extraction with Distantly Supervised Pre-training. The 28th Annual Meeting of the Association for Natural Language Processing, Hamamatsu, (2022.3).
Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, and Eiichiro Sumita. Self-supervised Dynamic Programming Encoding for Neural Machine Translation. The 27th Annual Meeting of the Association for Natural Language Processing, Kitakyushu, (2021.3).
Weiqi Gu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi. Video-guided Machine Translation with Spatial Hierarchical Attention Network Encoder. The 27th Annual Meeting of the Association for Natural Language Processing, Kitakyushu, (2021.3).
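Haiyue Song, Raj Dabre, Atsushi Fujita, Sadao Kurohashi. Domain Adaptation of Neural Machine Translation through Multistage Fine-Tuning. The 26th Annual Meeting of the Association for Natural Language Processing, pp.461-464, Ibaraki, (2020.3).
Zhuoyuan Mao, Raj Dabre, Fabien Cromieres, Haiyue Song, Ryota Nakao, Sadao Kurohashi. Multi-task Pre-training Based on Linguistic Knowledge for Neural Machine Translation (in Japanese). The 26th Annual Meeting of the Association for Natural Language Processing, pp.1061-1064, Ibaraki, (2020.3).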
Other Presentations
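Presentation at the 14th Input Method Workshop (IM 2022) [link]
Presentation at the AY2022 Symposium of the Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
Lecture at the Seminar on Intelligence Science and Technology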
Other Activities
- Reviewer of TASLP2024, ARR2024, TALLIP2024, TALLIP2023, ARR2023, APSIPA ASC2023, EMNLP2023, ACL2023, EMNLP2022, EMNLP2021, EMNLP2020, IJCNLP2020, WAT2020.
- Co-organizer of WAT2024.
- Mentor of AACL2020-SRW.
- One patent application in progress.
Hobbies
Competitive Programming
- Silver medal in National Olympiad in Informatics (NOI) 2013, Chengdu, China.
- Bronze medal in Asia-Pacific Informatics Olympiad (APIO) 2013.
- AtCoder 2 Kyu level, [profile]
Japanese
- Passed Japanese Language Proficiency Test N1 (the highest level) in 2017.
Sports
- Marathon. Finisher of the full course of the Kyoto Marathon 2023 and the Biwako Marathon 2024.
- Mountain Climbing. Mount Kita, Mount Yake and some other famous mountains in Japan.
- Skiing. This winter (24–25) you can find me in Takasu Mountains. Last year: Shiga Kogen and Niseko.