Haiyue Song

I am currently a technical researcher at the National Institute of Information and Communications Technology (NICT).

I obtained my Ph.D. in Intelligence Science and Technology from Kyoto University.

My research interests include machine translation (MT) and large language models (LLMs). Recently, I have focused on low-resource MT, subword segmentation, decoding algorithms, and LLMs for MT.

Email: haiyue.song at nict.go.jp

[Publication]



Education

Kyoto University

Supervised by Professor Sadao Kurohashi and Professor Chenhui Chu at the Language Media Processing Lab

  • Ph.D. in Intelligence Science and Technology, October 2020–March 2024

  • Master of Intelligence Science and Technology, October 2018–September 2020

Shanghai Jiao Tong University

Supervised by Professor Li Jiang at the Advanced Computer Architecture Lab

  • Bachelor of Computer Science and Technology, September 2014–July 2018

  • Minor in Japanese, School of Foreign Languages, February 2015–July 2018

Nagoya University

  • Exchange student, October 2017–February 2018



Work Experience

National Institute of Information and Communications Technology

Supervised by Masao Utiyama, Hideki Tanaka and Raj Dabre at ASTREC

  • Technical researcher, July 2023–present
  • Research internship, October 2019–June 2023

JSPS Research Fellowship

SenseTime Japan

  • Internship on the autonomous driving team, August 2022–September 2022

Kyoto University

  • Research assistant at Kyoto University, November 2020–March 2021

LINE

  • Internship on the machine learning team, February 2019–March 2019
  • Summary Report



Publication

[Google Scholar], [DBLP], [Research Gate], [ACL Profile]

Journal

  • Haiyue Song, Raj Dabre, Chenhui Chu, Atsushi Fujita, and Sadao Kurohashi. Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts. Accepted to Journal of Information Processing (JIP).

  • Haiyue Song, Zhuoyuan Mao, Raj Dabre, Chenhui Chu, and Sadao Kurohashi. DiverSeg: Leveraging Diverse Segmentations with Cross-granularity Alignment for Neural Machine Translation. Journal of Natural Language Processing, Volume 31, Issue 1, pp.155–188, (2024). (JNLP) [paper], [bib]

  • Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, and Eiichiro Sumita. SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (2023.7). (TALLIP) [paper], [bib]

  • Weiqi Gu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi. Spatial Hierarchical Attention Network Based Video-guided Machine Translation. Journal of Information Processing, Vol.31, (2023.5). (JIP) [paper], [bib]

  • Li Jiang, Zhuoran Song, Haiyue Song, Chengwen Xu, Qiang Xu, Naifeng Jing, Weifeng Zhang, and Xiaoyao Liang. Energy-Efficient and Quality-Assured Approximate Computing Framework Using a Co-Training Method. ACM Transactions on Design Automation of Electronic Systems (TODAES), pp.59:1-59:25, (2019.11). [paper], [bib]

International Conference

  • Haiyue Song, Francois Meyer, Raj Dabre, Hideki Tanaka, Chenhui Chu, and Sadao Kurohashi. SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation. Accepted to The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024).

  • Haiyue Song, Hour Kaing, and Raj Dabre. Linguistically Motivated Neural Machine Translation. (Tutorial) Accepted to The 25th Annual Conference of the European Association for Machine Translation (EAMT 2024).

  • Francois Meyer, Haiyue Song, Abhisek Chakrabarty, Jan Buys, Raj Dabre and Hideki Tanaka. NGLUEni: Benchmarking and Adapting Pretrained Language Models for Nguni Languages. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Also won the best paper award at AfricaNLP 2024.

  • Yahui Fu, Haiyue Song, Tianyu Zhao, Tatsuya Kawahara. Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks. The 14th International Workshop on Spoken Dialogue Systems Technology (IWSDS2024), Sapporo, Japan. [paper], [code]

  • Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Jiwei Li, Sadao Kurohashi. GPT-RE: In-context Learning for Relation Extraction using Large Language Models. EMNLP2023. [paper], [bib]

  • Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi. Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 1300–1316, Toronto, Canada. Association for Computational Linguistics. (ACL2023) [paper], [bib]

  • Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi. Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation. Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation (Multi3Generation) held in conjunction with EAMT2023. [paper], [bib]

  • Zhen Wan, Fei Cheng, Qianying Liu, Zhuoyuan Mao, Haiyue Song and Sadao Kurohashi. Relation Extraction with Weighted Contrastive Pre-training on Distant Supervision. In Findings of the Association for Computational Linguistics: EACL2023, pages 2580–2585, Dubrovnik, Croatia. Association for Computational Linguistics. [paper], [bib]

  • Haiyue Song, Raj Dabre, Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi. BERTSeg: BERT Based Unsupervised Subword Segmentation for Neural Machine Translation. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 85-94, Online. (AACL2022) [paper], [poster], [bib]

  • Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan, Sadao Kurohashi. When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation? In Findings of the Association for Computational Linguistics: NAACL2022, pages 1766–1775, Seattle, United States. Association for Computational Linguistics. (2022) [paper], [bib]

  • Weiqi Gu, Haiyue Song, Chenhui Chu and Sadao Kurohashi. Video-guided Machine Translation with Spatial Hierarchical Attention Network. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, Online (ACL2021 Student Workshop). [paper], [bib]

  • Akiko Aizawa, Frederic Bergeron, Junjie Chen, Fei Cheng, Katsuhiko Hayashi, Kentaro Inui, Hiroyoshi Ito, Daisuke Kawahara, Masaru Kitsuregawa, Hirokazu Kiyomaru, Masaki Kobayashi, Takashi Kodama, Sadao Kurohashi, Qianying Liu, Masaki Matsubara, Yusuke Miyao, Atsuyuki Morishima, Yugo Murawaki, Kazumasa Omura, Haiyue Song, Eiichiro Sumita, Shinji Suzuki, Ribeka Tanaka, Yu Tanaka, Masashi Toyoda, Nobuhiro Ueda, Honai Ueoka, Masao Utiyama, Ying Zhong (in alphabetical order).
    A System for Worldwide COVID-19 Information Aggregation. NLP-COVID19@ACL2020 and NLP-COVID19 (part2)@EMNLP2020 [paper], [code], [dataset], [bib]

  • Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi and Eiichiro Sumita. Pre-training via Leveraging Assisting Languages for Neural Machine Translation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp.279-285, Seattle, Washington, United States, (2020.7). (ACL2020SRW) [paper], [arXiv version paper], [slides], [bib]

  • Haiyue Song, Raj Dabre, Atsushi Fujita and Sadao Kurohashi. Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation. Proceedings of the 12th International Conference on Language Resources and Evaluation, pp.3640‑3649, Marseille, France, (2020.5). (LREC2020) [code], [paper], [bib]

  • Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song and Sadao Kurohashi. JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation. Proceedings of the 12th International Conference on Language Resources and Evaluation, pp.3683‑3691, Marseille, France, (2020.5). (LREC2020) [paper], [bib]

  • Haiyue Song, Chengwen Xu, Qiang Xu, Zhuoran Song, Naifeng Jing, Xiaoyao Liang, Li Jiang. Invocation-driven neural approximate computing with a multiclass-classifier and multiple approximators. In Proceedings of the International Conference on Computer-Aided Design, pp.50, San Diego, CA, USA, (2018.11). (ICCAD2018) [paper], [slides], [bib]

  • Haiyue Song, Xiang Song, Tianjian Li, Hao Dong, Naifeng Jing, Xiaoyao Liang, Li Jiang. A FPGA Friendly Approximate Computing Framework with Hybrid Neural Networks (Abstract Only). In Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp.286, Monterey, CA, USA, (2018.2). (FPGA2018) [abstract], [poster], [bib]

Domestic Conference (non peer-reviewed)

  • Hour Kaing, Chenchen Ding, Haiyue Song, Jiannan Mao, Hideki Tanaka, and Masao Utiyama. Robust Neural Machine Translation for Abugidas by Glyph Perturbation. The 30th Annual Meeting of the Association for Natural Language Processing, (2024.3).

  • Haiyue Song, Raj Dabre, Chenhui Chu and Sadao Kurohashi. Large Pre-trained Language Models with Multilingual Prompt for Japanese Natural Language Tasks. The 29th Annual Meeting of the Association for Natural Language Processing, Okinawa, (2023.3).

  • Haiyue Song, Raj Dabre, Zhuoyuan Mao, Chenhui Chu and Sadao Kurohashi. Representative Data Selection for Sequence-to-Sequence Pre-training. The 28th Annual Meeting of the Association for Natural Language Processing, pp.1-5, (2022.3).

  • Zhen Wan, Fei Cheng, Zhuoyuan Mao, Qianying Liu, Haiyue Song, Sadao Kurohashi. Improving Medical Relation Extraction with Distantly Supervised Pre-training. The 28th Annual Meeting of the Association for Natural Language Processing, Hamamatsu, (2022.3).

  • Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, and Eiichiro Sumita. Self-supervised Dynamic Programming Encoding for Neural Machine Translation. The 27th Annual Meeting of the Association for Natural Language Processing, Kitakyushu, (2021.3).

  • Weiqi Gu, Haiyue Song, Chenhui Chu, and Sadao Kurohashi. Video-guided Machine Translation with Spatial Hierarchical Attention Network Encoder. The 27th Annual Meeting of the Association for Natural Language Processing, Kitakyushu, (2021.3).

  • Haiyue Song, Raj Dabre, Atsushi Fujita, Sadao Kurohashi.
    Domain Adaptation of Neural Machine Translation through Multistage Fine-Tuning
    The 26th Annual Meeting of the Association for Natural Language Processing, pp.461-464, Ibaraki, (2020.3).

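  • Zhuoyuan Mao, Raj Dabre, Fabien Cromieres, Haiyue Song, Ryota Nakao, Sadao Kurohashi.
    Linguistic Knowledge-based Multi-task Pre-training for Neural Machine Translation (ニューラル機械翻訳のための言語知識に基づくマルチタスク事前学習)
    The 26th Annual Meeting of the Association for Natural Language Processing, pp.1061-1064, Ibaraki, (2020.3).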

Other Presentations

  • Presentation at the 14th Input Method Workshop (IM 2022) [link]

  • Presentation at the FY2022 Symposium of the Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University

  • Lecture in the course 知能情報学演習 (Exercises in Intelligence Science and Technology)

Miscellaneous

  • Reviewer for TASLP2024, ARR2024, TALLIP2024, TALLIP2023, ARR2023, APSIPA ASC2023, EMNLP2023, ACL2023, EMNLP2022, EMNLP2021, EMNLP2020, IJCNLP2020, WAT2020.
  • Mentor for AACL2020-SRW.
  • One patent application in progress.

Hobbies

Competitive Programming

  • Silver medal at the National Olympiad in Informatics (NOI) 2013, Chengdu, China.
  • Bronze medal at the Asia-Pacific Informatics Olympiad (APIO) 2013.
  • AtCoder 2 Kyu level, [profile]

Japanese

  • Passed Japanese Language Proficiency Test N1 (the highest level) in 2017.

Jogging/Hiking/Skiing