Weerayut Buaphet

(+66) 80 6789 276 • weerayut.b_s20@vistec.ac.th • weerayutbu.github.io

Summary

I am a fifth-year Ph.D. student at the Natural Language Processing and Representation Learning (NRL) Lab at VISTEC, Thailand, under the supervision of Assoc. Prof. Dr. Sarana Nutanong and co-supervision of Assoc. Prof. Dr. Attapol Rutherford.

My research focuses on information extraction tasks, Named Entity Recognition (NER), and Representation Learning. My work aims to address the challenges in NER, including limited resources for Thai NER, open-class problems involving unseen and long-tail entities, and multilingual and domain-specific settings. My co-authors and I have previously developed a Thai Fine-grained Nested NER dataset to bridge the gap between low-resource and high-resource languages. Additionally, we have explored few-shot learning techniques, leveraging large language models to generate relevant examples and enhance the effectiveness of few-shot NER.

Currently, I am focusing on creating a bilingual finance-NER dataset in Thai and English to study knowledge transfer from high-resource to low-resource languages.

Education

Ph.D. in Information Science and Technology

Vidyasirimedhi Institute of Science and Technology (VISTEC), Aug 2020 - Present

GPA: 4.00/4.00

Relevant coursework: Natural Language Processing, Computational Machine Intelligence and Applications

B.Eng. in Computer Engineering

Rajamangala University of Technology Lanna (RMUTL-CM), Mar 2016 - Mar 2020

GPA: 3.62/4.00 (Top 1)

Relevant coursework: Data Structures and Algorithms, Operating Systems, Software Engineering

Technical Skills
Languages
Tools and Frameworks
Internship

VISTEC, Rayong, Thailand: Research Assistant

Nov 2019 – Aug 2020

  • Conducted a literature review and implemented a baseline Thai Nested Named Entity Recognition (N-NER) model using Python and Torch. Performed quality control and error analysis to ensure the accuracy and reliability of the Thai N-NER model.
Academic Projects

Thai Nested Named Entity Recognition Corpus.

Weerayut Buaphet, Can Udomcharoenchaikit, Peerat Limkonchotiwat, Attapol Rutherford, and Sarana Nutanong. 2022. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1473–1486, Dublin, Ireland. Association for Computational Linguistics.

Aug 2019 – May 2022

Mitigating Spurious Correlation in Natural Language Understanding with Counterfactual Inference.

Can Udomcharoenchaikit, Wuttikorn Ponwitayarat, Patomporn Payoungkhamdee, Kanruethai Masuk, Weerayut Buaphet, Ekapol Chuangsuwanich, and Sarana Nutanong. 2022. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11308–11321, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Dec 2022

Cross-Lingual Data Augmentation For Thai Question-Answering.

Parinthapat Pengpun, Can Udomcharoenchaikit, Weerayut Buaphet, and Peerat Limkonchotiwat. 2023. In Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP, pages 193–203, Singapore. Association for Computational Linguistics.

Dec 2023

Few-shot Named Entity Recognition

In this project, we focus on leveraging the capabilities of large language models to generate relevant examples, thereby enhancing the effectiveness of few-shot NER.

Under review

Bilingual Named Entity Recognition for Finance

Creating a finance-NER dataset composed of two languages, Thai and English. This project aims to study the knowledge transfer from high-resource to low-resource languages in the financial domain.

Ongoing project

Open Source Contributions and Activities
Reviewer

• 2024: ARR-EMNLP

Open Source NER Models

Part of the development team for Thai NER datasets and models, including a nested NER model for fine-grained classification and a bilingual (TH-EN) NER model for the financial domain.

Mentor, AI Builders (2022)

This program aims to develop knowledge in Data Science and Artificial Intelligence (AI) for middle and high school students interested in practical applications. My friend and I mentored five students working on various projects, such as question generation, fake news detection, and creating a dataset and system to support a plant tissue laboratory.

Internet of Things (11th 9 RMUT competition 2019) - RMUTSB

We won 2nd place in the RMUT group IoT competition in Thailand. We used an ESP20 to read sensor data and send it via MQTT to a Raspberry Pi server, which visualized the data on a web interface. I programmed the visualization and configured the ESP20, while my friend handled the hardware connections.

The Robotic Design Contest (RDC 2018)

This program selects national representatives for the International Design Contest RoBoCon (IDC RoBoCon). All teams are required to design a robot to solve a given problem. It promotes equality through mixed teams, equal resources, and collaboration, including training sessions at all levels. My friend and I achieved the following:

News