Weerayut Buaphet

Projects

• MultiLexNorm: Multilingual Lexical Normalization2: MultiLexNorm2 is the task of standardizing non-canonical text into its canonical form across 15 languages, with a focus on non-Indo-European languages.

• Bilingual Named Entity Recognition for the financial domain: (Ongoing project) Creating a financial NER dataset covering two languages, Thai and English. This project studies knowledge transfer from high-resource to low-resource languages in the financial domain.

• Few-shot Named Entity Recognition: (Under review) This project uses large language models to generate relevant examples that improve the effectiveness of few-shot NER.

Publications

• Thai Nested Named Entity Recognition Corpus: (Aug 2019 – May 2022)
Weerayut Buaphet, Can Udomcharoenchaikit, Peerat Limkonchotiwat, Attapol Rutherford, and Sarana Nutanong. 2022. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1473–1486, Dublin, Ireland. Association for Computational Linguistics. Paper, GitHub