Research
2025
- SEA-HELM: Southeast Asian Holistic Evaluation of Language Models. Yosephine Susanto, Adithya Venkatadri Hulagadri, Jann Railey Montalan, and 7 more authors. In Findings of the Association for Computational Linguistics: ACL 2025, Jul 2025.
With the rapid emergence of novel capabilities in Large Language Models (LLMs), the need for rigorous multilingual and multicultural benchmarks that are integrated has become more pronounced. Though existing LLM benchmarks are capable of evaluating specific capabilities of LLMs in English as well as in various mid- to low-resource languages, including those in the Southeast Asian (SEA) region, a comprehensive and culturally representative evaluation suite for the SEA languages has not been developed thus far. Here, we present SEA-HELM, a holistic linguistic and cultural LLM evaluation suite that emphasises SEA languages, comprising five core pillars: (1) NLP CLASSICS, (2) LLM-SPECIFICS, (3) SEA LINGUISTICS, (4) SEA CULTURE, (5) SAFETY. SEA-HELM currently supports Filipino, Indonesian, Tamil, Thai, and Vietnamese. We also introduce the SEA-HELM leaderboard, which allows users to understand models’ multilingual and multicultural performance in a systematic and user-friendly manner. We make the SEA-HELM evaluation code publicly available.
@inproceedings{susanto-etal-2025-sea,
  title = {{SEA}-{HELM}: {S}outheast {A}sian Holistic Evaluation of Language Models},
  author = {Susanto, Yosephine and Hulagadri, Adithya Venkatadri and Montalan, Jann Railey and Ngui, Jian Gang and Yong, Xianbin and Leong, Wei Qi and Rengarajan, Hamsawardhini and Limkonchotiwat, Peerat and Mai, Yifan and Tjhi, William Chandra},
  editor = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.findings-acl.636.pdf},
  doi = {10.18653/v1/2025.findings-acl.636},
  pages = {12308--12336},
  isbn = {979-8-89176-256-5},
}
- Batayan: A Filipino NLP benchmark for evaluating Large Language Models. Jann Railey Montalan, Jimson Paulo Layacan, David Demitri Africa, and 5 more authors. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025.
Recent advances in large language models (LLMs) have demonstrated remarkable capabilities on widely benchmarked high-resource languages. However, linguistic nuances of under-resourced languages remain unexplored. We introduce Batayan, a holistic Filipino benchmark that systematically evaluates LLMs across three key natural language processing (NLP) competencies: understanding, reasoning, and generation. Batayan consolidates eight tasks, three of which did not previously exist for Filipino corpora, covering both Tagalog and code-switched Taglish utterances. Our rigorous, native-speaker-driven adaptation and validation processes ensure fluency and authenticity to the complex morphological and syntactic structures of Filipino, alleviating the pervasive translationese bias in existing Filipino corpora. We report empirical results on a variety of open-source and commercial LLMs, highlighting significant performance gaps that signal the under-representation of Filipino in pre-training corpora, the unique hurdles in modeling Filipino’s rich morphology and construction, and the importance of explicit Filipino language support. Moreover, we discuss the practical challenges encountered in dataset construction and propose principled solutions for building culturally and linguistically-faithful resources in under-represented languages. We also provide a public evaluation suite as a clear foundation for iterative, community-driven progress in Filipino NLP.
@inproceedings{montalan-etal-2025-batayan,
  title = {{B}atayan: {A} {F}ilipino {NLP} benchmark for evaluating {L}arge {L}anguage {M}odels},
  author = {Montalan, Jann Railey and Layacan, Jimson Paulo and Africa, David Demitri and Flores, Richell Isaiah S. and Ii, Michael T. Lopez and Magsajo, Theresa Denise and Cayabyab, Anjanette and Tjhi, William Chandra},
  editor = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.acl-long.1509.pdf},
  doi = {10.18653/v1/2025.acl-long.1509},
  pages = {31239--31273},
  isbn = {979-8-89176-251-0},
}
2024
- Zero-shot Cross-lingual POS Tagging for Filipino. Jimson Paulo Layacan, Isaiah Edri W. Flores, Katrina Bernice M. Tan, and 3 more authors. In Proceedings of the Third Workshop on NLP Applications to Field Linguistics (Field Matters), Aug 2024.
Supervised learning approaches in NLP, exemplified by POS tagging, rely heavily on the presence of large amounts of annotated data. However, acquiring such data often requires a significant amount of resources and incurs high costs. In this work, we explore zero-shot cross-lingual transfer learning to address data scarcity issues in Filipino POS tagging, focusing in particular on optimizing source language selection. Our zero-shot approach demonstrates superior performance compared to previous studies, with top-performing fine-tuned PLMs achieving F1 scores as high as 79.10%. The analysis reveals moderate correlations between cross-lingual transfer performance and specific linguistic distances (featural, inventory, and syntactic), suggesting that source languages closer to Filipino along these dimensions provide better results. We identify tokenizer optimization as a key challenge, as PLM tokenization sometimes fails to align with meaningful representations, thus hindering POS tagging performance.
@inproceedings{layacan-etal-2024-zero,
  title = {Zero-shot Cross-lingual {POS} Tagging for {F}ilipino},
  author = {Layacan, Jimson Paulo and Flores, Isaiah Edri W. and Tan, Katrina Bernice M. and Estuar, Ma. Regina E. and Montalan, Jann Railey E. and De Leon, Marlene M.},
  editor = {Serikov, Oleg and Voloshina, Ekaterina and Postnikova, Anna and Muradoglu, Saliha and Le Ferrand, Eric and Klyachko, Elena and Vylomova, Ekaterina and Shavrina, Tatiana and Tyers, Francis},
  booktitle = {Proceedings of the Third Workshop on NLP Applications to Field Linguistics},
  month = aug,
  year = {2024},
  address = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.fieldmatters-1.9/},
  doi = {10.18653/v1/2024.fieldmatters-1.9},
  pages = {69--77},
}
- SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages. Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, and 58 more authors. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024.
Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, through a collaborative movement, we introduce SEACrowd, a comprehensive resource center that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in Southeast Asia.
@inproceedings{lovenia-etal-2024-seacrowd,
  title = {{SEAC}rowd: A Multilingual Multimodal Data Hub and Benchmark Suite for {S}outheast {A}sian Languages},
  author = {Lovenia, Holy and Mahendra, Rahmad and Akbar, Salsabil Maulana and Miranda, Lester James V. and Santoso, Jennifer and Aco, Elyanah and Fadhilah, Akhdan and Mansurov, Jonibek and Imperial, Joseph Marvin and Kampman, Onno P. and Moniz, Joel Ruben Antony and Habibi, Muhammad Ravi Shulthan and Hudi, Frederikus and Montalan, Railey and Ignatius, Ryan and Lopo, Joanito Agili and Nixon, William and Karlsson, B{\"o}rje F. and Jaya, James and Diandaru, Ryandito and Gao, Yuze and Amadeus, Patrick and Wang, Bin and Cruz, Jan Christian Blaise and Whitehouse, Chenxi and Parmonangan, Ivan Halim and Khelli, Maria and Zhang, Wenyu and Susanto, Lucky and Ryanda, Reynard Adha and Hermawan, Sonny Lazuardi and Velasco, Dan John and Kautsar, Muhammad Dehan Al and Hendria, Willy Fitra and Moslem, Yasmin and Flynn, Noah and Adilazuarda, Muhammad Farid and Li, Haochen and Lee, Johanes and Damanhuri, R. and Sun, Shuo and Qorib, Muhammad Reza and Djanibekov, Amirbek and Leong, Wei Qi and Do, Quyet V. and Muennighoff, Niklas and Pansuwan, Tanrada and Putra, Ilham Firdausi and Xu, Yan and Chia, Tai Ngee and Purwarianti, Ayu and Ruder, Sebastian and Tjhi, William and Limkonchotiwat, Peerat and Aji, Alham Fikri and Keh, Sedrick and Winata, Genta Indra and Zhang, Ruochen and Koto, Fajri and Yong, Zheng-Xin and Cahyawijaya, Samuel},
  editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  month = nov,
  year = {2024},
  address = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.emnlp-main.296/},
  doi = {10.18653/v1/2024.emnlp-main.296},
  pages = {5155--5203},
}
- KALAHI: A handcrafted, grassroots cultural LLM evaluation suite for Filipino. Jann Railey Montalan, Jian Gang Ngui, Wei Qi Leong, and 4 more authors. In Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation (PACLIC), Dec 2024.
Multilingual large language models (LLMs) today may not necessarily provide culturally appropriate and relevant responses to their Filipino users. We introduce KALAHI, a cultural LLM evaluation suite that is part of SEA-HELM. It was collaboratively created by native Filipino speakers, and is composed of 150 high-quality, handcrafted and nuanced prompts that test LLMs for generations that are relevant to shared Filipino cultural knowledge and values. Strong LLM performance in KALAHI indicates a model’s ability to generate responses similar to what an average Filipino would say or do in a given situation. We conducted experiments on LLMs with multilingual and Filipino language support. Results show that KALAHI, while trivial for Filipinos, is challenging for LLMs, with the best model answering only 46.0% of the questions correctly compared to native Filipino performance of 89.10%. Thus, KALAHI can be used to accurately and reliably evaluate Filipino cultural representation in LLMs.
@inproceedings{montalan-etal-2024-kalahi,
  title = {{KALAHI}: {A} handcrafted, grassroots cultural {LLM} evaluation suite for {F}ilipino},
  author = {Montalan, Jann Railey and Ngui, Jian Gang and Leong, Wei Qi and Susanto, Yosephine and Rengarajan, Hamsawardhini and Aji, Alham Fikri and Tjhi, William Chandra},
  editor = {Oco, Nathaniel and Dita, Shirley N. and Borlongan, Ariane Macalinga and Kim, Jong-Bok},
  booktitle = {Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation},
  month = dec,
  year = {2024},
  address = {Tokyo, Japan},
  publisher = {Tokyo University of Foreign Studies},
  url = {https://aclanthology.org/2024.paclic-1.49.pdf},
  pages = {497--523},
}
2019
- Measles Metapopulation Modeling using Ideal Flow of Transportation Networks. Jann Railey Montalan, Maria Regina Justina Estuar, Kardi Teknomo, and 1 more author. In Proceedings of the 2nd International Conference on Software Engineering and Information Management (ICSIM '19), Bali, Indonesia, Dec 2019.
In developing countries with limited access to medical resources, infectious diseases like measles can spread rapidly within and between communities. Data combined from various sources that report historical disease incidence and transportation infrastructure are valuable sources of knowledge that can assist in public health policies and initiatives surrounding disease surveillance. This study integrates population, disease incidence, and transportation network data into measles modeling. Results show that a hybrid metapopulation modeling approach using ideal flow distribution over mobility networks can yield more accurate models of measles progression. This demonstrates the feasibility of using big data in the monitoring of measles propagation.
@inproceedings{10.1145/3305160.3305210,
  author = {Montalan, Jann Railey and Estuar, Maria Regina Justina and Teknomo, Kardi and Gardon, Roselle Wednesday},
  title = {Measles Metapopulation Modeling using Ideal Flow of Transportation Networks},
  year = {2019},
  isbn = {9781450366427},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3305160.3305210},
  doi = {10.1145/3305160.3305210},
  booktitle = {Proceedings of the 2nd International Conference on Software Engineering and Information Management},
  pages = {147--151},
  numpages = {5},
  keywords = {mobility and smart computing, big data applications in epidemiology, computational modeling and data integration},
  location = {Bali, Indonesia},
  series = {ICSIM '19},
}