
Fact Checking in Low-Resource Languages: A New Dataset and Transformer Model for the Burmese Language

  • University of Michigan, 110 Weiser Hall, 500 Church Street, Ann Arbor, MI 48109, United States

Organizer: Center for Southeast Asian Studies at the University of Michigan

Type/Location: Hybrid / Ann Arbor, MI

Description:

Misinformation on Burmese social media is a serious problem that has fueled hate speech and violence, especially during the 2017 Rohingya genocide. Platforms like Facebook have tried to restrain harmful content using Burmese-speaking moderators and some automatic tools, but the limited number of moderators working for these platforms is often overwhelmed by the amount of content to be fact-checked. The goal of this research is to use AI and machine learning to create automatic fact-checking tools that assist human moderators. The main challenge is the lack of training data and effective machine learning models for Burmese. We addressed this challenge by creating a large dataset and natural language processing (NLP) models for fact-checking in Burmese. We translated the Fake News Challenge (FNC-1) dataset, originally in English, into Burmese using machine translation. We then trained and evaluated three BERT-based classifiers on the machine-translated dataset. We also evaluated the three classifiers on a manually annotated Burmese dataset for comparison with the machine-translated data. The top-performing model achieves high predictive performance on both machine-translated and manually annotated data, with an accuracy comparable to that of human fact-checkers. Our results show that BERT-based models trained specifically for Burmese perform better than general multilingual models trained on multilingual data. This research presents a crucial first step toward creating datasets and tools for fact-checking in Burmese and other low-resource languages to combat misinformation online.

About the Speaker:

Lwin Moe is currently a Ph.D. candidate in Computer Science at York University’s Lassonde School of Engineering. As part of his Ph.D. dissertation, he studies fact-checking and misinformation detection using machine learning in general, and natural language processing (NLP) in particular.

Registration:

To attend the event in person, please register here.

To attend the event online, please register here.

 