Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task

Matthew Theodore Roque, Carlos Rafael Catalan, Dan John Velasco, Manuel Antonio Rufino, Jan Christian Blaise Cruz

Published in Proceedings of the Ninth Conference on Machine Translation, 2024

This paper presents the methodology developed by the Samsung R&D Institute Philippines (SRPH) Language Intelligence Team (LIT) for the WMT 2024 Shared Task on Low-Resource Indic Language Translation. We trained standard sequence-to-sequence Transformer models from scratch for both English-to-Indic and Indic-to-English translation directions. We also explored backtranslation for data augmentation and noisy channel reranking to improve translation quality, and investigated a multilingual model trained across all language pairs. Our results demonstrate the effectiveness of the multilingual model, with significant performance improvements in most language pairs, highlighting the potential of shared language representations in low-resource translation scenarios.
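
The abstract does not spell out the reranking objective, but noisy channel reranking conventionally rescores an n-best list of candidate translations by combining the forward (direct) model score with a reverse "channel" model and a target-side language model (e.g., Yee et al., 2019). The sketch below is a minimal illustration under that assumption, not the paper's actual implementation; the `Candidate` container and the weights `lam_ch` and `lam_lm` are hypothetical, with the weights typically tuned on a development set.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One hypothesis from the n-best list, with precomputed log-probabilities."""
    text: str
    direct_logprob: float   # log p(y|x) from the forward translation model
    channel_logprob: float  # log p(x|y) from the reverse (channel) model
    lm_logprob: float       # log p(y) from a target-side language model


def noisy_channel_score(c: Candidate, lam_ch: float, lam_lm: float) -> float:
    """Linearly combine the three log-probabilities; weights are dev-set tuned."""
    return c.direct_logprob + lam_ch * c.channel_logprob + lam_lm * c.lm_logprob


def rerank(candidates: list[Candidate], lam_ch: float, lam_lm: float) -> Candidate:
    """Return the hypothesis with the highest combined score."""
    return max(candidates, key=lambda c: noisy_channel_score(c, lam_ch, lam_lm))


# Toy usage with made-up scores: the channel and LM terms can overturn
# the forward model's original ranking.
nbest = [
    Candidate("hypothesis A", direct_logprob=-4.2, channel_logprob=-5.1, lm_logprob=-9.8),
    Candidate("hypothesis B", direct_logprob=-4.5, channel_logprob=-3.9, lm_logprob=-8.7),
]
best = rerank(nbest, lam_ch=0.5, lam_lm=0.3)
print(best.text)  # "hypothesis B" wins despite a lower forward score
```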