Breaking Language Barriers: How Samsung's Galaxy AI Mastered Arabic Dialects
Samsung's Galaxy AI is now equipped to support 16 languages, making real-time and on-device translation accessible to more people. The development of Galaxy AI was a significant achievement for Samsung, and a series of visits to Samsung Research centers worldwide is shedding light on the challenges faced during its creation. In this installment, the focus is on the complexities of accounting for dialects.
Teaching an AI model a language is already a complex task, but when it involves a collection of diverse dialects, the challenge becomes even greater. The team at Samsung R&D Institute Jordan (SRJO) encountered this challenge when they added Arabic as a language option for Galaxy AI. They had to account for the various Arabic dialects found across the Middle East and North Africa, each with its own pronunciation, vocabulary, and grammar.

Arabic is spoken by over 400 million people worldwide, making it one of the top six most widely spoken languages. It is divided into two forms: Fus'ha (Modern Standard Arabic) and Ammiya (the dialects of Arabic). Fus'ha is used in public and official events, as well as in news broadcasts, while Ammiya is used in day-to-day conversations. With over 20 countries using Arabic and around 30 dialects in the region, the variation in the language posed a significant challenge for the team.
To address the variation presented by the dialects, the team at SRJO employed various techniques to discern and process the unique linguistic features of each dialect. This was crucial to ensure that Galaxy AI could understand and respond accurately, taking into account the regional nuances.
Mohammad Hamdan, the project leader of the Arabic language development team, explained that the pronunciation of objects in Arabic varies depending on the subject and verb in the sentence. The team's goal was to develop a model that could understand all the dialects and respond in standard Arabic.
The Text-to-Speech (TTS) team faced a unique challenge due to the nature of the Arabic language. Arabic uses diacritics, which guide the pronunciation of words in certain contexts but are absent in everyday writing. Converting raw text into phonemes, the basic units of sound, becomes difficult for a machine without diacritics. The team had to design a neural model that could predict and restore missing diacritics with high accuracy.
A neural model learns to predict diacritics only after extensive training on Arabic text. It has to pick up the rules of the language and how words are used in different contexts, because the same undiacritized spelling can correspond to several different words. The team's work on training the Arabic TTS model was instrumental in improving its accuracy.
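To make the diacritics problem concrete, the short Python sketch below shows one common way to frame it: strip the diacritics from fully marked text, and treat the removed marks as per-letter labels that a model must learn to predict. This is an illustration only and not a description of Samsung's system; the function names `split_diacritics` and `restore` are hypothetical, and the sketch assumes the marks of interest are the eight Unicode code points from U+064B to U+0652.

```python
# Arabic short-vowel and related marks (fathatan .. sukun), U+064B..U+0652.
# Assumption: these eight marks are the only diacritics we care about here.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Split marked text into (bare_text, labels).

    labels[i] holds the diacritics that followed bare_text[i] in the input,
    or an empty string if the letter carried none.
    """
    bare, labels = [], []
    for ch in text:
        if ch in DIACRITICS:
            if labels:  # attach the mark to the preceding base letter
                labels[-1] += ch
        else:
            bare.append(ch)
            labels.append("")
    return "".join(bare), labels


def restore(bare, labels):
    """Re-attach predicted labels to the bare letters."""
    return "".join(ch + lab for ch, lab in zip(bare, labels))


# Two different words that share one undiacritized spelling (k-t-b):
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # "books"

bare_a, labels_a = split_diacritics(KATABA)
bare_b, labels_b = split_diacritics(KUTUB)

print(bare_a == bare_b)      # same bare spelling
print(labels_a == labels_b)  # different diacritic labels
print(restore(bare_a, labels_a) == KATABA)
```

Because both words collapse to the same three letters once the marks are removed, a lookup on the bare spelling alone cannot choose between them. The model has to use the surrounding words, which is why restoration depends on large amounts of training text.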
To improve Automatic Speech Recognition (ASR), the SRJO team collected diverse audio recordings of the dialects from various sources. Native speakers of each dialect then transcribed the recordings, paying particular attention to unique sounds, words, and phrases. These transcriptions gave the ASR model the training data it needed, enabling Galaxy AI to understand and respond in real time.
Building an ASR system that supports multiple dialects in a single model is a complex task that requires a thorough understanding of the language, careful data selection, and advanced modeling techniques. Mohammad Hamdan, the ASR lead for the project, emphasized the intricacies involved in the process.
After months of planning, building, and testing, the team released Arabic as a language option for Galaxy AI. The release makes Galaxy AI services accessible to Arabic speakers and helps bridge language and cultural barriers between them and people around the world. The team's work has also established best practices that can be applied globally, and they continue to refine their models and improve the quality of Galaxy AI's language capabilities.
Arabic is just one of the languages and dialects newly supported by Galaxy AI. Supported languages can be downloaded from the Settings app on Galaxy devices running Samsung's One UI 6.1 update.