
What is the Facebook model for translation?

Facebook is one of the largest social media platforms in the world, with billions of users across many countries and languages. To enable communication across such a diverse user base, Facebook has developed its own models for translating content on its platform.

The challenges of translation at Facebook’s scale

With over 2.9 billion monthly active users as of Q3 2022, Facebook faces immense challenges in translating content across languages. Some key difficulties include:

  • Scale – The amount of content generated on Facebook daily is massive, comprising posts, comments, ads and more. Translating all this content is a monumental task.
  • Language diversity – Facebook supports over 100 languages, including many low-resource languages that lack extensive training data.
  • Informal language – User-generated content contains a high amount of slang, regional dialects and misspellings, making it difficult to translate.
  • Context – Translating text often requires understanding nuanced cultural contexts that are hard to convey in another language.

To overcome these challenges, Facebook AI researchers have developed specialized machine translation models tailored for social media content.

Facebook’s machine translation models

At the core of Facebook’s translation capabilities are customized neural machine translation (NMT) models. NMT uses deep learning to translate text, giving significantly better results than older phrase-based statistical machine translation (SMT).

Facebook uses NMT models which are trained on massive datasets gathered from its social media platforms. This social media-specific training data contains the informal language, abbreviations, slang and misspellings found on social platforms. Training on this data produces models that can handle user-generated content much better.
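As a toy illustration of the kind of preprocessing such noisy text invites, the sketch below expands a few common chat abbreviations before text would reach a translation model. The abbreviation table and function name are hypothetical examples, not part of Facebook's actual pipeline.

```python
import re

# Hypothetical lookup table of chat abbreviations. A production system
# would learn such mappings from data rather than hard-code them.
ABBREVIATIONS = {
    "u": "you",
    "r": "are",
    "gr8": "great",
    "thx": "thanks",
    "idk": "I don't know",
}

def normalize_ugc(text: str) -> str:
    """Expand chat abbreviations token by token, leaving other text intact."""
    tokens = re.findall(r"\w+|\W+", text)  # split into word and non-word runs
    out = []
    for tok in tokens:
        key = tok.strip().lower()
        out.append(ABBREVIATIONS.get(key, tok))
    return "".join(out)

print(normalize_ugc("thx, u r gr8"))  # -> thanks, you are great
```

A real system would of course handle far more phenomena (dialects, code-switching, typos), but the principle is the same: bring user-generated text closer to the distribution the translation model was trained on.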

Some key NMT models developed by Facebook are:

  • WMT19 news translation models – Trained on large filtered news and web corpora, Facebook's submissions ranked first for several language pairs, such as English–German, in the WMT19 news translation task.
  • Shared multilingual encoder – Using a shared subword vocabulary across languages improves translation quality, especially for low-resource languages; this encoder currently supports around 50 languages, and the same idea underlies M2M-100, which translates directly between any pair of 100 languages.
  • LASER (Language-Agnostic SEntence Representations) – Produces language-independent sentence embeddings, enabling zero-shot cross-lingual transfer to languages not seen during training and large-scale bitext mining.
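To see why a shared subword vocabulary helps low-resource languages, here is a minimal sketch. The corpora and the character-bigram segmentation are illustrative assumptions, not Facebook's actual BPE setup; the point is that subword units learned from high-resource text still cover most of a related unseen word.

```python
# Toy shared subword vocabulary: segment words into character bigrams
# (a stand-in for learned BPE units) and measure how well a vocabulary
# built from one language's data covers a new, related word.

def char_bigrams(word):
    padded = f"<{word}>"  # boundary markers, as in fastText-style subwords
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def build_vocab(corpora):
    vocab = {}
    for corpus in corpora:
        for word in corpus:
            for bg in char_bigrams(word):
                vocab.setdefault(bg, len(vocab))
    return vocab

def coverage(word, vocab):
    bgs = char_bigrams(word)
    return sum(bg in vocab for bg in bgs) / len(bgs)

# Build the vocabulary only from "high-resource" words...
vocab = build_vocab([["nation", "information", "station"]])

# ...yet a longer unseen word is still mostly covered by shared units.
print(coverage("nationalization", vocab))  # -> 0.75
```

In a multilingual model the same sharing happens across related languages, so a low-resource language can reuse representations learned from a high-resource neighbour.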

Other innovations

Besides large-scale NMT models, Facebook AI has developed other innovations to enhance translation:

  • Glyph embedding – Encoding characters rather than whole words helps the models handle rare words and spelling mistakes.
  • Context-aware translation – Taking surrounding context into account improves disambiguation of ambiguous words.
  • Multilingual chatbots – Bots such as Blender can converse in multiple languages by translating on the fly.
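The character-level idea can be illustrated with a small fastText-style sketch (the n-gram size and functions are illustrative assumptions): a misspelled word still shares nearly half of its character n-grams with the correct spelling, so its representation stays close, whereas a whole-word vocabulary would treat the misspelling as an unknown token.

```python
# Represent a word as a set of character trigrams; a misspelling keeps
# much of its subword structure, unlike a whole-word lookup that fails.

def char_ngrams(word, n=3):
    padded = f"<{word}>"  # boundary markers
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def overlap(a, b):
    """Jaccard overlap between the trigram sets of two words."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    return len(ga & gb) / len(ga | gb)

print(round(overlap("translation", "transaltion"), 2))  # -> 0.47
print(overlap("translation", "banana"))                 # -> 0.0
```

An embedding built by pooling such subword units degrades gracefully under typos instead of falling off a cliff, which matters for social media text.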

Facebook’s human-in-the-loop approach

While NMT models produce high-quality translations, some errors and mistranslations are inevitable. Facebook employs a human-in-the-loop approach to catch these mistakes and continue improving the models.

Thousands of human translators and reviewers are involved in refining Facebook’s translations. Some ways humans augment the models include:

  • Editing machine translations to correct errors.
  • Providing feedback through rating interfaces.
  • Translating content that models are uncertain about.
  • Identifying areas for improvement to train updated models.

This human feedback is critical for improving Facebook's models and post-editing their translations, creating a feedback loop that lets the models learn from their mistakes.
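The loop described above can be sketched as confidence-based routing: translations the model scores highly ship automatically, while low-scoring ones are queued for human review, and the corrections become training data for the next model update. The threshold, scores, and data structures below are hypothetical.

```python
# Hypothetical confidence threshold below which a translation
# is sent to human reviewers instead of being shown as-is.
REVIEW_THRESHOLD = 0.80

def route(translations):
    """Split (source, hypothesis, score) triples into auto-publish
    and human-review queues based on model confidence."""
    auto, review_queue = [], []
    for src, hyp, score in translations:
        if score >= REVIEW_THRESHOLD:
            auto.append((src, hyp))
        else:
            review_queue.append((src, hyp, score))
    return auto, review_queue

batch = [
    ("Hola mundo", "Hello world", 0.97),
    ("q onda wey", "what wave dude", 0.42),  # slangy input, low confidence
]
auto, queue = route(batch)
print(len(auto), len(queue))  # -> 1 1
```

This is the "translating content that models are uncertain about" step from the list above, reduced to its simplest form; real systems would use calibrated model scores and richer review tooling.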

Results achieved by Facebook’s translation systems

Facebook’s massive datasets, computational resources and human-machine hybrid approach have led to impressive improvements in translation quality:

  • Facebook reported that its English–Spanish translations approached human quality in 2019.
  • The shared multilingual encoder reduced translation errors by 10–20% across many languages compared to the previous best models.
  • Multilingual NMT reduced errors by 33% when translating into low-resource languages.
  • Facebook’s models have enabled the introduction of tens of thousands of language pairs that previously couldn’t be translated by machines.

According to Facebook, its NMT systems have reduced mistranslations by an average of 60% compared to phrase-based SMT models. The impact is most pronounced for low-resource languages, where translation quality has improved by leaps and bounds.

Usage of translations across Facebook products

The translation systems developed by Facebook AI researchers underpin translations across Facebook’s family of products:

  • Facebook app and desktop site – Users can view translated posts, comments, Page information and notifications.
  • Instagram – Captions, comments and bio info can be translated.
  • Messenger – One-on-one and group conversations are automatically translated.
  • Workplace from Facebook – Enables multilingual collaboration and communication.
  • Portal – Smart display supports messaging and calls in multiple languages.

Facebook also provides access to its translation technologies and multilingual models via its AI platform. Developers can leverage these models to build services that can understand different languages.

Future directions for Facebook translation

Some ongoing areas of research for Facebook’s translation teams are:

  • Improving low-resource language translation using techniques like transfer learning.
  • Handling non-standard words like emoji, tagged text and speech effects.
  • Increasing context awareness for disambiguation and improved coherence.
  • Enabling real-time voice translation for videos, apps and other modalities.
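One simple way to handle non-standard tokens such as emoji and hashtags is to mask them with placeholders before translation and restore them afterwards, so the model only sees text it was trained on. The regex and placeholder scheme below are an illustrative sketch, not Facebook's method.

```python
import re

# Match a rough range of emoji code points, or hashtags.
SPECIAL = re.compile(r"[\u2600-\u27BF\U0001F300-\U0001FAFF]|#\w+")

def mask(text):
    """Replace special tokens with placeholders; return masked text
    plus the originals in order, for restoration after translation."""
    specials = SPECIAL.findall(text)
    masked = SPECIAL.sub("<SPC>", text)
    return masked, specials

def unmask(text, specials):
    """Re-insert the saved tokens, one placeholder at a time."""
    for s in specials:
        text = text.replace("<SPC>", s, 1)
    return text

masked, kept = mask("great goal ⚽ #worldcup")
print(masked)                # -> great goal <SPC> <SPC>
print(unmask(masked, kept))  # -> great goal ⚽ #worldcup
```

In practice the masked text would be sent through the translation model between `mask` and `unmask`; the placeholders keep the emoji and tags anchored in the output.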

Going forward, Facebook aims to continue improving translation quality and extending support to more languages. The ultimate goal is to enable real-time communication for all Facebook’s users across different languages and cultures.

Conclusion

Facebook has made remarkable progress in machine translation through a combination of customized NMT models, human feedback loops and advances like multilingual training. While challenges remain, it has already achieved near-human quality for some language pairs and greatly expanded translation capabilities for low-resource languages.

The fruits of Facebook’s AI research are manifest in the seamless cross-lingual experience across its apps, used by billions worldwide. Looking ahead, the company aims to make further strides in overcoming language barriers through technology.