Neural machine translation system enables zero-shot translation

New approach to multilingual NMT at Google Translate

Up to now, neural machine translation (NMT) systems have been built for specific language pairs. This latest contribution from the Google Translate and Google Brain teams shows that a single multilingual NMT model can achieve comparable results. A multilingual model requires less processing power and training data than building a separate model for every language pair, so it has obvious advantages if it is found to work well.

The researchers obtain reasonable quality with this approach, even when translating between a language pair for which the system has seen no parallel training data. That is what they call zero-shot translation. For example, the multilingual system includes Spanish, Portuguese and English, but has had little or no training on the Spanish-Portuguese pair. Previously it would have been necessary to perform relay translation – Spanish to English, then English to Portuguese – in order to translate from Spanish to Portuguese. Yet the results show that Spanish-Portuguese translation (or other combinations among the available languages) is feasible with the multilingual set-up, and the quality can then be improved further with relatively little additional data. These favourable results for zero-shot translation seem to offer scope for future development of NMT for languages and language combinations for which we might not have much training data.
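The contrast between relay (pivot) translation and direct zero-shot translation can be sketched as follows. The `translate` function and phrase table below are toy stand-ins for trained models, not the actual Google system; only the Spanish-English and English-Portuguese "pairs" exist, so Spanish-Portuguese must go through the pivot:

```python
# Toy phrase table standing in for trained language-pair models.
# Only es->en and en->pt were "trained"; no direct es->pt model exists.
PHRASE_TABLE = {
    ("es", "en"): {"hola mundo": "hello world"},
    ("en", "pt"): {"hello world": "olá mundo"},
}

def translate(text, src, tgt):
    """Translate using a single trained language pair, if one exists."""
    table = PHRASE_TABLE.get((src, tgt))
    if table is None or text not in table:
        raise KeyError(f"no trained model for {src}->{tgt}")
    return table[text]

def relay_translate(text, src, tgt, pivot="en"):
    """The classic workaround: bridge through a pivot language."""
    return translate(translate(text, src, pivot), pivot, tgt)

print(relay_translate("hola mundo", "es", "pt"))  # olá mundo
```

The multilingual model removes the need for this two-step pipeline: a single shared model can handle the unseen pair directly, which is what the paper calls implicit bridging.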

Other capabilities tested and discussed include the translation of code-switching, i.e. translation of source texts that mix languages, and the production of code-switched target texts, partly in one language and partly in another. While the first scenario seems a useful one to test, it is less clear how the second scenario corresponds to any translation needs. However, it is fascinating to see some discussion about how this multilingual NMT may be using a kind of interlingual representation. We can expect future research to shed more light on this aspect of machine learning in NMT systems.


Our approach has been shown to work reliably in a Google-scale production setting and enables us to scale to a large number of languages quickly.

Johnson et al. (2016)

We propose a simple, elegant solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the beginning of the input sentence to specify the required target language. The rest of the model, which includes encoder, decoder and attention, remains unchanged and is shared across all languages. Using a shared wordpiece vocabulary, our approach enables Multilingual NMT using a single model without any increase in parameters, which is significantly simpler than previous proposals for Multilingual NMT. Our method often improves the translation quality of all involved language pairs, even while keeping the total number of model parameters constant. On the WMT’14 benchmarks, a single multilingual model achieves comparable performance for English→French and surpasses state-of-the-art results for English→German. Similarly, a single multilingual model surpasses state-of-the-art results for French→English and German→English on WMT’14 and WMT’15 benchmarks respectively. On production corpora, multilingual models of up to twelve language pairs allow for better translation of many individual pairs. In addition to improving the translation quality of language pairs that the model was trained with, our models can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation is possible for neural translation. Finally, we show analyses that hints at a universal interlingua representation in our models and show some interesting examples when mixing languages.
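The token mechanism described in the abstract amounts to a one-line preprocessing step. The sketch below illustrates the idea; the `<2xx>` token format follows the paper's description, while the plain string handling stands in for the shared wordpiece tokenization:

```python
def add_target_token(source_sentence, target_lang):
    """Prepend an artificial token telling the shared model which
    language to produce; encoder, decoder and attention are unchanged."""
    return f"<2{target_lang}> {source_sentence}"

# The same English sentence, routed to two different target languages:
print(add_target_token("How are you?", "es"))  # <2es> How are you?
print(add_target_token("How are you?", "fr"))  # <2fr> How are you?
```

Because the target language is carried by the input rather than by the architecture, adding a language pair changes only the training data, not the model.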

To find out more

Johnson, Melvin, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes and Jeffrey Dean (2016) ‘Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation’, arXiv [Computation and Language].

Schuster, Mike, Melvin Johnson and Nikhil Thorat (2016) ‘Zero-Shot Translation with Google’s Multilingual Neural Machine Translation System’, Google Research Blog, 22 November 2016.
