Neural network translator: faster, closer, more accurate. A neural network has taken over the Yandex translator. Machine translation: what are the challenges?

Yandex has launched a new version of its translator. Translation will now be handled by a hybrid system: in addition to the statistical model used previously, the translator will also use a neural network. This was reported on the company's blog.

There are several approaches to machine translation. The first and most common is statistical. Such machine translation is based on memorizing a huge amount of information obtained from parallel corpora (identical texts in different languages): this can be individual words or grammatical rules. This approach has a very important drawback, however: statistical machine translation remembers information but does not understand it, so the resulting translation often looks like many correctly translated pieces assembled into a text that is not quite correct in grammar or meaning.

The second approach is neural. It is based on translating not individual words and phrases but entire sentences, and its main goal is to preserve meaning while achieving the best translation quality from a grammatical point of view. This technology can also retain the knowledge about the language that it acquired during training, which allows it to cope, for example, with errors in case agreement. Neural machine translation is a comparatively new approach, but it has already proven itself: with the help of a neural network, Google Translate was able to achieve record-breaking translation quality.

Starting today, Yandex.Translator works on a hybrid system that combines the statistical translation the service used previously with translation based on a neural network. A special classifier algorithm built on CatBoost (a machine learning system developed by Yandex) selects the better of the two translations (statistical and neural) and shows it to the user.

You can read more about how the new version of Yandex.Translator works in our interview with the head of the service, British computational linguist David Talbot.

For now, the new translation technology is available only when translating from English into Russian (according to the company, this is the most popular translation direction). While working with the system, the user can switch between the two translation models (the old statistical one and the new hybrid one) and compare their output. In the coming months, the Translator's developers promise to add other translation directions.


Examples of translation of different models used in the new version of Yandex.Translator

Yandex.Translator has learned to make friends with a neural network and provide users with higher-quality texts. Yandex has begun to use a hybrid translation system: initially it worked statistically, and now that is complemented by CatBoost machine learning technology. True, there is one caveat: so far this applies only to translation from English into Russian.

Yandex claims that this is the most popular translation direction, accounting for 80% of the total.

CatBoost is a smart thing that, having received two versions of a translation, compares them and picks the more human-like one.

In the statistical version, the translation is usually broken into individual phrases and words. The neural network does not do this; it analyzes the sentence as a whole, taking context into account where possible. The result is much closer to human translation, because the neural network can account for word agreement. However, the statistical approach has its own advantage: it does not fantasize when it sees a rare or unclear word, whereas the neural network may attempt to get creative.

After today's announcement, the number of grammatical errors in automatic translations should decrease: translations now pass through a language model, so you should no longer come across things like “daddy’s gone” or “severe pain.”

In the web version, users can currently choose the translation that seems most correct and natural to them; there is a separate toggle for this.


09.14.2017, Thu, 14:19 Moscow time. Text: Valeria Shmyrova

In the Yandex.Translator service, in addition to statistical translation, translation by a neural network has become available. Its advantage is that it works with entire sentences, takes context into account better, and produces consistent, natural text. However, when the neural network does not understand something, it begins to fantasize.

Launching a neural network

The Yandex.Translator service has launched a neural network that will help improve the quality of translation. Previously, translation from one language to another was carried out using a statistical mechanism. Now the process will be hybrid: both the statistical model and the neural network will offer their own version of translation. After this, the CatBoost algorithm, which is based on machine learning, will select the best result obtained.
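As a rough illustration, the selection scheme just described could be sketched as follows; the model and classifier interfaces here are hypothetical stand-ins, not Yandex's actual internal APIs.

```python
# Minimal sketch of the hybrid pipeline: two systems each propose
# a translation, and a trained classifier picks the better one.
# All interfaces below are hypothetical placeholders.

def hybrid_translate(text, statistical_model, neural_model, classifier):
    candidates = [
        statistical_model.translate(text),
        neural_model.translate(text),
    ]
    # The classifier (e.g. a CatBoost model) scores each
    # (source, candidate) pair and the best-scoring one wins.
    scores = [classifier.score(text, c) for c in candidates]
    return max(zip(scores, candidates))[1]
```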

So far, the neural network performs translation only from English into Russian, and only in the web version of the service. According to the company, English-Russian requests make up 80% of all requests in Yandex.Translator. In the coming months, the developers intend to introduce the hybrid model for other translation directions. To let users compare translations from the two mechanisms, a special switch is provided.

Differences from statistical translator

The operating principle of a neural network differs from the statistical translation model. Instead of translating text word by word, expression by expression, it works with entire sentences without breaking them into parts. Thanks to this, the translation takes into account the context and better conveys the meaning. In addition, the translated sentence is consistent, natural, easy to read and understand. According to the developers, it can be mistaken for the work of a human translator.

Neural network translation resembles human translation

The peculiarities of the neural network include a tendency to “fantasize” when something is unclear to it: in this way it tries to guess the correct translation.

A statistical translator has its own advantages: it translates rare words and expressions (less common names, toponyms, and so on) more successfully, and it does not fantasize if the meaning of a sentence is unclear. According to the developers, the statistical model also copes better with short phrases.

Other mechanisms

Yandex.Translator has a special mechanism that refines the translation of the neural network, as well as the translation of the statistical translator, correcting mismatched word combinations and spelling errors. Thanks to this, the developers assure, the user will not see combinations like “dad went” or “severe pain” in the translation. This effect is achieved by comparing the translation against the language model: all the knowledge about the language accumulated by the system.

In difficult cases, the neural network tends to fantasize

A language model contains a list of words and expressions in a language along with data on the frequency of their use. It has found applications beyond Yandex.Translator as well. In Yandex.Keyboard, for example, it is the language model that guesses which word the user wants to type next and offers ready-made options: it understands that “hello, how” is likely to be followed by something like “are you” or “are things.”
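For illustration, here is a toy frequency-based (bigram) model of the kind described; a minimal sketch, not Yandex's actual implementation.

```python
from collections import Counter, defaultdict

# Toy bigram language model: counts which word follows which,
# then suggests the most frequent continuations.
class BigramModel:
    def __init__(self):
        self.next_words = defaultdict(Counter)

    def train(self, corpus):
        for sentence in corpus:
            words = sentence.lower().split()
            for prev, cur in zip(words, words[1:]):
                self.next_words[prev][cur] += 1

    def suggest(self, word, k=2):
        return [w for w, _ in self.next_words[word.lower()].most_common(k)]

model = BigramModel()
model.train(["hello how are you", "hello how is it going", "hello how are things"])
print(model.suggest("how"))  # ['are', 'is']
```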

What is “Yandex.Translator”

Yandex.Translator is a service for translating texts from one language to another, developed by Yandex; it began work in 2011. Initially, it worked only with Russian, Ukrainian and English.

Over the service's existence, the number of supported languages has grown to 94. Among them are exotic ones such as Xhosa and Papiamento. Translation can be done between any two languages.

In 2016, Yandex.Translator added a fictional, artificially constructed language: the one used by the elves in J. R. R. Tolkien's books.

Or does quantity turn into quality?

Article based on a speech at the RIF+KIB 2017 conference.

Neural Machine Translation: why only now?

Neural networks have been talked about for a long time, and it would seem that one of the classical problems of artificial intelligence, machine translation, simply begs to be solved on the basis of this technology.

Nevertheless, here is how the popularity of search queries about neural networks in general, and about neural machine translation in particular, has changed:

It is clearly visible that until recently there was nothing on the radar about neural machine translation – and at the end of 2016, several companies demonstrated their new technologies and machine translation systems based on neural networks, including Google, Microsoft and SYSTRAN. They appeared almost simultaneously, several weeks or even days apart. Why is that?

To answer this question, we need to understand what machine translation based on neural networks is and how it fundamentally differs from the classical statistical and analytical systems used for machine translation today.

The neural translator is based on the mechanism of bidirectional recurrent neural networks (Bidirectional Recurrent Neural Networks), built on matrix calculations, which makes it possible to build significantly more complex probabilistic models than statistical machine translators use.
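To make the mechanism concrete, here is a minimal bidirectional recurrent encoder sketched in PyTorch; it shows the general idea only and is not the architecture of any particular production translator.

```python
import torch
import torch.nn as nn

# A bidirectional recurrent encoder reads the sentence left-to-right
# and right-to-left and concatenates both passes, so the representation
# of each position "sees" the whole sentence.
class BiRNNEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim,
                          bidirectional=True, batch_first=True)

    def forward(self, token_ids):          # (batch, seq_len)
        vectors = self.embed(token_ids)    # (batch, seq_len, emb_dim)
        outputs, _ = self.rnn(vectors)     # (batch, seq_len, 2 * hidden_dim)
        return outputs

encoder = BiRNNEncoder()
sentence = torch.randint(0, 10000, (1, 7))  # a batch of one 7-token sentence
print(encoder(sentence).shape)              # torch.Size([1, 7, 1024])
```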


Like statistical translation, neural translation requires parallel corpora for training, which make it possible to compare automatic translation with a reference “human” one; only during training it operates not on individual phrases and word combinations but on entire sentences. The main problem is that training such a system requires significantly more computing power.

To speed up the process, developers use GPUs from NVIDIA as well as Google's Tensor Processing Unit (TPU), a proprietary chip adapted specifically for machine learning. Graphics chips are optimized for matrix calculations from the outset, which yields a performance gain of 7-15x compared to a CPU.

Even so, training a single neural model takes 1 to 3 weeks, while a statistical model of roughly the same size takes 1 to 3 days to train, and this difference increases as the size increases.

However, it was not only technological problems that hindered the development of neural networks in the context of the machine translation task. In the end, it was possible to train language models earlier, albeit more slowly, but there were no fundamental obstacles.

The fashion for neural networks also played a role. Many companies were developing such systems internally but were in no hurry to announce them, fearing, perhaps, that they would not deliver the quality boost society expects from the phrase “neural networks.” This may explain why several neural translators were announced one after another.

Translation quality: whose BLEU score is thicker?

Let's try to understand whether the increase in translation quality corresponds to accumulated expectations and the increase in costs that accompany the development and support of neural networks for translation.
Google's research demonstrates that neural machine translation gives a Relative Improvement of 58% to 87%, depending on the language pair, compared to the classical statistical approach (also called Phrase-Based Machine Translation, PBMT).


SYSTRAN conducted a study in which translation quality is assessed by choosing among several options produced by different systems, as well as a “human” translation. The company states that its neural translation is preferred over human translation in 46% of cases.

Translation quality: is there a breakthrough?

Even though Google claims an improvement of 60% or more, there is a slight catch in this figure. The company's representatives talk about “Relative Improvement”, that is, how far the neural approach closed the gap to the quality of Human Translation relative to the classic statistical translator.
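In formula form, the metric as described reads (our notation, with $Q$ standing for whatever quality score is used, e.g. human side-by-side ratings; this is a reconstruction, not Google's exact definition):

$$ \text{Relative Improvement} = \frac{Q_{\text{NMT}} - Q_{\text{PBMT}}}{Q_{\text{Human}} - Q_{\text{PBMT}}} \times 100\% $$

Closing, say, 60% of the gap to human quality can therefore correspond to a much smaller absolute gain in a score such as BLEU, which is exactly the catch discussed below.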


Industry experts analyzing the results presented by Google in the article “Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation” are quite skeptical about them, saying that in fact the BLEU score improved by only about 10%, and that significant progress is noticeable precisely on simple tests from Wikipedia, which were most likely used in training the network.

Inside PROMT, we regularly compare translations of various texts by our systems with those of competitors, so we always have examples at hand to check whether neural translation is really as superior to the previous generation as the manufacturers claim.

Original text (EN): Worrying never did anyone any good.
Google Translation PBMT: Didn't do anything good to anyone without worrying.
Google Translation NMT: Worry has never helped anyone.

By the way, the translation of the same phrase on Translate.Ru is “Worry has never brought anyone any benefit”; as you can see, it was and remains this way without the use of neural networks.

Microsoft Translator is not lagging behind in this matter either. Unlike their colleagues at Google, they even made a website where you can translate and compare two results, neural and pre-neural, to make sure the claims about quality growth are not unfounded.


In this example, we see that there is progress, and it is really noticeable. At first glance, it seems that the developers' statement that machine translation has almost caught up with human translation is true. But is this really so, and what does this mean for the practical application of the technology in business?

In general, translation using neural networks is superior to statistical translation, and this technology has enormous potential for development. But if we look at the issue carefully, we can see that progress is not universal, and that not every task can be handed to neural networks without regard to the task itself.

Machine translation: what are the challenges?

Throughout the entire history of the automatic translator's existence (already more than 60 years!), people have expected some kind of magic from it, imagining it as a machine from science fiction films that instantly transforms any speech into an alien whistle and back.

In fact, there are tasks of different levels. One of them implies “universal” or, so to speak, “everyday” translation for day-to-day tasks and ease of understanding. Online translation services and many mobile products cope well with tasks at this level.

Such tasks include:

Quick translation of words and short texts for various purposes;
automatic translation during communication on forums, in social networks, and in messengers;
automatic translation when reading news, Wikipedia articles;
travel translator (mobile).

All those examples of increasing the quality of translation using neural networks that we discussed above relate precisely to these tasks.

However, when it comes to business goals and objectives for machine translation, things are a little different. Here, for example, are some of the requirements for corporate machine translation systems:

Translation of business correspondence with clients, partners, investors, foreign employees;
localization of websites, online stores, product descriptions, instructions;
translation of user content (reviews, forums, blogs);
the ability to integrate translation into business processes and software products and services;
accuracy of translation in compliance with terminology, confidentiality and security.

Let's try to understand, using examples, whether any translation business problems can be solved using neural networks and how exactly.

Case: Amadeus

Amadeus is one of the world's largest global airline ticket distribution systems. On the one hand, air carriers are connected to it, on the other, agencies that must receive all information about changes in real time and convey it to their clients.

The task is to localize the conditions for applying tariffs (Fare Rules), generated in the reservation system automatically from different sources. These rules are always written in English. Manual translation is practically impossible here because the volume of information is large and it changes often. An airline ticket agent would like to read the Fare Rules in Russian in order to advise clients promptly and competently.

A clear translation is required, one that conveys the meaning of the tariff rules and respects typical terms and abbreviations. In addition, the automatic translation must be integrated directly into the Amadeus booking system.

→ The task and implementation of the project are described in detail in the document.

Let's try to compare the translation made through the PROMT Cloud API, integrated into the Amadeus Fare Rules Translator, and the “neural” translation from Google.

Original: ROUND TRIP INSTANT PURCHASE FARES

PROMT (Analytical approach): RATES FOR INSTANT PURCHASE OF A ROUND FLIGHT

GNMT: ROUND PURCHASES

It is obvious that the neural translator cannot cope here, and a little further it will become clear why.

Case: TripAdvisor

TripAdvisor is one of the world's largest travel services, one that needs no introduction. According to an article published by The Telegraph, 165,600 new reviews of various tourist sites in different languages appear on the site every day.

The task is to translate tourist reviews from English into Russian with translation quality sufficient to understand the meaning of the review. The main difficulty: the typical features of user-generated content (texts with errors, typos, missing words).

Also part of the task was to automatically assess the quality of the translation before publication on the TripAdvisor website. Since manually assessing all translated content is not possible, a machine translation solution must provide an automatic confidence score to ensure TripAdvisor only publishes high-quality translated reviews.

For the solution, PROMT DeepHybrid technology was used, which makes it possible to obtain a higher quality translation that is understandable to the end reader, including through statistical post-editing of the translation results.

Let's look at examples:

Original: We ate there last night on a whim and it was a lovely meal. The service was attentive without being over bearing.

PROMT (Hybrid translation): We ate there last night on a whim and it was a wonderful meal. The staff were attentive without being overbearing.

GNMT: We ate there last night on a whim and it was a wonderful meal. The service was attentive without being overbearing.

Here things are not as depressing in terms of quality as in the previous example. In general, judging by its parameters, this problem can potentially be solved using neural networks, and this can further improve translation quality.

Challenges of using NMT for business

As mentioned earlier, a “universal” translator does not always provide acceptable quality and cannot support specific terminology. To integrate and use neural networks for translation in your own processes, you need to meet some basic requirements:

The presence of sufficient volumes of parallel texts to train a neural network on. Often the customer simply has few of them, or no texts on the topic exist at all; they may be classified, or in a state not very suitable for automatic processing.

To create a model, you need a corpus containing at least 100 million tokens (word usages); to get a translation of more or less acceptable quality, 500 million tokens. Not every company has such a volume of material.

Availability of a mechanism or algorithms for automatically assessing the quality of the result obtained.

Sufficient computing power.
A “universal” neural translator is most often not suitable in quality, and in order to deploy your own private neural network capable of providing acceptable quality and speed of work, a “small cloud” is required.

It's not clear what to do with privacy.
Not every customer is ready to give their content for translation to the cloud for security reasons, and NMT is a cloud-first story.

Conclusions

In general, neural automatic translation produces results of higher quality than a “purely” statistical approach;
Automatic translation through a neural network is better suited for solving the problem of “universal translation”;
None of the approaches to MT by itself is an ideal universal tool for solving any translation problem;
To solve business translation problems, only specialized solutions can guarantee compliance with all requirements.

We arrive at the obvious and logical conclusion that for your translation tasks you need to use the translator best suited to them. It does not matter whether there is a neural network inside or not. Understanding the task itself is more important.


There are more than 630 million sites on the modern Internet, but only 6% of them contain Russian-language content. The language barrier is the main obstacle to the spread of knowledge between network users, and we believe that it should be addressed not only by teaching foreign languages but also by using automatic machine translation in the browser.

Today we will tell Habr readers about two important technological changes in the Yandex Browser translator. First, the translation of highlighted words and phrases now uses a hybrid model, and we will recall how this approach differs from using neural networks alone. Second, the translator's neural networks now take into account the structure of web pages, whose features we will also discuss below the cut.

Hybrid translator of words and phrases

The first machine translation systems were based on dictionaries and rules (essentially hand-written regular expressions), and these determined the quality of translation. Professional linguists worked for years to develop ever more detailed manual rules. The work was so time-consuming that serious attention was paid only to the most popular language pairs, and even within those the machines coped poorly. A living language is a very complex system that does not obey rules well; it is even harder to describe the rules of correspondence between two languages.

The only way for a machine to constantly adapt to changing conditions is to learn on its own from a large number of parallel texts (the same in meaning, but written in different languages). This is the statistical approach to machine translation. The computer compares parallel texts and independently identifies patterns.

A statistical translator has both advantages and disadvantages. On the one hand, it remembers rare and difficult words and phrases well: if they occurred in parallel texts, the translator will remember them and continue to translate them correctly. On the other hand, the result of a translation can be like a completed puzzle: the overall picture seems clear, but look closely and you see it is made up of separate pieces. The reason is that the translator represents individual words as identifiers, which in no way reflect the relationships between them. This is at odds with the way people experience language, where words are defined by how they are used and how they relate to and differ from other words.

Neural networks help solve this problem. Word embeddings, used in neural machine translation, typically associate each word with a vector several hundred numbers long. Unlike the simple identifiers of the statistical approach, these vectors are formed while training the neural network and take into account the relationships between words. For example, the model might recognize that since “tea” and “coffee” often appear in similar contexts, both words should be possible in the context of a new word like “spill,” even if only one of them appeared with it in the training data.
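A toy illustration of this idea with made-up vectors (real embeddings are learned during training and have hundreds of dimensions):

```python
import numpy as np

# Made-up 3-dimensional "embeddings" for illustration only.
embeddings = {
    "tea":    np.array([0.71, 0.12, 0.64]),
    "coffee": np.array([0.69, 0.15, 0.60]),
    "car":    np.array([0.05, 0.92, 0.11]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that occur in similar contexts end up with similar vectors,
# so "tea" is far closer to "coffee" than to "car".
print(cosine(embeddings["tea"], embeddings["coffee"]))  # ~0.999
print(cosine(embeddings["tea"], embeddings["car"]))     # ~0.24
```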

However, the process of learning vector representations is clearly more statistically demanding than rote memorization of examples. In addition, it is not clear what to do with those rare input words that did not occur often enough for the network to build an acceptable vector representation for them. In this situation, it is logical to combine both methods.

Since last year, Yandex.Translator has been using a hybrid model. When the Translator receives a text from a user, it gives it to both systems for translation: the neural network and the statistical translator. A machine-learned algorithm then evaluates which translation is better. Dozens of factors are taken into account in scoring, from sentence length (the statistical model translates short phrases better) to syntax. The translation recognized as best is shown to the user.
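A hedged sketch of that selection step follows; the features below are illustrative guesses, since the real factor list is not public.

```python
from catboost import CatBoostClassifier

# Illustrative features for choosing between two candidates.
def features(source, statistical, neural):
    return [
        len(source.split()),                  # sentence length
        len(statistical.split()),
        len(neural.split()),
        abs(len(statistical) - len(neural)),  # how much the candidates disagree
    ]

# Tiny mock training set: y = 1 means the neural candidate was judged better.
X = [[5, 5, 6, 4], [23, 21, 24, 11], [3, 3, 3, 0]]
y = [0, 1, 0]

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(X, y)

def pick(source, statistical, neural):
    use_neural = model.predict([features(source, statistical, neural)])[0]
    return neural if use_neural else statistical
```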

It is the hybrid model that is now used in Yandex.Browser, when the user selects specific words and phrases on the page for translation.

This mode is especially convenient for those who know a foreign language reasonably well and would like to translate only the unfamiliar words. But if instead of the usual English you come across, say, Chinese, it will be hard to get by without a page translator. It might seem that the difference is only in the volume of translated text, but not everything is so simple.

Neural network translator of web pages

From the time of the Georgetown experiment almost to the present day, all machine translation systems were trained to translate each sentence of the source text individually. But a web page is not just a set of sentences; it is structured text containing fundamentally different elements. Let's look at the basic elements of most pages.

Heading. Usually bright, large text that we see immediately on entering the page. The headline often contains the essence of the news, so it is important to translate it correctly. But this is difficult: a title contains little text, and without understanding the context it is easy to make a mistake. In English it is even more complicated, because English-language headlines often contain phrases with unconventional grammar, infinitives, or even missing verbs: for example, Game of Thrones prequel announced.

Navigation. Words and phrases that help us navigate the site, for example Home, Back and My account. It is hardly worth translating them literally, as if they were ordinary words in running text, when they sit in the site menu rather than in the body of a publication.

Main text. This is the simplest case: it differs little from the ordinary texts and sentences we find in books. But even here it is important to ensure translation consistency, that is, to make sure that within the same web page the same terms and concepts are translated in the same way.

For high-quality translation of web pages, it is not enough to use a neural network or hybrid model - it is also necessary to take into account the structure of the pages. And to do this we had to deal with many technological difficulties.

Classification of text segments. For this we again use CatBoost, with factors based both on the text itself and on the HTML markup of the documents (tag, text size, number of links per unit of text, ...). The factors are quite heterogeneous, which is why CatBoost (based on gradient boosting) shows the best results (classification accuracy above 95%). But classifying segments alone is not enough.
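A sketch of what such a segment classifier could look like; the feature set here is illustrative, not the production one.

```python
from catboost import CatBoostClassifier, Pool

# Illustrative features: HTML tag, text length, link density.
def segment_features(tag, text, n_links):
    words = text.split()
    return [tag, len(words), n_links / max(len(words), 1)]

train = [
    segment_features("h1", "Game of Thrones prequel announced", 0),
    segment_features("a", "My account", 1),
    segment_features("p", "We ate there last night on a whim and it was lovely.", 0),
]
labels = ["heading", "navigation", "content"]

model = CatBoostClassifier(iterations=50, verbose=False)
model.fit(Pool(train, labels, cat_features=[0]))  # the tag column is categorical
print(model.predict([segment_features("a", "Back", 1)]))
```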

Skewed data. Traditionally, Yandex.Translator algorithms are trained on texts from the Internet. This would seem to be an ideal solution for training a web page translator (in other words, the network learns from texts of the same nature as those on which we are going to use it). But once we learned to separate the different segments from each other, we discovered an interesting feature. On average, content takes up approximately 85% of all text on websites, while headings and navigation account for only 7.5%. Remember also that headings and navigation elements differ noticeably in style and grammar from the rest of the text. Together, these two factors lead to the problem of data skew: it is more profitable for a neural network to simply ignore the features of these segments, which are very poorly represented in the training set. The network learns to translate only the main text well, and the quality of heading and navigation translation suffers. To neutralize this unpleasant effect, we did two things: we assigned each pair of parallel sentences one of three segment types (content, heading or navigation) as meta-information, and we artificially raised the concentration of the last two in the training corpus to 33% by showing such examples to the learning neural network more often.
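A minimal sketch of the two measures just described, assuming segment types are attached to the training pairs as plain labels:

```python
import random

# pairs: list of (segment_type, source, target) tuples.
def rebalance(pairs, target_share=0.33):
    """Oversample headings and navigation until they make up
    roughly target_share of the corpus."""
    rare = [p for p in pairs if p[0] in ("heading", "navigation")]
    if not rare:
        return list(pairs)
    out = list(pairs)
    n_rare = len(rare)
    while n_rare / len(out) < target_share:
        out.append(random.choice(rare))
        n_rare += 1
    return out

# The segment type itself can be passed to the model as a meta-token
# prepended to the source sentence.
def with_meta_token(segment_type, source):
    return f"<{segment_type}> {source}"

print(with_meta_token("heading", "Game of Thrones prequel announced"))
```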

Multi-task learning. Since we can now divide the text on web pages into three classes of segments, it might seem natural to train three separate models, each handling a different type of text: headings, navigation or content. This really does work well, but an even better scheme is to train one neural network to translate all types of text at once. The key lies in the idea of multi-task learning (MTL): if there is an internal connection between several machine learning tasks, then a model that learns to solve them simultaneously can learn to solve each of them better than a narrowly specialized model!

Fine-tuning. We already had a pretty good machine translator, so it would be unwise to train a new one for Yandex.Browser from scratch. It is more logical to take a basic system for translating regular texts and train it further to work with web pages. In the context of neural networks, this is often called fine-tuning. But if you approach the problem head-on, i.e. simply initialize the weights of the neural network with values from the finished model and start training on new data, you may encounter the effect of domain shift: as training progresses, the quality of translation of web pages (in-domain) will increase, but the quality of translation of regular (out-of-domain) texts will fall. To get rid of this unpleasant feature, during additional training we impose an extra restriction on the neural network, prohibiting it from changing its weights too much relative to the initial state.

Mathematically, this is expressed by adding a term to the loss function: the Kullback-Leibler divergence (KL-divergence) between the next-word probability distributions produced by the original and the fine-tuned networks. As can be seen in the illustration, the result is that the gain in web page translation quality no longer degrades the translation of ordinary text.
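In symbols, a plausible reconstruction of that objective (our notation; $\lambda$ is an assumed weighting coefficient) is:

$$ \mathcal{L} = \mathcal{L}_{\text{CE}} + \lambda \sum_{t} D_{\mathrm{KL}}\bigl(p_{\text{orig}}(w_t \mid w_{<t}, x) \,\|\, p_{\text{tuned}}(w_t \mid w_{<t}, x)\bigr) $$

where $\mathcal{L}_{\text{CE}}$ is the usual cross-entropy on the web page data, $x$ is the source sentence, $p_{\text{orig}}$ is the frozen original model and $p_{\text{tuned}}$ is the model being fine-tuned. The KL term pulls the fine-tuned distribution back toward the original one, which is exactly the restriction on straying too far from the initial weights described above.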

Polishing frequent navigation phrases. While working on the new translator, we collected statistics on the texts of various web page segments and saw something interesting. Texts in navigation elements are quite highly standardized, so they often consist of the same template phrases. The effect is so strong that more than half of all navigation phrases found on the Internet are covered by just the 2,000 most frequent ones.

We, of course, took advantage of this and gave several thousand of the most common phrases and their translations to our translators for verification in order to be absolutely sure of their quality.

External alignments. There was another important requirement for the web page translator in the Browser: it must not distort the markup. When HTML tags are placed outside sentences or on their boundaries, no problems arise. But if a sentence contains, say, two underlined words, then in the translation we also want to see “two underlined words.” That is, the translation must satisfy two conditions:

  1. The underlined fragment in the translation must correspond exactly to the underlined fragment in the source text.
  2. The consistency of the translation at the boundaries of the underlined fragment should not be violated.
To achieve this behavior, we first translate the text as usual and then use statistical word-by-word alignment models to determine correspondences between fragments of the source and translated texts. This makes it clear what exactly needs to be underlined (or italicized, or formatted as a hyperlink, ...).
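A minimal sketch of markup transfer via alignment; the alignment here is hard-coded for illustration, whereas in reality it comes from a statistical word alignment model.

```python
# alignment: list of (source_idx, target_idx) word pairs.
# span: (start, end) of the marked-up fragment in the source.
def transfer_span(alignment, span):
    """Return the target-side (start, end) that should carry
    the same markup as the given source span."""
    start, end = span
    targets = sorted(t for s, t in alignment if start <= s < end)
    return (targets[0], targets[-1] + 1) if targets else None

source = "he wrote two underlined words here".split()
target = "он написал здесь два подчёркнутых слова".split()
alignment = [(0, 0), (1, 1), (5, 2), (2, 3), (3, 4), (4, 5)]

span = transfer_span(alignment, (2, 5))  # source words 2..4
print(span, target[span[0]:span[1]])     # (3, 6) ['два', 'подчёркнутых', 'слова']
```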

Intersection observer. The powerful neural translation models we have trained require significantly more computing resources on our servers (both CPU and GPU) than previous generations of statistical models. At the same time, users do not always read pages to the end, so sending all the text of a web page to the cloud seems wasteful. To save server resources and user traffic, we taught Translator to use the Intersection Observer browser API and translate only the text that actually appears in the user's viewport.