Best Practices in Assessing the Potential of Machine Translation
Today, unlocking the potential of Machine Translation (MT) is a top priority for a variety of functions in a wide range of organizations around the globe. For product and marketing units, MT can accelerate entry into international markets and reach a wider international audience without a linear increase in costs - contributing to a faster ROI on their translation investment.
For translation and localization functions, MT, as it rapidly matures as a technology, promises to deliver productivity increases enabling you to do more for less.
Our vision is to apply MT solutions to the following business needs of our clients:
- Translate content that otherwise would not be translated due to cost or time constraints
- Reduce time-to-market
- Achieve productivity improvements
- Reduce total costs of translation
- Define best practices and use MT when applicable
The Untapped Potential
A frequent reason why publishers look to Machine Translation is to meet their objective of increasing the reach of high value content to global customers. In many situations, the traditional human translation-only approach, even when coupled with translation memory, will simply not deliver. Costs would be too high, time required too long. If your responsibility includes supporting your customers in their local languages effectively, Machine Translation may be the answer. For many leading global companies, MT has proven to deliver a very attractive ROI.
Moravia provides its clients with two key Machine Translation solutions that address a broad range of possible production applications of MT today: an array of linguistic MT post-editing services, and a comprehensive MT deployment solution. To deliver this, we have developed a flexible partnering strategy.
Moravia's MT Partnering Strategy
Moravia maintains a neutral stance across the evolving landscape of MT engine providers, allowing us to choose and recommend the best engine for your needs. We partner with a variety of providers. However, we do have a bias towards Statistical Machine Translation (SMT).
Despite the long history and excellent performance of many Rule-Based Machine Translation (RBMT) engines, we have made a strategic technology judgment that SMT provides the most promise for rapid improvement in translation quality - reducing post-editing costs and thus the costs to you - as well as the ability to train the engine for specific domains (subject areas) and even specific clients and product areas. Therefore, our partnerships are primarily with SMT vendors.
Moravia boasts the most mature post-editing solution in the industry. Our entire worldwide partner network has been adapted to the requirements of post-editing. We have established post-editing training courses, guidelines, and processes. Our post-editing solution can be applied either to output which we create based on engines we train and run, or output from the client's existing MT solution applied in-house.
MT Deployment Solution
Our MT deployment solution typically begins with a pilot project. We have developed this methodology because of the current maturity level of MT and the variety of situational factors that govern whether it can be deployed successfully.
Always carried out jointly with our clients, a pilot project aims to assess whether and how MT can be deployed in real-life production; how well it meets specific expectations in terms of quality, cost and productivity; and what the best production process should be.
The following main factors determine the success of any MT implementation.
- Expectations: Is MT expected to improve productivity, to reduce the time-to-market, or to localize content that would otherwise remain untranslated? The desired use of MT determines what specific type of service Moravia will provide, and which content should be addressed by the pilot project. The main considerations are the required quality of the final output and whether post-editing after MT is a required step (and if so, the extent of such post-editing: so-called "light" post-editing vs. "full" post-editing).
- Domain and content type: Some content types are especially appropriate for Machine Translation, for instance customer support content or user documentation. We see that the use of MT for software user interfaces is on the increase in specific situations. Marketing communications materials and other more creative content are typically not appropriate for MT, unless MT is applied only to obtain a "raw" output that conveys the general meaning.
- Language pairs: Some target languages and language combinations achieve higher productivity gains while others, in particular Asian languages, are still less amenable to the use of Machine Translation.
- Quality of source language: Better source language control and the application of suitable authoring tools will normally increase the potential of MT significantly. This includes the use of established grammar and style rules and terminology.
- Availability of legacy content and/or client- or domain-specific corpus: Especially in the case of the Statistical Machine Translation technologies which Moravia prefers, the larger the volume of quality legacy content that can be used to train the chosen MT engine, the higher the benefit that can be achieved right from the start. The amount of bilingual content (in translation memory form) that is normally required for production-level customization can be up to 2 million words for European languages and up to 10 million words for Asian languages, although Moravia's ability to clean existing corpora can reduce this requirement.
- Availability of customer-specific and domain dictionaries: As with legacy content or corpus availability, well-prepared and maintained dictionaries can significantly improve the quality of MT output, and by extension increase the cost benefits of MT use.
- Quality of the target language translation: In addition to training the Statistical MT engine on bilingual content, the engine is trained on additional target language content to give the translation its final "polish".
Expectations and Process
The typical process we follow when we work with our clients to assess if Machine Translation can help them achieve their corporate objectives includes:
- Setting up a customized MT engine, selected by Moravia based on our knowledge of partner strengths and trained on the data set available from the specific customer. Customers provide bilingual content for customization of the MT engine, and Moravia analyzes and prepares this TM/bilingual content for the initial customization of the engine.
- Selecting sample content for MT.
- Agreeing on quality and productivity measurements.
- Determining the quality of the MT raw output.
- Identifying any recurring issues in the MT output.
- Retraining the engine as necessary.
- Conducting post-editing and identifying the type of post-editing required and the productivity levels that can be achieved.
In the end, customers receive a report on the overall quality of the customized engines and a detailed plan for actual real-life MT production deployment.
The Human Factor
The human factor is key to driving quality. Some percentage of the corpus needs to be human-verified upfront. The higher this percentage, and the greater the effort that goes into this verification and cleanup, the greater the ultimate quality of the MT output.
Data cleaning is an important and frequently overlooked aspect. The data included in existing and selected translation memory databases is not always in perfect shape. It needs to be normalized, standardized and cleaned for best results. To do this, an effective combination of linguistic and data processing skills is needed, which Moravia can provide.
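As an illustration only, the kind of normalization and cleanup described above might be sketched as follows. The function names and the specific rules (Unicode normalization, whitespace collapsing, dropping empty and duplicate pairs) are assumptions made for this example and do not describe Moravia's actual process, which also relies on linguistic review.

```python
import re
import unicodedata


def normalize_segment(text):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_translation_memory(pairs):
    """Return normalized (source, target) pairs without empties or exact duplicates.

    `pairs` is an iterable of (source, target) strings, a hypothetical
    in-memory form of translation memory content.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        source = normalize_segment(source)
        target = normalize_segment(target)
        if not source or not target:
            continue  # one side of the pair is missing
        key = (source, target)
        if key in seen:
            continue  # exact duplicate after normalization
        seen.add(key)
        cleaned.append(key)
    return cleaned
```

A mechanical pass like this only removes the most obvious noise; inconsistent terminology or mistranslations in the legacy content still need human verification.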
Measuring the Quality of MT Translation
The quality of the customized MT engine is measured by tools that compare the output from the MT engine, on a sample which was not included in the training, with a human translation of the same content. This content is called a "test-set" and the human translation of test-sets is widely known as "reference translation".
To ensure the objectivity of quality measurements of the customized MT engine, the test-set content has to be kept separate from the data used for the customization of the engine. If the test-set data were also used for engine customization, the system would reproduce the same (or very similar) translations for the test-set and the results would be distorted: the measured quality for such a test-set would most likely appear better than it really is.
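A minimal sketch of such a separation, assuming the bilingual content is available as a list of (source, target) pairs, could look like this. The function name `split_test_set` and the random hold-out approach are assumptions for illustration, not a description of any specific tool.

```python
import random


def split_test_set(pairs, test_size, seed=0):
    """Hold out `test_size` pairs as the test-set and return (training_set, test_set).

    Training pairs whose source text also occurs in the test-set are dropped,
    so the engine never sees a test-set source sentence during customization.
    """
    if test_size > len(pairs):
        raise ValueError("test_size exceeds the number of available pairs")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    test_set = shuffled[:test_size]
    test_sources = {source for source, _ in test_set}
    training_set = [
        (source, target)
        for source, target in shuffled[test_size:]
        if source not in test_sources
    ]
    return training_set, test_set
```

This sketch removes only exact source duplicates. Near-duplicates ("very similar" segments) would need fuzzy matching, which is left out here.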
We use three metrics to evaluate the quality of MT engine translation output. Their results express the similarity between the reference translation and the MT output as a percentage.
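To make the idea of a percentage similarity concrete, here is a deliberately simple stand-in based on word-level edit distance. It is an assumption chosen for illustration and is not one of the three metrics referred to above, which this text does not name.

```python
def word_edit_distance(reference_words, output_words):
    """Minimum number of word insertions, deletions and substitutions."""
    previous = list(range(len(output_words) + 1))
    for i, ref_word in enumerate(reference_words, start=1):
        current = [i]
        for j, out_word in enumerate(output_words, start=1):
            substitution = previous[j - 1] + (ref_word != out_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def similarity_percent(reference, mt_output):
    """Similarity between a reference translation and MT output, from 0 to 100."""
    ref_words = reference.split()
    out_words = mt_output.split()
    longest = max(len(ref_words), len(out_words))
    if longest == 0:
        return 100.0  # two empty segments are treated as identical
    distance = word_edit_distance(ref_words, out_words)
    return 100.0 * (1 - distance / longest)
```

For example, comparing "the cat sat on the mat" with "the cat sat on a mat" gives one substitution over six words, so roughly 83%. Established MT metrics are considerably more elaborate, but they share this basic shape of scoring MT output against a reference.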
All metrics compare the human translation (the reference set) with the machine-translated segments of the test-set. It is important to select an appropriate test-set for each machine translation engine, following these rules:
- The test-set should contain a representative set of data which will be translated in production.
- Data in the test-set should not be contained in the corpus used for training the MT engine (the training set).
- The test-set should ideally contain full sentences, not one- or two-word phrases.
- The test-set and the reference set must be perfectly aligned.
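Some of the rules above can be checked mechanically before an evaluation run. The sketch below is one possible approach under stated assumptions: the function name `check_test_set`, the three-word minimum, and the use of segment counts as a rough proxy for alignment are all illustrative choices. Whether the test-set is representative of production content cannot be verified this way and remains a human judgment.

```python
def check_test_set(test_sources, references, training_sources, min_words=3):
    """Return a list of human-readable problems found in a test-set.

    test_sources:     source segments of the test-set
    references:       human reference translations, expected in the same order
    training_sources: source segments used to train the engine
    """
    problems = []
    # Equal counts are necessary (though not sufficient) for perfect alignment.
    if len(test_sources) != len(references):
        problems.append(
            f"misaligned: {len(test_sources)} source segments "
            f"vs {len(references)} reference translations"
        )
    training = set(training_sources)
    for index, source in enumerate(test_sources):
        if source in training:
            problems.append(f"segment {index} also occurs in the training set")
        if len(source.split()) < min_words:
            problems.append(f"segment {index} has fewer than {min_words} words")
    return problems
```

An empty result means none of these basic checks failed; it does not by itself guarantee a good test-set.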
For a proper assessment of how usable Machine Translation is for a specific purpose, good preparation and upfront management of expectations are important. With these caveats in mind, the potential of Machine Translation can be successfully unlocked.
Learn more about our linguistic MT post-editing services and our comprehensive MT deployment solution. To find out how Machine Translation could be used to meet your specific objectives, please complete the Request for Information form and we will get back to you shortly, or see other options for contacting us.