30 September 2005

English words are not always that "English"

In an article published in The Financial Express, author Christopher Caldwell explains why English is "the international language of choice" and how it is "creeping into the very fabric of all other languages".

The dominance of the English language, especially in business, politics and new technologies, cannot be denied, nor do we have to look very hard for occurrences of English words in other languages. Yet commentators on the supremacy of English tend to forget that its vocabulary is of mixed Germanic and Romance origin and that English has, therefore, itself borrowed a fair share of its words from other languages.

In his article "The unavoidable language" Caldwell cites the German politician Guido Westerwelle, chairman of the Free Democrats (FDP), who described the coalition discussions as having been "sehr fair". The author found it telling that, "at a moment of drama and tension, Mr Westerwelle should use English words in full confidence that his fellow citizens will understand him." And they do - the word is even listed in the DUDEN, the German equivalent of the Webster or Oxford English dictionaries, as meaning "according to the rules". It has found its way into German and is frequently used.

From a linguistic point of view, the word is of Old English origin (just like the words "meeting" and "leader", which were also given as examples) and it is, therefore, "fair" to refer to it as an English word. But what about the other examples given in the text? The word "hamburger" "supplants no French equivalent," according to Caldwell. Nor does it have an English equivalent, because the food it refers to is named after the German city of Hamburg - for whatever reason.

And what about the word "budget", which, as no surprise to the author, can be found in French dictionaries? Well, you can also find it in German dictionaries - with a French pronunciation because it is a French word, which was introduced to the English language during the Middle English period (ca. 1100 AD - 1500 AD).

English has become the world's leading language in scientific research, technological development, business negotiations and political affairs, owing to colonization and to recent political and economic developments. Even so, one should not forget its origins. Ever since people first started migrating and dealing with their neighbours, languages and cultures have influenced one another and left their traces. This holds true for English over the past centuries, but also for Latin, which influenced the languages of Europe during the time of the Roman Empire, and for Greek during the time of Alexander the Great.

The article rightly points out the rise of English to a lingua franca (also because it is "fantastically simple as a pidgin"), yet the examples given to demonstrate how English words have become part of the everyday vocabulary of other languages merely reflect the diverse origins of English itself. Why, then, should other languages not now have their turn in borrowing from English?

The article can be found in The Financial Express.

International Translation Day

Today, 30 September 2005, is International Translation Day.

The patron saint of the profession is St. Jerome (347-419), who in the 4th century AD translated the Bible (the Vulgate, the "commonly accepted edition") from Greek into Latin. His translation was the fruit of 30 years of work!

The International Federation of Translators has chosen "Translation and Human Rights" as the theme of this year's professional holiday.

Congratulations to all colleagues!

MultiTrans 4 Released


MultiCorpora, a provider of software solutions for translation support and language management, unveiled MultiTrans 4 to the language industry at the AILIA conference held in Ottawa on September 16, 2005. MultiCorpora will take MultiTrans 4 to the market with its worldwide marketing campaign, expected to last through December 2005.

MultiTrans 4 provides access to previously translated texts that, in conjunction with top-quality automatic alignment and search tools, become an essential knowledge support asset. MultiCorpora believes that combining these capabilities in a user-friendly CAT tool, which can be adapted for different types of user, will give everyone the resources and time needed to maintain quality while performing the art of translation.

29 September 2005

Bulgarian Invents Unique Translation Method

A unique translation method that allows people speaking different native languages to communicate in real time has been invented by a Bulgarian man.

Koycho Mitev's invention uses digits to record speech and automatically convert it into any other language.

It enables people speaking different languages to hold real-time conversations, the state-owned Bulgarian National Radio said on its website.

The unique method has earned Mitev an award at Inpex, the largest American innovation tradeshow. It was held earlier this year in Pittsburgh, Pennsylvania.

Source: www.novinite.com

28 September 2005

Office 2003 Add-in: Word Redaction

Overview

Redaction is the careful editing of a document to remove confidential information.

The Microsoft Office Word 2003 Redaction Add-in makes it easy for you to mark sections of a document for redaction. You can then redact the document so that the sections you specified are blacked out. You can either print the redacted document or use it electronically. In the redacted version of the document, the redacted text is replaced with a black bar and cannot be converted back to text or retrieved.

Sensitive government documents, confidential legal documents, insurance contracts, and other sensitive documents are often redacted before being made available to the public. With the Word 2003 Redaction Add-in, users of Microsoft Office Word 2003 now have an effective, user-friendly tool to help them redact confidential text in Word documents.


Document processed by Word Redaction

Tips for using this add-in are available from Work Essentials, a resource that provides free occupation-focused expert advice, demos, templates, and webcasts.

Notes:

● In a redacted document, the black bar that replaces the redacted text takes up the same amount of space as the original text so that line spacing and line breaks are unaffected. As a result, readers may be able to determine the length of a redacted word based on the size of the blacked out area. To help protect your redacted document from attempts to recover information by using word length, avoid redacting single words. If you need to redact a single word, you can replace it with a longer or shorter word before you select it for redaction.

● We recommend that you carefully review any documents redacted using this tool to confirm that all the information that you intended to redact was successfully redacted.
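The word-length caveat in the first note can be made concrete with a small sketch. This is not how the add-in works internally; `redact` is a hypothetical helper that simply replaces each marked character range with a bar of the same length, which is enough to show why the size of the blacked-out area reveals the length of the original text.

```python
def redact(text, spans, bar="█"):
    """Replace each (start, end) character range with a bar of equal length.

    Assumes the spans do not overlap. The equal-length bar mirrors the
    behaviour described in the note above, not the add-in's actual code.
    """
    pieces = []
    pos = 0
    for start, end in sorted(spans):
        pieces.append(text[pos:start])        # keep text before the span
        pieces.append(bar * (end - start))    # same width as the original
        pos = end
    pieces.append(text[pos:])                 # keep the remainder
    return "".join(pieces)


original = "Agent Smith met Jones."
print(redact(original, [(6, 11)]))
```

Because the bar is exactly as wide as the name it hides, a reader can still tell that a five-letter word was removed, which is why the note advises against redacting single words.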

-------------
Word Redaction is freeware and can be downloaded from here.

The New Microsoft Office 12

In the second half of next year (2006), Microsoft Corporation intends to present the new version of its office suite, currently known under the code name Office 12. According to Microsoft founder Bill Gates, Office 12 will be the most significant update to the company's existing line of software products since the release of Office 95 ten years ago.


Toolbar in Microsoft Word

The main efforts of the Office 12 developers are aimed at simplifying the navigation system and improving the user interface. Microsoft notes that with the move to Office 12 it becomes easier to concentrate on the work itself rather than on how to carry it out.


Style gallery

The Office 12 toolbar gains icons for graphical commands that simplify and speed up recurring actions. Special style galleries allow an entire document to be formatted quickly on the basis of a single template.


Screenshot of Microsoft PowerPoint

Office 12 will also have an improved context menu system. The entire Office 12 interface should become more functional and be accessible even to novice users.


Context menu

The changes affect all the main Microsoft Office applications, including Word, PowerPoint, Excel, Access and Outlook.

New Version of ABBYY Lingvo Released

ABBYY Lingvo is the most complete and up-to-date electronic dictionary. It lets you obtain an accurate translation of any word. Its 46 English-Russian dictionaries contain about 2.3 million entries. More than 80% of the dictionaries were published between 2003 and 2005. The entries include transcription, definitions, synonyms and antonyms, and grammatical notes, along with examples of the word in use. Lingvo can also be used for language learning: it contains a reference guide to grammar and phonetics and a dedicated application for memorizing words, Lingvo Tutor. In the new version, Lingvo 11, the stressed vowels in a word are marked in a different color. The electronic dictionary can also be installed on pocket computers (PDAs) and smartphones.

What's new in ABBYY Lingvo 11!

> New Russian explanatory dictionary - T. F. Efremova, 2005, 136,000 entries

> Millions of dictionary entries in 23 dictionaries

10 new English dictionaries:
English-Russian and Russian-English automotive dictionary
English-Russian and Russian-English dictionary of construction and new construction technologies
English-Russian and Russian-English dictionary of mechanical engineering and production automation
English-Russian and Russian-English dictionary of physics
New English-Russian medical dictionary
English-Russian explanatory dictionary of winemaking

1 new German dictionary:
Austria. Encyclopedic dictionary.

5 new French dictionaries:
Businessman's dictionary (French-Russian and Russian-French)
New comprehensive French-Russian phraseological dictionary
Dictionary of the oil and gas industry (French-Russian and Russian-French)

5 new Spanish dictionaries:
Spanish-Russian and Russian-Spanish legal dictionary
Spanish-Russian and Russian-Spanish economic dictionary
Spanish-Russian dictionary of contemporary usage

2 new Italian dictionaries:
Italian-Russian and Russian-Italian economic dictionary

> 14 updated and revised dictionaries:
- English-Russian and Russian-English general vocabulary dictionary
- English-Russian and Russian-English scientific and technical dictionary
- English-Russian and Russian-English dictionary of computing and programming
- English-Russian explanatory dictionary of financial management
- English-Russian explanatory dictionary of financial markets
- English-Russian explanatory dictionary of accounting and auditing
- English-Russian explanatory dictionary of management and labor economics
- English-Russian explanatory dictionary of marketing and trade
- English-Russian encyclopedic dictionary "Americana II"
- German-Russian and Russian-German medical dictionary

> New feature - word-by-word translation of sentences

> Stress display - the stressed vowel in a word is marked in color

> Improved Lingvo Tutor module - 6 new topics have been added to the learning dictionaries, along with new learning functions

> Lingvo 11 can run on smartphones - it works on phones with the Windows Mobile® and Symbian OS™ (Series 60) operating systems

27 September 2005

Counting Words and Characters

Tim Watson

Word and character counting is a subject close to the heart of all freelance translators, as it’s the basis for job costing and getting paid. This article considers some of the issues involved in word counting.

Different word processors and translation tools very often produce different word count values for the same document, though typically not wildly different. The differences can be due to the use of different rules for counting as well as deficiencies in the applications used.

Many people rely on the document property statistics produced by Microsoft Word to determine the word and character counts. In many instances this is perfectly good. There are, however, a few things to be aware of that Word gets wrong, as I will explain.

When one is handling a large number of documents at a time, getting an overall word count for all documents can be time consuming, especially if this means opening several documents in Microsoft Word, noting the count values for each file and then totalling them all together. There are third-party tools to automate the process of counting words. These allow a number of files, which may be of different formats, to be selected, and the word/character counts are then summarized and totalled. When one is faced with many files, these tools are real time savers. For example, when one is working with Web pages, it’s quite common for a customer to supply dozens of separate files. The utilities typically support multiple file formats such as Word, HTML, PDF, PowerPoint, Excel, text and so on. These dedicated word counting tools can also be more accurate as they don’t have the deficiencies that standard applications such as Microsoft Word have. The table “Word Counts From Three Applications” shows the word count from three different applications, including Microsoft Word, for a set of test documents.
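The batch-counting idea described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular tool: `summarize_counts` is a made-up name, and the sketch assumes the text has already been extracted from each file (reading Word, PDF or PowerPoint formats is outside its scope). It counts whitespace-separated tokens and characters per document and then totals them.

```python
def summarize_counts(documents: dict[str, str]) -> dict:
    """Return per-file and overall word/character counts.

    `documents` maps a file name to its already-extracted plain text.
    A "word" here is any whitespace-separated token (an assumption).
    """
    per_file = {}
    for name, text in documents.items():
        per_file[name] = {
            "words": len(text.split()),
            "characters": len(text),   # includes spaces
        }
    total = {
        "words": sum(c["words"] for c in per_file.values()),
        "characters": sum(c["characters"] for c in per_file.values()),
    }
    return {"files": per_file, "total": total}


report = summarize_counts({
    "index.txt": "Welcome to our site",
    "about.txt": "About us",
})
print(report["total"])
```

With dozens of customer files, a single call like this replaces opening each document and adding up the figures by hand.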

Readers who wish to try the test documents on their own systems may download them from www.surefiresoftware.com/testdocs.htm.

Scanned and electronically faxed images are another matter. These will typically be in bitmap (.bmp), .jpg, .gif, .tif or some other graphical format. Acrobat PDF documents or Word documents may also contain scanned images. Text in a scanned image is not stored in the form of a character encoding, but is described like a picture and is made up of colored dots or pixels. In order for a computer program to count words, one must first convert the graphical image back into a character encoded format, such as Word, rich text file (RTF), text and so on. This can be done with the aid of an optical character recognition (OCR) application. Several OCR applications are commercially available.

Counting in Word

Let’s now consider Microsoft Word in more depth and look at the areas where caution is needed. Word basically counts words by assuming everything between spaces is a word. This includes symbols such as %, &, @, * and #. Translation tools are generally a bit smarter and will not include these symbols as words.
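The difference between the two counting rules can be illustrated with a short sketch. It is a simplification under stated assumptions: the first function treats everything between spaces as a word, as described above, and the second skips tokens that contain no letter or digit, which is one plausible way a translation tool might ignore stand-alone symbols. Neither function reproduces the exact rules of Word or of any specific tool.

```python
def count_whitespace_tokens(text: str) -> int:
    """Count everything between spaces as a word (symbols included)."""
    return len(text.split())


def count_word_tokens(text: str) -> int:
    """Count only tokens that contain at least one letter or digit.

    Stand-alone symbols such as %, &, @, * and # are skipped.
    """
    return sum(1 for token in text.split()
               if any(ch.isalnum() for ch in token))


sample = "Sales rose 5 % & costs fell @ HQ"
print(count_whitespace_tokens(sample), count_word_tokens(sample))
```

For this sample the first rule counts the three stand-alone symbols as words and the second does not, which is the kind of small discrepancy described above.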

Text from text boxes, grouped shapes, auto-shapes, headers, footers and comments is not included in the Word-generated document statistics. Headers and footers usually contain little text, so the error introduced by Word from ignoring this text is minor. The use of text boxes can be more significant. Some document authors use many text boxes, particularly to annotate drawings or to help produce complex text layouts. In these cases, ignoring this text can produce large errors, causing the word count to be far too low.

Microsoft Word counts numbers as words. For example, 4.7 would count as one word. Some other packages may exclude numbers from the word count. General opinion seems to be split on how to consider numbers. Some say that because numbers don’t need to be translated, they should not be included. Others say that because numbers need to be transcribed and checked for errors, they should be included. The difference is typically not significant for documents that contain only a few numbers.
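Since opinion is split, a counting script can simply make the treatment of numbers a switch. The sketch below is an assumption-laden illustration: the pattern for what counts as a "number" (digits with optional decimal or thousands separators) and the function name `count_words` are choices made here, not rules taken from any particular package.

```python
import re

# A token is treated as a number if it consists of digits with optional
# "." or "," separators, e.g. 12, 4.7 or 1,500 (an assumed definition).
NUMBER = re.compile(r"^[+-]?\d+(?:[.,]\d+)*$")


def count_words(text: str, include_numbers: bool = True) -> int:
    """Count whitespace-separated tokens, optionally skipping numbers."""
    tokens = text.split()
    if include_numbers:
        return len(tokens)
    # Strip surrounding punctuation so "4.7." at a sentence end still
    # registers as a number.
    return sum(1 for t in tokens if not NUMBER.match(t.strip(".,;:()")))


sentence = "The value is 4.7 after 12 tests"
print(count_words(sentence), count_words(sentence, include_numbers=False))
```

Running both variants on the same document shows at a glance how much the choice matters for a given job.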

Word does not count the text contained in any embedded objects. These objects, sometimes also known as OLE objects, are inserted into a Word document through use of Word’s Insert menu and the Object… item. For example, an Excel worksheet can be embedded within a Word document. Inserted OLE objects in Word documents are often diagrams with little or no text; but this is not always the case, and caution is needed. For example, an embedded Excel worksheet may contain significant amounts of text.

Using Microsoft Word to open HTML files and provide statistics needs some additional care. If the HTML file contains a form with predefined options for a drop-down type combo box, then Word will not count the predefined drop-down text options. When the HTML contains forms, this can lead to the word count being significantly lower than the truth. The Word statistics also do not include the HTML page title, button text, and text in meta tags such as meta tags for description and keywords. Scanned images — text that is part of a graphic, very often buttons — will also not be counted.

Counting in PowerPoint

In common with Word, PowerPoint does not count the text contained in OLE objects, which are commonly used in PowerPoint presentations. Microsoft Word tables can be easily inserted as embedded objects, using the PowerPoint Insert menu, Picture sub-menu. Excel worksheets are also commonly embedded into a slide. When embedded objects exist, they typically contain significant amounts of text, and this should be taken into account manually.

PowerPoint 97 and 2000 are not consistent with Word in the rules used for counting. For example, hyphenated words are counted as two words. PowerPoint XP corrects this difference. This means that two users with different PowerPoint versions may disagree about the word count of the same document. PowerPoint, of course, doesn’t provide character count statistics. A third-party tool must be used for this purpose.
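The effect of the hyphenation rule is easy to reproduce. The following sketch is only an approximation of the two behaviours described above (it does not claim to match any PowerPoint version exactly); `count_words` and its `split_hyphens` flag are names invented for the illustration.

```python
def count_words(text: str, split_hyphens: bool = False) -> int:
    """Count whitespace-separated tokens.

    With split_hyphens=True, each hyphen-separated part of a token is
    counted as its own word, so "user-friendly" counts as two.
    """
    count = 0
    for token in text.split():
        if split_hyphens:
            # Empty parts (e.g. from a stand-alone "-") are ignored.
            count += len([part for part in token.split("-") if part])
        else:
            count += 1
    return count


phrase = "a user-friendly word-count tool"
print(count_words(phrase), count_words(phrase, split_hyphens=True))
```

The same four-token phrase yields two different totals depending on the rule, which is exactly how two users can end up disagreeing about one file.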

Summary

Understanding the tools available and the shortcomings of different approaches to word and character counting is important. Minor word-count differences are probably not worth getting hung up on, and a pragmatic approach is sensible. A few words make little difference to the overall time for translation; it is far more important to consider carefully the type and difficulty of the material. This, of course, is an altogether more skilled task.

Internet-based Sharing of Translation Memory

New solutions that are changing translation and localization workflows

GARRY LEVITT
(Garry Levitt is a localization project manager at LinguaPoint. He can be reached at garry_levitt@linguapoint.de )

Internet-based translation memory (TM) sharing will constitute the next big cost-saving improvement to localization workflow since the introduction of translation memory itself. This article explains the extent of the changes to current localization workflow that will be brought about by the widespread introduction of this new feature.

TM Sharing Explained

TM sharing enables two or more translators to translate the files of one and the same project while using the same TM to retrieve previously translated segments, including segments that are similar but not identical to the one being translated (so-called "fuzzy matches"). We can distinguish between two different types of TM sharing. First, a Master TM can be shared internally over the vendor's own local area network (LAN), which would probably be the case if several in-house resources were working on the same project. This procedure is supported by various translation-tool packages such as TRADOS Team Edition 6 or TM Server.

But with tools such as Logoport and T-Remote Memory, several translators in different locations can also share a single TM. According to the developers of T-Remote Memory, Telelingua Software, their application is not another TM software package but an add-on that can be used in conjunction with any existing TM solution. Translators only require an Internet connection in order to connect to a "communication server," which transmits and collects all data provided by the connected TMs. This second type of TM sharing, also referred to as telesharing, represents the latest innovation in translation tool development. A single TM (and in the case of T-Remote Memory, one or more TMs) is fed with the translations of different translators. As the TM grows, translators can not only call upon their own translated segments in the TM, but also upon the translations of the other translators involved on the project — in real time.

TM Sharing or Multi-user Functionality?

In reality, Internet-based TM sharing had been around for quite some time before Logoport and Telelingua decided to market their solutions, albeit in a slightly different form. ForeignDesk is a translation tool that has provided support for TCP/IP-based connection of translators for a number of years through its "multi-user functionality." The difference with ForeignDesk, which has recently been released by Lionbridge as open source, lies in the basic functionality of the tool. There is no use of TMs, but of "projects" that already contain legacy material that has been leveraged from previous projects. Yet the basic principle remains the same. Translators can connect their project with the projects of the other translators via TCP/IP and can thereby share their translations. The main drawback with this type of telesharing is that all translators involved will have to be on-line and working at the same time in order to achieve the greatest benefit. This almost rules out a situation in which translators from totally different parts of the world can work together on one project. Unless, of course, they pay a flat rate for their Internet connection and can leave their computers on overnight, which will coincide with someone else's working day in another time zone. It could be argued that this particular tool is more suited for regional teams of translators rather than global teams.

TM Sharing vs. TM Exchange

The benefits of TM sharing can best be illustrated by first explaining the workflow of a localization project without the use of this technology. The increasingly short turnaround times for localization projects, which are necessary if we are ever to reach anything close to real simultaneous shipment of localized products, require a language services provider (LSP) to divide a project up among several freelance translators, assuming the necessary resources are not available in house. (If they are, the first type of TM sharing, a network TM, can be used.) In order to limit the risk of inconsistencies creeping into the localized texts, translators are asked to exchange their individual TMs or translated bilingual files.

The process of TM exchange includes the need for translators to then import these memories into their own TM as they would with TRADOS. If they are exchanging translated bilingual files, all they need to do is update their TMs with these translations by "cleaning" the files into their translation memories. This memory-synchronization step can be especially cumbersome if you are dealing with very large TMs. Furthermore, translators or editors are often faced with the task of changing their translation a number of times over in order to be consistent with the translation that other translators have used.

TM exchange can go a long way in ensuring a certain degree of consistency when several translators are involved on one project. TM sharing, on the other hand, can really come to grips with the issue of consistency throughout a project, as inconsistencies are unlikely. If another translator has already translated a similar or identical segment, the Master TM will provide this translation, thereby doing away with the need for "manual" TM exchange.
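To make the mechanism concrete, here is a minimal sketch of a shared memory that several translators feed and query. It is not how TRADOS, Logoport or T-Remote Memory work internally: the class name `SharedTM`, the 0.75 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration. The point is only that a segment stored by one translator becomes available, as an exact or fuzzy match, to everyone else.

```python
from difflib import SequenceMatcher


class SharedTM:
    """A toy Master TM: a list of (source, target, translator) units."""

    def __init__(self):
        self.units = []

    def add(self, source: str, target: str, translator: str) -> None:
        """Store a translated segment together with who translated it."""
        self.units.append((source, target, translator))

    def lookup(self, source: str, threshold: float = 0.75):
        """Return (target, score) for the best match, or None.

        The score is a character-based similarity ratio between 0 and 1;
        1.0 is an exact (100%) match, lower values are fuzzy matches.
        """
        best_target, best_score = None, 0.0
        for src, tgt, _who in self.units:
            score = SequenceMatcher(None, source.lower(), src.lower()).ratio()
            if score > best_score:
                best_target, best_score = tgt, score
        if best_target is not None and best_score >= threshold:
            return best_target, best_score
        return None


tm = SharedTM()
# Translator "anna" stores a segment ...
tm.add("Click the Save button.",
       "Klicken Sie auf die Schaltfläche Speichern.", "anna")
# ... and another translator immediately gets it as a fuzzy match.
print(tm.lookup("Click the Cancel button."))
```

In a real telesharing setup the list would live on a server and be queried over the Internet, but the consistency benefit comes from the same principle: one memory, many contributors.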

Client-side Benefits

There is an increasing demand for the deployment of network TMs to enable further cost reductions and to allow for faster time-to-market, especially where large projects are concerned. In the current economic downturn, clients are constantly seeking to reduce localization costs as budgets dwindle. Many refuse to pay for proofing of 100% matches and repetitions or demand reduced rates for certain fuzzy-match brackets. Against this background, TM sharing can be seen as yet another hole in the belt that is slowly being tightened around localization. As clients become more and more aware of the benefits of TM sharing and the use of network TM functionality in several current tools, they will also come to expect their localization partners to adopt the use of this technology. This should not be a problem for most multilanguage vendors (MLVs), but survival is becoming increasingly hard for some of the middlemen of the cascaded localization production chain, namely, the single-language vendors (SLVs) and smaller agencies, as the MLVs pass on this requirement to their subcontractors.

Currently, an MLV may require an SLV to call on the services of several translators when carrying out a large localization project. The division of files among several translators can result in a loss of fuzzy matches, as every translator will only be translating a part of the total project with his or her own stand-alone TM. Depending on how large the loss of fuzzy matches is, an MLV may agree to foot the bill for this loss as until now there has been no other alternative than to choose a conventional multi-translator approach without TM sharing.

TM sharing will soon become the standard, as MLVs will refuse to subcontract projects to SLVs who either don't have TM sharing incorporated into their localization workflow or who cannot allocate the necessary in-house resources to the project. This will not only be because of the risk to the overall consistency of the project and the longer turnaround time, but because MLVs and clients will no longer be willing to pay for the loss of fuzzy matching, which would result from the use of stand-alone TMs. SLVs will have the choice between two courses of action. They can decide to integrate TM sharing into their localization workflow and show themselves to operate on the cutting edge of localization. Or they could choose to ignore the developing trend and view the investment in new tools and software licenses as unnecessary or postponable. The second course, however, is not really an option, as they risk missing the boat altogether when MLVs decide to look for other subcontractors. And if there is a market, someone else will soon be willing to oblige.

Share and Share Alike

But convincing translators to accept TM sharing as the right way forward may take time. Before the advent of TM some years ago, translators were paid a fixed word rate. Understandably, it took them some time to adapt to the idea of being paid a reduced rate for repetitions, 100% matches or fuzzy matches. However, translators have gradually fallen into line with the use of TM as the benefits of this technology became clear. So how will translators react to TM sharing? Some translators today already draw the line at a staggered delivery of their translated files with intermediate synching of TMs. They want to be paid in accordance with the initial analysis of the translatable files and do not want to see their workload and, consequently, revenue reduced by an exchange of TMs. For the sake of consistency, most agencies will agree to pay the translators for the initial word count even if the real word count is less through the sharing of TMs.

Other important psychological factors will play a role in determining a translator's acceptance of TM sharing. An element of mutual trust and professionalism is essential for the success of these projects. If, for example, we are dealing with a highly repetitive set of files, one translator may decide to delay work on the project until the first set of files has been translated by another translator, so that the translator who waited benefits from the likely additional fuzzy matching. Even if the project text is less repetitive in nature, that translator will probably have saved himself or herself quite a bit of terminology research, as key terms will have already been translated.

Match Made in Heaven?

Furthermore, translators will most likely not approve of having to base their translations on the "bad translations" of other translators that will crop up as fuzzy matches. They will either have to grin and bear the translations of colleagues in their own files for the sake of consistency or totally rewrite fuzzy matches that came from other people's translations. In the first scenario, translators might be risking their reputations because in-house editors will not know who made the original mistake. In the second scenario, translators are likely to need more time to complete their work if they are to correct the possible mistakes of their colleagues, for the TM will contain all the translations from various translators before the files have been proofed and edited. This is an important issue, as the turnaround time may have been shortened at the expense of the actual translation quality. One mistake by a single translator could have repercussions in numerous other files that are being processed by other translators.

Calculating Payment

It still remains to be seen whether vendors and translators will get paid for a fixed word count as described in the analysis of their files. One thing that we can be sure of is that the advantage will shift more in the direction of the vendor and the client as we slowly move from mere TM exchange to full-fledged TM sharing. However, an important condition for this shift will be the capability to indicate objectively and in concrete figures the net contribution of every translator working on the project.

A way may soon be created to monitor a translator's work and progress during a given project, including the number of words and type of matches that have been translated. In most translation tools, each translated segment already receives a unique identification of the person that created the translation. On project completion, the Master TM could perhaps be filtered on the basis of a translator's identification, and a personal log could be created by means of a logging system. These logs would accurately reflect the real number of translated words per translator and could be the basis for payment.
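The logging idea could look roughly like the following sketch. It is purely hypothetical: the field names (`source`, `translator`, `match_type`) and the function `words_per_translator` are invented here, and real TM formats store this information differently. The sketch filters the units of a Master TM by translator identification and totals the source words per match type.

```python
from collections import defaultdict


def words_per_translator(units):
    """Build a per-translator log of translated words by match type.

    Each unit is a dict with the (assumed) keys "source", "translator"
    and "match_type", e.g. "no match", "fuzzy" or "100%".
    """
    log = defaultdict(lambda: defaultdict(int))
    for unit in units:
        words = len(unit["source"].split())   # source-word count
        log[unit["translator"]][unit["match_type"]] += words
    # Convert to plain dicts for easy printing or export.
    return {who: dict(by_type) for who, by_type in log.items()}


master_tm = [
    {"source": "Open the file menu", "translator": "anna", "match_type": "no match"},
    {"source": "Open the edit menu", "translator": "ben", "match_type": "fuzzy"},
    {"source": "Save your work", "translator": "anna", "match_type": "no match"},
]
print(words_per_translator(master_tm))
```

A log of this kind would give the "concrete figures" mentioned above, although, as the next paragraph explains, paying on that basis raises problems of its own.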

This solution, however, would mean that translators would be continuously confronted with the problem of not knowing how long it will take them to complete the job and how much they stand to earn. This would depend on many factors such as the number of assigned translators, their individual speeds and the number, time and lengths of the breaks that they take. If one translator takes a break, the Master TM will have grown considerably by the time he or she returns. This affects the remaining workload for all translators, and especially for the one who took the break, who will have seen his or her allocated chunk shrink in the meantime. Furthermore, translators would no longer be able to divide up their own time, as the progress of the other translators would constantly be influencing everyone's workload and the required time until completion. This could certainly be a problem in texts with a high degree of potential fuzzy matching. Frequent progress reports will also have to be made available to the translators for their information, in order to prevent them from having to turn down projects as their planning becomes increasingly difficult.

How far can we afford to go in our bid to make translation more cost effective? Increased uncertainty is a high price to pay for translation cost reduction and may not be conducive to higher quality — consistency or no consistency.

Control and Security

The issue of increased control and security through TM sharing is interesting and requires further attention, as it may be a double-edged sword. On the one hand, translators will no longer have a copy of any valuable TM that they have been feeding their translations into. Translation capital is protected as the TM resides on a local server to which translators have only limited remote access. On the other hand, if this Master TM were ever to become corrupt outside of the LSP's office hours, translators — possibly in a different time zone — would not have a copy of the TM available to them to carry on working, as would have been the case with a stand-alone TM. Neither would they be able to try any function, such as the TRADOS "reorganize" function, which can often solve the most frequently occurring TRADOS TM problems. With TM sharing, this option would probably not be open to translators. They would have to stop working for the rest of the day until an engineer on the LSP's side could be contacted.

Also, with all of the agency's TMs residing on a server, server downtime could spell disaster for a large number of projects. On the other hand, translators would no longer need to send their translated files back to the vendor, nor resend them if they became corrupt or otherwise problematic. The LSP could simply recreate the translation automatically by pre-translating the source files with the updated Master TM.

On some projects, translators are required to use special settings for their stand-alone TMs and are sent a list of instructions. With TM sharing, any special settings can be applied by the LSPs themselves without any intervention by the translator, thereby avoiding potential mistakes.

Key Solution Providers

Among the key providers of TM solutions that offer support for simultaneous TM access over LAN are STAR with its Transit XV, TRADOS with TRADOS 6 LSP and TM Server, Atril with Déjà Vu X and SDL with SDLX Enterprise Server 2003 and SDLX for UNIX.

While the various TRADOS 6 versions (Freelance, LSP and Power LSP) are targeted mainly at freelance translators and language service providers respectively, the TRADOS TM Server, the next-generation TM server technology and flagship version, is more suited to the needs of global corporations, although it is also used by service providers requiring the extra benefits of this client/server version. TRADOS also plans to introduce a version for the Internet by the end of 2003.

Déjà Vu X is the latest release of Atril's translation tool, which includes Editor, Standard, Professional and Workgroup versions. The differences among these versions include the number of translation and terminology databases that each can access simultaneously over a LAN. Atril is also working on a Web TM server (TM Remote Server) to enable Internet-based TM sharing. It is expected to be available in the second half of 2003.

In addition to the companies providing solutions with support for TM access over LAN, Logoport Software's and Telelingua's products also support database access via an Internet connection.

Telelingua's T-Remote Memory recently became commercially available. It allows users situated anywhere in the world to simultaneously share TMs, terminology databases and MT systems, provided that these can be queried remotely via APIs or Web services. T-Remote Memory is available in Standard and Enterprise versions as well as in a Leased version that enables companies to handle unexpected peak work flexibly.

Logoport is a fully Web-based TM solution developed by the German company Logoport Software GmbH. Rather than directly purchasing the software, translators, agencies and other companies can lease an amount of time on the Logoport system at short notice. The leased capacity can be adjusted with the current volume of work, and users are charged for the net time that they have been using the system. A company can also acquire licenses to install the Logoport server on its internal network.

Conclusion

TM sharing has the potential to turn the translation industry upside down. We have witnessed the introduction of TM at a time when translators did not avail themselves of more than a computer and a word processing program and have now reached a point where translators in the localization industry cannot afford to work without a TM tool. The widespread introduction of TM sharing could reverse this trend completely. Many freelance translators will no longer need to own a TM tool themselves but will log on to their client's system and hitch a ride on the TM solution that the client has implemented.

All clients are interested in cost efficiency, quality and time-to-market. TM sharing can help clients and vendors realize improvements in all three areas. The localization workflow paradigm is set to change. Clients, vendors and eventually translators will welcome this change as it provides benefits for all and creates room for the language industry to expand even further. Translation is likely to be targeted for further cost reductions for quite some time to come. TM sharing can help relieve some of the tension surrounding price arrangements, which are slowly developing into a bone of contention among clients, MLVs and SLVs. Progress is inevitable, and TM sharing is as progressive as it gets.

Comparison of Key Features and Benefits
=========================================

Déjà Vu X Standard, Professional and Workgroup
--------------------------------------------------------------------------------
Nature of the product: CAT tool
Simultaneous database access over Internet: No
Simultaneous database access over LAN: Yes
Increased consistency when working with several translators: Yes, but only in-house translators
Centralized management of project configuration:
Wider choice of translators: Yes, through DVX Editor.
Increased control over intellectual property: No
Shorter turnaround times when working with translators in different locations: N/A
Software purchase required: Yes
Price (in euros): Déjà Vu X Workgroup: Server license and first workstation: €2250; Additional workstations: €1490 each
Editor: Native editor
Additional features: Depending on the version, simultaneous access to several translation memories is possible. The Workgroup version includes the TM Builder, a programmable API, and enables the creation of satellite projects that can be sent to freelance translators and edited with DVX Editor (free).
Further information: www.atril.com

T-Remote Memory
------------------
Nature of the product: Add-on for existing CAT tool
Simultaneous database access over Internet: Yes (TM system must support queries via APIs or Web services)
Simultaneous database access over LAN: Yes
Increased consistency when working with several translators: Yes, translators can be situated anywhere
Centralized management of project configuration: Yes
Wider choice of translators: Yes
Increased control over intellectual property: Yes
Shorter turnaround times when working with translators in different locations: Yes
Software purchase required: Yes. Leasing possible.
Price (in euros): Standard version: €2100 (for 3 users, minimum)
Enterprise version: €8550 (for 10 users), available from July 1, 2003
Leased version: €1710 for 10 users for 3 months (minimum period)
Editor: MS Word
Additional features: Share several translation tools at the same time (translation memories, terminology databases, machine translation systems)
Further information: www.telelingua.com

Logoport
-----------
Nature of the product: CAT tool
Simultaneous database access over Internet: Yes
Simultaneous database access over LAN: Yes
Increased consistency when working with several translators: Yes, translators can be situated anywhere
Centralized management of project configuration: Yes
Wider choice of translators: Yes
Increased control over intellectual property: Yes
Shorter turnaround times when working with translators in different locations: Yes
Software purchase required: No. Lease access time.
Price (in euros): €.70/hour per user (discounts available for multiple users)
A Logoport server can also be installed on a company network; prices remain the same as for connecting to the remote server.
Editor: MS Word
Additional features: Context Matching, Logoport Messenger, terminology management, file format converters
Further information: www.logoport.net

TRADOS 6 Power LSP, TRADOS TM Server
-------------------------------------
Nature of the product: CAT tool
Simultaneous database access over Internet: No
Simultaneous database access over LAN: Yes
Increased consistency when working with several translators: Yes, but only in-house translators
Centralized management of project configuration: TRADOS 6 LSP: No
TRADOS TM Server: Yes
Wider choice of translators: No
Increased control over intellectual property: TRADOS 6 LSP: No
TRADOS TM Server: N/A
Shorter turnaround times when working with translators in different locations: N/A
Software purchase required: Yes
Price (in euros): TRADOS 6 Power LSP: €2595
TRADOS TM Server: no price indication provided by TRADOS
Editor: MS Word, plus other native editors
Additional features: TRADOS 6 LSP: Data mining tool, front-ends, terminology management, file-format converters
TRADOS TM Server: ConteXT TM optional
Further information: www.trados.com

SDL WorkFlow, SDLX Enterprise Server
-------------------------------------
Nature of the product: Multilingual content management system, CAT tool
Simultaneous database access over Internet: SDL WorkFlow: Yes.
SDLX Enterprise Server: Yes, via an Internet/Intranet Virtual Private Network (VPN) connection. At the end of this year, it is planned to implement this as a web service.
Simultaneous database access over LAN: Yes
Increased consistency when working with several translators: Yes
Centralized management of project configuration: Yes
Wider choice of translators: Yes
Increased control over intellectual property: Yes
Shorter turnaround times when working with translators in different locations: Yes
Software purchase required: Yes
Price (in euros): SDLX Enterprise: $1,000-1,500 per user (price is tiered and based on the number of users)
Editor: Native editor
Additional features: SDLX Enterprise Server: SDL Align, SDL Project Wizard, SDL Termbase, SDL Maintain, SDL Apply, SDL Analyse, SDLX AutoTrans and others.
Further information: www.sdlintl.com

Avoiding a US-centric Writing Style

Laurie Kamerer

A bank in China recently purchased 400 copies of a software package from Cisco Systems, Inc. Neither the software nor the associated documentation had been localized, but bank management was not worried since “everyone speaks English.” Unfortunately, the bank’s personnel did not speak the long-winded, US-centric brand of English that appeared in the online help system. As a result, the CDs sit on the shelf while customers complain.

In a perfect world, all of our products, Web sites, documentation and marketing collateral would be localized for every target locale. In reality, customers often have to make do with the English versions. In either case, writers play a crucial role in serving global customers. One of the biggest contributions that content creators can make to their global audience is to eliminate US-centric references and biases.

Translations can be completed more efficiently and with fewer errors when their source materials are culturally neutral. When end customers must grapple with the English themselves, eliminating cultural bias becomes even more critical. Creating culturally neutral and therefore more comprehensible material hinges on two principles: avoiding culture-specific references and avoiding location-specific references.

Principle 1: Avoid culture-specific references. This kind of reference, which assumes the reader is familiar with American idiosyncrasies and customs, can even confuse native English speakers from outside the United States. Sports metaphors all too frequently creep into US business parlance and bewilder non-US speakers of English. Puns, jokes, idiomatic language, colloquialisms and jargon all fall into this category.

In the following example, the simile assumes that the audience is familiar with the American cultural landscape: “The Economy is very much like Oklahoma just after the land rush. The land closest to the border has been grubstaked, and many of the most convenient and conspicuously valuable lots have been fenced to keep interlopers out” (marketing material from a Cisco Systems, Inc., Web site).

Sports metaphors, especially baseball metaphors, should be eliminated at every occurrence. Not only do sports generally play a lesser role in other countries than they do in the United States, but baseball specifically (often cited as the pinnacle of Americanism) is especially unfamiliar to many non-US readers. The following statement, overheard at a sales meeting, might baffle attendees from outside the United States: “It’s great that our sales team made a diving catch on this one, but next time we should develop a game plan that does not necessitate such ninth-inning heroics.”

Colloquialisms can trip up even the best translators. Some of George W. Bush’s cowboy westernisms have made translators at international media outlets scramble. As reported in a recent National Public Radio piece, Bush stumped the foreign press with his recent challenge to Iraqi attackers. “Bring ’em on,” he said. The Kyodo news agency in Japan opted to translate this as follows: “Come on. If you’re courageous enough to attack us, just attack us. We are ready to defeat you.”

After struggling with translating “Bring ’em on,” the Arab news agency Al-Jazeera chose not to quote the president at all, but instead to paraphrase.

Speeches and presentations can be especially daunting for non-US speakers of English. In addition to understanding English, the audience also has to keep up with the speaker. If audience members receive slides or an outline in advance, they will be able to better focus on and comprehend the content.

Principle 2: Avoid location-specific references. Things that are not standard internationally, but vary based on the reader’s geographic frame of reference, can be considered “location-specific.” Common culprits in this category are time and date references, units of measure and references to specific laws and tax codes.

CNN reported on one particularly expensive example of this type of problem in 1999. NASA had lost a $125-million Mars orbiter because a Lockheed Martin engineering team used English units of measurement while the agency’s team used the more conventional metric system for a key spacecraft operation.

While time and date formatting might be obvious examples of location-specific considerations, a domestic reference like this might not immediately raise a red flag: “Customer support representatives are available 8 a.m. – 5 p.m. every day except national holidays” (Contact page from ecommerce Web site).

In addition to “What time zone?” the reader will have to ask himself “Whose nation? What holidays?”
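One way to remove that ambiguity is to state the hours with an explicit UTC offset and give the UTC equivalent alongside. The sketch below is only an illustration: the UTC-8 office offset, the function name and the sample date are assumptions, and daylight-saving time is deliberately ignored.

```python
from datetime import date, datetime, time, timedelta, timezone

# Hypothetical office offset (UTC-8); daylight-saving time is not handled.
OFFICE_TZ = timezone(timedelta(hours=-8))


def support_hours_utc(day, opens=time(8, 0), closes=time(17, 0), tz=OFFICE_TZ):
    """Return the opening and closing times of one day, converted to UTC."""
    start = datetime.combine(day, opens, tzinfo=tz)
    end = datetime.combine(day, closes, tzinfo=tz)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


start, end = support_hours_utc(date(2003, 9, 15))
print("Support opens: ", start.isoformat())
print("Support closes:", end.isoformat())
```

A reader in any country can convert an unambiguous UTC timestamp to local time, which is not possible with "8 a.m. to 5 p.m." alone.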

In some cases, these types of references not only confuse readers, but can be much more insidious. One software vendor thought that the company would be able to “globalize” fairly easily by translating on-line marketing materials for its target market in Europe. The software enabled the automation of a number of human resources functions, including Web-based processing of 401(k) and W2 forms. The marketing content had gone through initial translations when one editor, based in Europe, asked, “What is 401(k)?” Not only did the marketing materials fail to speak to the target audience, but the functionality did not speak to their national tax and pension codes.

A British comedian also touched on this principle when he asked a retired British judge whether, when charged with a particular crime, he should plead the fifth. “That won’t do you any good,” the judge responded. “The fifth amendment is part of the American Constitution and has no legal recognition in the UK.”

The application of these guidelines must vary depending on the nature of your content. Marketing material is not always suited to a perfectly literal style, whereas technical documentation is best if it is dry and unadorned. In any case, writers won’t leave their global audience stranded on second base if they consider these guidelines when developing content.

Working With Translation Memory

When and how to use TM for a successful translation project

STEVE IVERSON
(Steve Iverson is president of Iverson Language Associates, Inc. He can be reached at steve@iversonlang.com)

Translation memory (TM) has become a common term in the translation and localization industry and has led to confusion on the part of clients and service providers alike. There is confusion over what the term means, what the difference is between TM and translation software and when it's appropriate to use TM.

TM belongs to the computer-assisted translation family of tools. In essence, TM tools do not translate material, but assist the human translator to produce a better (that is, faster and more consistent) translation. TM, simply put, keeps track of and allows the reuse of information that is repetitive, both within a document and across a series of documents. And since it works at the segment level, it is different from glossaries. Widely used tools available today include TRADOS, STAR Transit, Déjà Vu, SDLX and others.

Why Use TM?

Without TM, the translator either relies on a document where he or she keeps track of previously translated text or, more often, relies on memory. In a society where quality and consistency continue to be important issues, relying on already overloaded brains can be troublesome. There are a number of valid reasons for considering the use of TM. One is to help maintain consistency. Phrases and sections that repeat should not be re-translated.

If translations can be reused, there is less to translate. This implies faster turnaround time on projects. Despite education efforts to the contrary, it is often believed that good quality translation can be produced at increasingly greater speed. This drive is understandable, since turnaround time of a translation often has a great deal to do with how quickly a company starts making money on its product in international markets.

Probably the most important benefit from the use of TM is the ability to maintain control of translation/localization costs. With the reuse of translated material, there is less text to translate, resulting in lower project costs.

An often overlooked benefit of using TM tools from a service provider perspective (translation agency or freelance translator) is that the resulting files are often easier to work with. Certain types of tools (open-tag TM tools) actually hide coding in files such as HTML so that there is no risk of accidentally deleting or altering the codes and tags.

TM tools can often also simplify the desktop publishing (DTP) process, requiring only clean-up at the end of the project instead of the traditional cutting and pasting of the translated text into the English template.

Finally, TM can become a component of a content management system, allowing you to manage and maintain the foreign languages along with the English. Content management is being adopted by many companies with extensive publications departments and dynamic Web sites. It implies finding ways to keep track of what's been written, usually with a focus on English. However, content management refers to managing content, whatever the content is and in any language.

How Does It Work?

There are three ways to use TM tools. The easiest is to start from scratch, not trying to incorporate legacy material (previously translated documents). You begin by translating your material with an empty TM. Once the text is finalized, it is put into the database of the tool you are using. The next document to translate is then electronically compared to the database, and the results are reported in terms of no-match, fuzzy match and 100% match. A fuzzy match is a phrase that is similar to the new source language text but requires some editing to make it match exactly. Once you have built your database to a fairly high level, you will begin to see more fuzzy and 100% matches and fewer no-matches. How long it takes to get to this point depends on the volume of material you are translating and on your internal reuse of existing segments.
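The comparison step can be sketched in a few lines of Python. This is not how any particular TM tool scores matches: the similarity measure (the standard library's difflib), the 75% fuzzy threshold, the function names and the sample segments are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off; commercial TM tools use their own scoring and thresholds.
FUZZY_THRESHOLD = 0.75


def best_match(segment, memory):
    """Return (score, source, target) for the closest source segment in the TM."""
    best = (0.0, None, None)
    for source, target in memory.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best[0]:
            best = (score, source, target)
    return best


def classify(segment, memory):
    """Label a new segment as '100% match', 'fuzzy match' or 'no-match'."""
    score, _, _ = best_match(segment, memory)
    if score == 1.0:
        return "100% match"
    if score >= FUZZY_THRESHOLD:
        return "fuzzy match"
    return "no-match"


# A tiny translation memory: source segment -> stored translation.
memory = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
    "Remove the battery cover.": "Retirez le couvercle de la batterie.",
}

for segment in [
    "Press the power button.",
    "Press the red power button.",
    "Contact your dealer for service.",
]:
    print(classify(segment, memory), "|", segment)
```

With an empty memory every segment is a no-match, which mirrors the start-from-scratch scenario described above.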

Second, you can populate the database of your TM tool with material from your legacy documents. This may build your database quickly, but involves the risk of introducing errors into the database which will then be used in many other documents. And even though it is possible to correct entries in the database, it can be time consuming. So make sure that before you use your legacy documents in this way, the language is exactly the way it should be. It is often helpful to have your legacy material proofread for accuracy, and to make sure that the text is in the same order as the source text.

Third, you can translate documents with a lot of internal repetition. If the document has a lot of text within itself that repeats, you can reduce the amount of text for translation by exporting frequent segments, translating them once, and then using those translations to pre-translate the file before sending the complete document to the translator. The frequent text appears in all the places it should, and the translator translates what's left over.
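As a rough sketch of this third approach, the Python below counts repeated segments and then pre-translates a text with the translations supplied for them. The sentence-splitting rule, the function names and the sample text are assumptions; real tools apply far more careful segmentation rules.

```python
import re
from collections import Counter


def split_segments(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def frequent_segments(text, min_count=2):
    """Return (segment, count) pairs for segments seen at least min_count times."""
    counts = Counter(split_segments(text))
    return [(seg, n) for seg, n in counts.most_common() if n >= min_count]


def pretranslate(text, translations):
    """Insert known translations; segments without one are left for the translator."""
    return [translations.get(seg, seg) for seg in split_segments(text)]


manual = (
    "Turn off the power. Remove the cover. Turn off the power. "
    "Replace the filter. Turn off the power. Remove the cover."
)

print(frequent_segments(manual))
print(pretranslate(manual, {"Turn off the power.": "Coupez l'alimentation."}))
```

Only the segments returned by frequent_segments need to be translated up front; everything else reaches the translator unchanged.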

One key issue for TM tools is the type of file formats that can be used with them. Some are fairly limited (only Microsoft Word documents), while others have a fairly broad range of file formats that are compatible with the software. Before you select one of the available tools, make sure that it has broad enough compatibility with the file formats you require.

Will It Work for Me?

The question of whether TM is appropriate for you and your documentation can be difficult. A key first step is to determine what you hope to achieve from using TM: to save money? to improve consistency? to reduce turnaround time? to reduce formatting work? Here are three criteria for a quick evaluation.

High degree of repetition. The more repetition there is in your documentation, the more benefit you will see from the use of TM. This repetition may be within one single document or between several documents. There are times when using a TM tool to help manage one document has benefits, but most often you find repetition between documents related to the same products, or family of products and updates.

High volume. Obviously, the more volume you have, the larger the database will become over time. This helps compensate for the cost of creating and maintaining your database.

Compatible file formats. You must have file formats that either allow you to work directly in the TM tool or have a way for you to extract the text needed.

If your documentation meets those criteria, you are probably a good candidate for the use of TM tools.

How Much Does It Cost?

If you have decided not to purchase the software and manage it yourself, your service provider will be charging you for the translation and/or editing of the text. However, the way you are charged will probably be different. Once you have a database started, any projects that you choose to translate will be analyzed, and the software will provide statistics on the number of 100% matches, the number of fuzzy matches and the number of no-matches.

The no-matches will probably be billed at the normal per-word rate that you always pay. Fuzzy matches are typically billed at a fraction of the rate for no-matches (often 50%), and 100% matches may be billed at a still smaller fraction (often 25%).
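The arithmetic behind such an invoice is simple enough to sketch. The weights below use the "often 50%" and "often 25%" figures from the text; the per-word rate, the word counts and the function name are invented for illustration and will differ from one provider to the next.

```python
from decimal import Decimal

# Fraction of the full per-word rate billed for each match category.
WEIGHTS = {
    "no_match": Decimal("1.00"),
    "fuzzy": Decimal("0.50"),
    "full": Decimal("0.25"),  # 100% matches
}


def project_cost(word_counts, rate_per_word, weights=WEIGHTS):
    """Sum the weighted per-word charges across all match categories."""
    return sum(
        (Decimal(word_counts.get(category, 0)) * rate_per_word * weight
         for category, weight in weights.items()),
        Decimal("0"),
    )


# Hypothetical analysis of a 10,000-word project at a 0.20 per-word rate.
counts = {"no_match": 6000, "fuzzy": 3000, "full": 1000}
rate = Decimal("0.20")

with_tm = project_cost(counts, rate)
without_tm = sum(counts.values()) * rate

print("With TM:   ", with_tm)
print("Without TM:", without_tm)
```

Under these assumed figures the project is billed at 1,550 rather than 2,000, before any hourly charge for file processing is added.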

Although it is commonly accepted that the 100% matches do not need to be reviewed, thereby allowing you to save more money, there is also an argument for reviewing them. Since the computer is doing all the work, it's wise to make sure that there has not been a computer glitch, resulting in a mistake in matching segments. In addition, proofing the 100% matches allows you to make even more improvements in your memory. While it's important to minimize changes to 100% matches for the sake of higher repetition, there will be times when you will still need to carefully revise segments in your database. Often, these are most obvious when you review them in context, and reviewing the 100% matches gives you that opportunity.

In addition to the per-word charges, you may be charged for the time it takes to process your files and update the memory. This is normally billed at an hourly rate, agreed to in advance of the project. The cost is normally fairly small. Most clients find that even with the additional billing, the reduced translation cost more than offsets this extra item.

If your service provider offers DTP, the files will need to be "cleaned up" to accommodate the additional space required by the target language. The cost for this service should be less than traditional formatting "from scratch."

Who Owns the Database?

This is a widely debated question, and one that has not been answered conclusively. Some say that the database is a result of work contracted for and paid for by the client, so the client has a right to receive the memory files once the project is complete. The other argument says that without the creative work of the translator, the memory files would not exist at all. This is similar to having a marketing firm create your brochure, then charge you if you want the electronic files, a practice that exists in some areas. While there may not be one answer for all circumstances, it's best to be clear about this at the beginning of a relationship with any service provider and to have the answer noted in writing.

Making TM Work for You

Once you've decided to use TM, you will want to maximize the return on your investment. There are several things that you can do to make sure TM is as effective as possible.

Edit your English. Make sure it's as clean as possible, eliminating any awkward sentences and text that is no longer relevant to the product. Try to make it as consistent as possible, checking that a single term is used for each concept and that similar sections are written in the same way. The investment in editing your English before translation can yield some great results with TM.

Once you've edited one document, compare it to similar documents and sections of text to make sure there's consistency across documents. Work with the people who produce your technical publications to make sure they understand that how they write the documents has an effect on the cost and turnaround time of the translations. Often, little thought is given to the translation portion of the process.

Prepare your electronic files. Choose a program that meets your needs in terms of managing your documentation, but try to find something that interfaces well with your TM tool. Programs such as Adobe FrameMaker may work very well, or Microsoft Word might be the best option (although not from a DTP standpoint).

Be aware of how formatting of your documents might simplify the use of TM. If you are using a DTP program from which you need to extract the text, you might find that the English is split in the text file. For example,

Maximum
diameter

which would be translated as diamètre maximum, might become

Maximum Diamètre
Diameter Maximum

since each word would be translated individually instead of as part of a phrase. Because of the hard return in the table, the extraction process produces two separate segments for translation, which is incorrect. If this is not corrected, the memory will contain incorrect pairs and produce wrong translations, with "maximum" translated as "diamètre."
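The effect of the hard return can be reproduced with a small sketch. Both function names are hypothetical, and the second function simply assumes that all the lines of one table cell belong to a single segment, which will not hold for every document.

```python
def extract_naive(cell_text):
    """Treat every hard return as a segment boundary (the problem case)."""
    return [line.strip() for line in cell_text.splitlines() if line.strip()]


def extract_joined(cell_text):
    """Rejoin the lines of one table cell into a single segment."""
    joined = " ".join(cell_text.split())
    return [joined] if joined else []


cell = "Maximum\ndiameter"

print(extract_naive(cell))   # two one-word segments
print(extract_joined(cell))  # one two-word segment
```

Only the joined segment gives the translator the whole phrase to translate as a unit.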

With the increase in the volume of translation work being done in this country, TM can be one of the best tools to assist in the management of translations for all parties involved. And until a computer can think like a human, it seems to be the best choice for starting to automate the process. Talk with your service provider about whether you are a candidate for TM, and be aware of how the process works for you.


-----------------------
This article reprinted from #59 Volume 14 Issue 7 of MultiLingual Computing & Technology published by MultiLingual Computing, Inc., 319 North First Ave., Sandpoint, Idaho, USA, 208-263-8178, Fax: 208-263-6310.

Localizing Movies and Broadcasting

Subtitling and dubbing for multiple version releases

DAVID SHADBOLT
(David Shadbolt is a research editor with MultiLingual Computing & Technology. He can be reached at david@multilingual.com)

If any industry could claim global marketing success, it is the motion picture industry. While the 1939 movie Gone With the Wind was created for an American audience, directors of contemporary movies such as Gladiator (2000) have global release and merchandising sales as their goal. The official Web site for the 1997 movie Titanic even included, in addition to worldwide release dates, links to English, German, Italian and Spanish trailer clips. It is now commonplace for studios to release subtitled and dubbed movies in 36 languages or more.

International broadcasting networks also pump out subtitled and dubbed series to other national broadcasters, many of whom are multilingual. For example, Europe has 15 to 20 different multilingual channels broadcasting in up to 22 different languages.

"There are approximately 5,000 international broadcast and cable networks around the world," says Deeny Kaplan, executive vice president of Miami-based TM Systems (TMS), "creating a continuous demand for dubbing, subtitling and translation. Some countries have hundreds geared towards specific languages or specific audiences. Even a little country like Jordan broadcasts three languages a day, 24 hours a day on its television networks, and everything has to be dubbed or subtitled, whether it's a movie or news piece."

Markets

The need for subtitled and dubbed material has spawned a specialized language sector, particularly in countries with major international broadcasting networks. London alone has more than 30 organizations with roots buried deep within the industry. Europe's largest broadcasting vendor, Broadcast Text International (BTI) with its head office in Sweden, subtitles 40,000 hours of television, video, cinema and DVD content annually — the equivalent of 70 feature films a day, all year round. Bjorn Andersson, chief executive officer of BTI, says, "We existed for 20 years in different constellations doing subtitles for broadcasters before merging into this new group in 1998. We began subtitling in one country for one language but now do multilingual subtitles for broadcasters — for example, the British Broadcasting Corporation (BBC), first in Scandinavia, then Western Europe and now the growing Eastern European market."

BTI has also developed its client base in the DVD and SDH (subtitling for the deaf and hard of hearing) sectors. SDH, better known as Closed Captioning in North America, has three categories: prerecorded programming, real-time for live programming and live display.

"SDH demand is really growing in countries like Germany, France, Italy and England," Andersson says. "The DVD market sector started five years ago and is very strong, which is one of the reasons we have recently opened an office in Santa Monica. Last year, the movie studios released about 6,000 DVDs, and it will probably remain around that mark annually until the studios deal with the backlog of movies in their vaults. In seven or eight years, these numbers will probably go down to around 3,000 per year. The DVD market is divided into local markets, for example, German companies releasing DVDs with SDH and maybe a couple of other languages, and the international DVD market primarily based in Los Angeles and London, which subtitles and dubs movies into 36 languages."

The movie versions take into account not just language but also cultural considerations and the viewing audience. "At a big studio such as Warner, Universal or Paramount," says Kaplan, "you have a number of departments in charge of market niches: feature films, DVD, home video, broadcasting, even airlines. An airline version of an X-rated movie would need editing for general viewing because you might have your eight-year-old son sitting next to you."

Subtitling Workflow

Andersson outlines the workflow of a typical subtitling project: "Let's say our Santa Monica office receives a video from one of our clients such as MGM or Universal Studios. We encode the work into MPEG1 using a softel-encoding system purposefully built for video subtitling and send the result on to our in-country office where it is either assigned to one of our own 60 or 70 translators or a freelancer on the local payroll. That translator does the translation with the template he or she has received from us, and then the local office proofreads the work in real time, looking at the tape together with the subtitles to see if there is anything wrong with the translation. Then it's shipped back to Santa Monica where they make a final technical check to see if there are any technical problems. For example, it's very important that the subtitling doesn't cover chapter breaks and that the exact time codes haven't changed on these files since we sent them out. The final test is in real time, running the tape or the digital file against the subtitled file once more just to see if it's accurate. Before we send it back to the client, it's often converted into a bitmap or TIFF file."

One set of problems encountered in providing subtitles is the expansion and compression factor. A 66-page English script could expand to 86 pages in German and 96 pages in Spanish. Yoshie Anjiki, formerly a graphics technician at Studio 26 and now part of the Polarity Post Production (PPP) team in San Francisco, has worked with a number of clients such as Albertsons, Wells Fargo, Bank of America, Chevron and Starbucks.

"We've encountered problems when a client has insisted on using its own translation company," Anjiki says. "Timing is always the key, and sometimes the client's translation is too long to fit in the lower third of the screen. A maximum of two lines for subtitles is ideal, or the viewer is too busy reading and can't comfortably watch the video. We clean up the translation and do line breaks that will prepare the translation to fit in the lower third of the screen."
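The two-line rule Anjiki describes can be sketched as a simple fit check. This is a minimal illustration, not the tooling PPP uses: the function name fit_subtitle and the 35-character line width are assumptions, since the article does not give a line width at this point.

```python
import textwrap

MAX_LINES = 2            # from the text: a maximum of two lines is ideal
MAX_CHARS_PER_LINE = 35  # assumption: the article gives no width here


def fit_subtitle(text, width=MAX_CHARS_PER_LINE, max_lines=MAX_LINES):
    """Break text into lines; return None if it needs condensing first."""
    lines = textwrap.wrap(text, width=width)
    return lines if len(lines) <= max_lines else None
```

A translation that comes back as None is one that would have to be shortened or split across two subtitles before it fits in the lower third of the screen.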

Most dubbing studios use a linear, tape-based approach to dubbing and subtitling. A master tape in a professional format such as Betacam SP or Digital Betacam is used to play back in the studio for recording, while copies of it in a consumer format such as VHS are used by translators, casting directors and others to create the translated script, assign voice talent to characters, create cue lists for recording and so on.

"When we start a subtitling project," Anjiki says, "we ask our client to provide us with a beta SP or DigiBeta English master as a guide, and an un-keyed (clean or text-less) submaster as a source. According to the original version, we try to recreate it in ethnic languages by replacing show title or name keys when requested. If an un-keyed version is unavailable, then we have to squeeze or move subtitles to fit on the screen whenever there are any conflicting graphics in the lower third of the screen. There are many things we have to pay attention to — existing keys, background color, close-ups and so on. This is where the timed and precise translation comes in handy. It helps our editing process greatly."

Timing is just as important in the dubbing process. As Charles Xavier, localization director at PPP, explains, "A big portion of translation is the actual timing of the phrases. It has to work well because they usually don't re-edit picture for foreign languages. Automated dialog replacement (ADR), also called looping, replaces production sound recorded on the video with the voice of the talent (actor) recorded in the dubbing studio. Picking the right phrases to fit the timing for the voice-overs is a challenge. You have to find words that match the rhythm of the original language to fit the lip movements. Take English to German. If it's 11 seconds in English, too literal a translation might add an additional eight to nine seconds to the dialogue. With video games and some other products, exact matching of lip movement is impossible. We actually take out breaths and slide in words to make them match better. The phrases are edited to the picture once they are recorded to tighten up the phrases even more. It can be a challenge. Many games originate in Japanese, and we do the English version, and people have commented that our version looks almost better than the original does."

Technology

Reducing time-to-market for language versions is a priority for movies and television series as it is with software or other time-critical products. Kaplan at TMS says, "An episode of Friends shown tonight in the United States is seen in 50 other countries as a subtitled or dubbed version, so time is of the essence. In the movie business, the major studios have tiered levels that reflect their concerns and objectives for a given movie. A studio's first target is the group comprising France, Italy, Germany and Spain (FIGS), with Thailand now added for some reason. Increasingly, the majors want to include India, Israel and other countries, even worldwide same day release. A major animation movie has just received simultaneous release in 83 countries."

Broadcast news is even more time-sensitive. When the Voice of America needed the main evening news broadcasts of ABC, NBC, CBS, PBS and FOX translated to Arabic and subtitled daily, it turned to Applications Technology, Inc. (AppTek), which developed a fully automated process. Sami Shamma, responsible for business development at AppTek, explains: "This process required the translating and editing of three and a half hours of news, which is an average of 32,000 words, and a task we had to complete in six hours. We developed new tools alongside our machine translation engine to automate the insertion of time-stamped subtitles into the news footage. AppTek developed a workflow process that included voice recognition of the newscast and production of English text; machine translation of the text to Arabic; post-editing the Arabic text; auto-generation of time-stamped Arabic subtitles; and superimposition of subtitles on the original footage. As soon as we had finished, Voice of America had someone literally running to the aircraft so that it could get over the Gulf and broadcast the subtitled news footage."
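The "auto-generation of time-stamped subtitles" step can be pictured with a short sketch. The article does not say which subtitle format AppTek produced, so the widely used SubRip (SRT) cue layout is used here purely as an assumption, and the function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time offset in seconds as HH:MM:SS,mmm (SubRip style)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index, start, end, text):
    """Build one numbered cue: index, time range, then the subtitle text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

In a workflow like the one described, the start and end times would come from the voice-recognition stage and the text from the post-edited translation.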

Solutions from companies such as AppTek and from companies such as CPC, Screen Subtitling Systems and SoftNI (originally Software & New Ideas) have significantly reduced the time and cost of movie and broadcast localization.

SoftNI developed and released its off-line computer-based video subtitling system in 1986. The company provides a wide variety of services and software related to translation, subtitling and dubbing of motion pictures and television programs, including what it describes as the first software-based multichannel, multilanguage subtitling system for digital video broadcasting (DVB).

It was TMS, however, that received a technical achievement Emmy award for its fully integrated, non-linear language localization system for translation, dubbing and subtitling. The award is given to companies that make a contribution to television broadcasting that forever changes the face of the industry. Other winners include Apple Computer, Inc. (Final Cut Pro editing and finishing system) and the ARRI Group (ARRIFLEX Cameras).

Carlos Contreras, chief technology officer at TMS, says, "While there are differences in the user interface between our system and competitive solutions — as well as in the general approach to the actual subtitle file preparation — in general, all get the task done. One of our unique advantages, though, is that ours is the first comprehensive system for both subtitling and dubbing. Our scripts can even mix both formats in one single document. We also do away with proprietary file formats and go for open standards for video, sound and text files so that the user is never trapped into a proprietary technology. The learning curve is minimal as well."

The TMS system consists of four modules: PrepStation, TranStation, DubStation and SubStation. In the PrepStation the video is digitized in preparation for the translation process. "It is then encrypted, so even if someone has access to the actual file, it cannot be played to get a copy of it," says Contreras. "Our players contain hardware keys to allow playback of the tracking file only on authorized TranStation or DubStation." Digital files enable access from a network or over the Internet, thereby reducing costs, delays in customs or loss and piracy that the current practice of making copies of a videotape or VHS cassette can cause when packaging and sending by courier to translators and dubbing studios.

The lightweight TranStation module, the least expensive component at $1,495, provides a transcription and translation function for both dubbing scripts and subtitling blocks. Major features include automatic single keystroke time code insertions and automatic loop or line counters. As the software provides pictures and text on a single screen, translators operate in a familiar Microsoft Word environment or compatible format, thereby simplifying the process of preparing the document for subtitling and/or dubbing compared to the process of starting, stopping and rewinding a VCR to find the exact place on a video cassette.

The translator can also create Closed Caption files with TranStation CC, a new component introduced by TMS at the 2003 National Association of Broadcasters event. "It is easily integrated into the existing TranStation as an upgrade or can be purchased as a standalone unit," says Contreras. "The TranStation CC contains features needed to create Closed Caption files and is comparable to other Closed Caption preparation suites, typically more complex and inefficient, some of them even still working with VHS tapes with time code soundtracks."

Upon completing the subtitles or dubbing files, the translator returns the file in a CD format or over the Internet to a dubbing or subtitling facility where it is proofread, loaded into the subtitling component, the SubStation, and burned in real time, in any color, style or character set. The fourth component, the DubStation, controls the recording session. "Linked to protocols or any audio workstation that reads time code, the DubStation contains an abundance of tools to assist in the production of scheduling and the administration process," says Kaplan. "A graph on the DubStation shows the time frame of each character and provides precise cues as to when the talent should speak the lines. In the dubbing recording suite, the actors stand on one side and the engineers on the other side telling the characters when to do their lines, while recording the voices directly into the computer."

WANTED! Post-Productions in Toronto negotiated the TMS sales agency for Canada after a successful experience dubbing a major children's television series. Company president John deNottbeck explains, "When a client asked us to do a Japanese-to-English conversion of a children's television series, we only had three weeks to deliver. Apparently, the original subtitling company had made a budgeting mistake in its bid and told the client it would now cost more per episode, quoting basically the same price for the 40-episode project as we had originally proposed. Although we only had three weeks rather than the original three-month lead time, we met the deadline after purchasing TMS, which requires only a short learning curve and operates instantaneously in computer time."

According to deNottbeck, "The TMS will eventually have competition from two other systems currently under development, but they have yet to be debugged. What we've discovered with TMS is that it's knocked off a third of our record time, so clients save on the costs of studio time and talent fees. Additionally, the system can print out the line and word count per character, which saves producers the headache of manually working out how much to pay the talent."
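The per-character line and word count that deNottbeck mentions is easy to picture in code. The sketch below is not how TMS computes it. It assumes a script is available as (character, dialogue) pairs and that each pair counts as one line, and the function name is hypothetical.

```python
from collections import Counter


def count_per_character(script_lines):
    """Tally lines and words per character from (character, dialogue) pairs."""
    line_counts = Counter()
    word_counts = Counter()
    for character, dialogue in script_lines:
        line_counts[character] += 1
        word_counts[character] += len(dialogue.split())
    return line_counts, word_counts
```

A producer could then multiply either tally by the agreed rate to work out what each voice talent is owed.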

In Canada, deNottbeck sees TMS proving very successful. "Montreal has 15 studios operating non-stop, converting English content to French. They all use the Rythmoband system, developed in France almost 40 years ago. It's far slower than using a TMS system. Within two weeks of the talent working with TMS, the two companies using Rythmoband in Toronto were calling us looking for information."

At $30,000, TMS is relatively inexpensive. Kaplan says, "We met recently with an executive at a Miami-based international network. We were told the network had a room in which they intended to place a subtitling suite, and he was astounded when we told him our solution would only cost $30,000." Contreras explains, "A traditional voice recording room using standard playback decks and recording equipment could go well into the $50,000-$100,000 range, much more than our solution. The savings basically come from the fact that we are using standard PC software and hardware to replace high cost video and audio equipment."

The system has proven itself to a number of clients, beginning with Caracas-based M&M, which operated the software successfully for two years before the formation of TMS in 2001. The biggest installation is located at the localization studios of The Kitchen, Inc., in Miami, built for Claxson Interactive Group, a Latin American multimedia conglomerate. It contains eleven dubbing suites to cope with the volume of localization projects. Juan Bernardo Alvarez, The Kitchen's language conversion service head, claims the software cuts "translation time by half and significantly reduces the time it takes to convert a show like South Park into Spanish or other languages. If we have an emergency and need to rush out a show, what would take eight hours of recording in one studio we can now do in as little as an hour and a half using several studios at the same time."

Specialized Skills

Andersson at BTI says that companies not only have to understand the native language and culture but also the broadcasting and movie industry. "Localization is so much more than translating from one language into another, and the translation itself is a very specific type of translation, somewhere between the spoken and written word, and many translation companies did not do so well when they tried it," Andersson says. "A comprehensive translation is needed, as well as the elimination of narrative conclusion because the text needs condensing to meet the limitation of 70 characters per five or six seconds. It's a big task, but also an interesting task when you are translating. It's also the only time with translated content where everyone can compare both the source (dialogue) language and the target (written translation) at the same time. We hear about it in countries like Germany or The Netherlands if we get it wrong when an English movie is shown with subtitles."
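The limit Andersson quotes, 70 characters per five or six seconds, can be turned into a rough check for whether a translated line needs condensing. This sketch assumes the budget scales linearly with the time a subtitle stays on screen and uses the six-second end of the range. Both choices, and the function names, are assumptions and not BTI practice.

```python
MAX_CHARS = 70       # from the text: 70 characters ...
WINDOW_SECONDS = 6   # ... per five or six seconds; six is assumed here


def char_budget(duration_seconds):
    """Characters allowed for a subtitle shown for the given duration."""
    return int(MAX_CHARS * duration_seconds / WINDOW_SECONDS)


def needs_condensing(text, duration_seconds):
    """True when the text is longer than its on-screen time allows."""
    return len(text) > char_budget(duration_seconds)
```

Under these assumptions a subtitle shown for three seconds would have room for 35 characters, which is why spoken dialogue usually has to be condensed rather than translated word for word.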

BTI creates multilingual subtitles from 90% American English and 5% British programming with the balance from other languages. Andersson stressed that only by living in the target-language country can a translator make certain assumptions on how much his or her fellow natives know about a certain subject and how to translate it.

"For instance," Andersson says, "in British comedies, name-dropping is common, where the person named has certain characteristics. He or she may be frugal, generous, foolish, strong or whatever. In one episode of The Vicar of Dibley, we counted more than 60 names, and we had to research each name so that we understood the person's characteristics before making a decision on retaining or substituting that person's name. Some of the people don't mean anything to anybody in Iran or India or even in Sweden for that matter. Making decisions is easier if you live in the target country and not the source country like the United Kingdom or the United States."

Research also is important, according to Andersson. "While we don't have the experts on law texts or on technical texts or other topics," he says, "translators need to know a little about the many different things so that they can handle a drama series or comedy. If they don't, they have to find out, and sometimes just a couple of lines will take many hours of research to figure out what the lines actually mean. We may also omit Latin names of some species if it's a nature documentary, for example. It's much easier for translators to find the equivalent in their own languages. This applied to the Blue Planet series which we did for the BBC into many languages, among them Scandinavian. The fascinating thing with this series was that it had 350 different species of animals and plants. Some of them didn't have corresponding names in the Scandinavian languages, but through our local contacts at the Museum of Natural History, we named some for the first time. In Sweden, for instance, they added the names of 20 different species of fish. We have almost weekly contact with natural history people due to our BBC documentaries and National Geographic work."

Some television programs probably require too much editing to work in other countries (editing Friends for India, where scenes of holding hands and kissing are restricted, or the Middle East, where showing midriffs is prohibited), but one-off movies would seem more feasible. Directors can accommodate a broader range of cultural requirements by avoiding obtuse terminology and substituting visuals for verbal communication where possible. Mistakes, though, are sometimes made, even by studio bosses. Think of the movie The Spy Who Shagged Me, the title of which raised many eyebrows in Britain where shag is commonly used as a substitute for the four-letter word beginning with f.

Errors of cultural blindness may occur less frequently as escalating production costs necessitate international sales and therefore the demand for localized content. Legislation in some countries will also drive the need for subtitling, dubbing and closed captioning content. In the United States, the Telecommunications Act of 1996 compels video program distributors (cable operators, broadcasters, and satellite distributors) to phase in closed captioning of their television programs in Spanish as well as English.

One can hope that the streamlining of subtitled and dubbed motion pictures and broadcasting will not result in the Hollywoodization of the world, but increase the diversity of content watched globally.



-----------------------
This article reprinted from #60 Volume 14 Issue 8 of MultiLingual Computing & Technology published by MultiLingual Computing, Inc., 319 North First Ave., Sandpoint, Idaho, USA, 208-263-8178, Fax: 208-263-6310.

Technical and Language Issues in ERP Localization

Adapting enterprise resource planning systems to the global marketplace


BERT ESSELINK


(Bert Esselink is a global engagement consultant at Lionbridge and the author of The Practical Guide to Localization. He can be reached at bert_esselink@lionbridge.com)


Localization service providers translating ERP software packages or SAP support had better make sure to use translators who know these domains inside out and should not rely on translators just looking at some glossaries. Localization companies now need to face these new challenges and higher customer demands. This was one of the conclusions in my article "The Evolution of Localization" (MultiLingual Computing & Technology #57 Volume 14 Issue 5, supplement "Guide to Localization").

Enterprise resource planning (ERP) is a domain where much translation is going on. SAP, for example, the leading provider of ERP systems, employs hundreds of translators who produce localized versions of the various SAP solutions. In addition to the in-house localization teams, SAP partners with localization vendors and freelance translators to ensure scalability or to access specific domain knowledge. More than 100 million words are translated each year, primarily consisting of software strings, documentation, training courses and support notes.

Many companies implementing ERP systems do so on a global basis, inherently making global business practices and support for multiple languages a requirement. ERP solution providers have been quick to realize the strategic importance of this requirement.

In this article I will introduce the complexities of global ERP deployments with a focus on the challenges of supporting multiple languages.

What Is ERP?

ERP systems attempt to integrate all departments and functions across a company onto a single system. ERP automates the tasks involved in performing a business process. An example is order fulfillment, which involves taking an order from a customer, shipping it and billing for it.

Most ERP systems feature components for finance management, logistics, human resources, manufacturing, customer relationship management, procurement and supply-chain management.

Decades ago, ERP deployments were primarily country-specific. Today, we frequently see global ERP deployments by multinationals who want to capture and standardize their business processes across the globe. One of the main business drivers for global ERP deployments is the centralization of data in a single repository to assess performance on a global scale rather than for each individual market. Increased globalization of business practices and value chains has created the need for a global overview of business data.

Examples of companies selling ERP software are SAP, PeopleSoft, Oracle and Microsoft Business Solutions (formerly Great Plains). SAP has the largest market share (approximately 25%), followed by Oracle, PeopleSoft, Sage and Microsoft.

A typical ERP implementation project has these phases: a fit/gap analysis to define business processes and the need for customization; system design and development; data conversions from legacy systems; testing of the new system; training of the super-users and end users; and deployment.

Most ERP implementations are coordinated by consulting or systems integration firms such as IBM Global Services or BearingPoint, at least in the initial stages. Consultants work with employees of the company that is implementing ERP to define and tweak end-to-end business processes, customize the system, develop new features if required and test the system for deployment.

Global implementations typically are first rolled out in a few pilot markets, followed by additional markets or groups of markets.

Global ERP Deployments

Global ERP deployments run, on average, between one and three years. The number of modules used and the global reach of the deployment are factors that influence the time it takes to implement an ERP solution. Some organizations decide to implement only the human resources solution globally, for example. Others implement the full ERP solution across a wide range of markets or regions.

The global rollout schedule is normally based on the system needs in the respective markets. Although language or translation is hardly ever considered as a critical factor during planning, some companies prefer to roll out new systems in English-speaking countries first to avoid the complexities of multilingualism.

Most ERP systems can also be tailored to the industry of the company that is implementing the system. SAP, for example, now has 23 distinct industry portfolios including automotive, health care, media and telecommunications.

A recent Gartner study outlined seven generic issues that should always be considered in a global ERP deployment project:

language — code page support and system translation;

currency — dual- or multiple-base currency capabilities;

simultaneous operation of multilanguage and multicurrency on a single system — Unicode support;

statutory compliance — complying with local legal, taxation and accounting rules;

implementation services — regional consultant staffing and travel;

product support — local support or global 24/7 support; and

upgrade timing and availability — delays for language version updates.
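As one illustration of the currency item in this list, the sketch below records a single transaction with its value in more than one base currency. It is a simplified reading of "dual- or multiple-base currency", not any vendor's design. The exchange rates are hypothetical, as are the names RATES_TO_BASE, to_base and post.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rates for illustration only:
# (transaction currency, base currency) -> units of base per unit of transaction
RATES_TO_BASE = {
    ("EUR", "EUR"): Decimal("1"),
    ("EUR", "USD"): Decimal("1.20"),
}


def to_base(amount, currency, base):
    """Convert an amount to a base currency, rounded to two decimals."""
    rate = RATES_TO_BASE[(currency, base)]
    return (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def post(amount, currency, bases=("EUR", "USD")):
    """Return one ledger entry carrying the amount in every base currency."""
    return {
        "amount": amount,
        "currency": currency,
        "base": {b: to_base(amount, currency, b) for b in bases},
    }
```

Decimal is used instead of floating point so that rounding behaves predictably, which matters for accounting figures.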

Although the business in most organizations is structured according to product group, enterprise function or geography, in global ERP deployments there will be combinations of these dimensions.

What Is ERP Localization?

The term ERP localization is used in several contexts: localization of the user interface text of the ERP system itself; adaptation of the system to comply with market-specific business processes; localization of any custom developments made on the system; and localization or translation of support documentation and training materials.

Most ERP applications and systems are provided in multiple languages by default. As I mentioned earlier, most ERP solution providers localize their products in a large number of languages. Most ERP vendors outsource their product localization.

Next, the word localization is also used to describe the process of customizing the product's functionality to the business practices of a particular market. Some ERP manufacturers use the term "a localization" to refer to a specific country version or extension of their product. ERP solutions are normally available in "country versions" and "language versions." Oracle, for example, sells a fully globalized product, stating, "Oracle E-Business Suite supports multiple languages, unique global business practices, and local statutory and regulatory requirements, such as the Euro — powerful capabilities that include localizations for 43 countries and support for 29 languages."

After tailoring a solution to suit local market and industry domain requirements, a company implementing an ERP system may still require customization of the product, simply to make the ERP product reflect a company's existing business processes. Prior to deployment, all critical business processes are analyzed and compared with the standard functionality offered by the ERP product. In case of major discrepancies, the decision is made to either change the process to comply with the ERP package or to customize the software to follow existing processes. During this fit/gap process, the decision can be made to provide customizations in the local language or to leave them in English. Using a local language version of an ERP package with customizations in English could, in fact, even produce a user interface consisting of fields or controls in different languages.
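The mixed-language interface described above follows from how label lookup typically falls back to English. The sketch below is a generic illustration and not any ERP vendor's mechanism. The tables, keys and the label function are all hypothetical.

```python
# Standard product texts, shipped in several languages (hypothetical sample).
STANDARD_UI = {
    "en": {"order.number": "Order number", "order.date": "Order date"},
    "de": {"order.number": "Auftragsnummer", "order.date": "Auftragsdatum"},
}
# Custom development that was left in English only (hypothetical sample).
CUSTOM_UI = {"en": {"custom.route": "Delivery route"}}


def label(key, lang):
    """Look up a UI label, falling back to English, then to the raw key."""
    for table in (CUSTOM_UI, STANDARD_UI):
        for candidate in (lang, "en"):
            text = table.get(candidate, {}).get(key)
            if text is not None:
                return text
    return key
```

A German user of this sketch would see "Auftragsnummer" next to "Delivery route" on the same screen, which is exactly the mix of languages the fit/gap decision has to weigh.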

The final level of localization in global ERP implementations is normally called "translation" by organizations implementing ERP and refers to the translation of end-user reference documentation or training materials. Change management practices common to ERP deployments dictate extensive training for end users, and training materials are normally provided in the local language. To complicate matters even more, support documentation is often translated from a base language, for example, English, into a select number of target languages, such as French and Spanish. Then it's up to the individual market to create "localizations" of these translations — that is, to adapt the translated materials to local needs and practices. This approach obviously requires careful version control and update management.

Technical Considerations

In addition to the business and procedural challenges to deploying an ERP system globally, a global system also has many technical complexities. Some of the technical issues to consider when deploying an ERP package for multiple countries include globalization features — multiple time zones, currencies, calendars, address formats and data standards; support for various character sets — on-screen display, storage of data and output to various devices such as faxes or printers; efficient global system landscape — for example, one central global system or distributed systems in each market or region.

Considering the amount of data processed by a global ERP system, it is extremely important to create the right balance between centralized data and local deployment.

Globalization features. The first challenge has been addressed by most ERP solution providers for many years, simply to meet customer demands that were increasingly global. SAP especially has been a forerunner in implementing advanced globalization features in its products.

According to SAP's white paper Global Solutions Without Boundaries, "With business solutions based on mySAP.com, you can adapt your operations to each country individually. SAP has applied local experiences and knowledge to develop country versions that meet the unique business requirements of each country."

An Oracle developer recently stated, "In addition to this country-specific functionality, the standard Oracle Applications products have an underlying architecture which is capable of supporting global business practices, for example, with a flexible chart of accounts, tax reporting ledger, shared service capabilities and multiple organizations structures. Our features are all designed and built with a global perspective, and, increasingly, the country-specific requirements are being integrated into the standard products." Statutory compliance may be the hardest globalization feature for ERP solution providers to maintain, especially because some countries such as Brazil change their regulations frequently.

Character sets. The technical complexities of supporting multiple languages in global ERP solutions have always been numerous, but these are being greatly simplified by the adoption of Unicode by most providers. Character-set support is critical, for example, for the storage of multilingual data in a central repository. Even though most ERP solutions had workarounds, creating one central repository containing data in multiple character sets has always been cumbersome.

A growing number of solutions now support Unicode, which makes the character set issue less pressing, especially for new implementations that don't have to deal with too many data conversions and legacy systems.

According to the PeopleSoft Web site, "PeopleSoft is one of the first to embrace Unicode fully, which allows you to run PeopleSoft 8 applications simultaneously in more than 100 languages. This capability empowers you to store data from any language in a single database instance and build applications with global functionality within a single code module."
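The difference between legacy code pages and Unicode for a central repository can be shown in a few lines. This is a generic illustration using Python's built-in codecs and is not tied to any ERP product. The sample names and the helper functions are made up for the example.

```python
# Sample customer names in Latin, Cyrillic and Japanese script (made up).
NAMES = ["Müller", "Иванов", "田中"]


def encodable(text, encoding):
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def coverage(encoding):
    """Report, name by name, whether the encoding can store it."""
    return [encodable(name, encoding) for name in NAMES]
```

Here cp1252 (Western European) and cp1251 (Cyrillic) can each store only one of the three names, while UTF-8 stores all of them, which is why a single Unicode database can replace a set of per-region workarounds.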

System landscape. Implementing one ERP central system to manage business processes and data worldwide obviously has many advantages such as central control, reporting and IT efficiencies, but it is often simply not feasible for size and performance reasons. Most organizations, therefore, adapt the ERP implementation to the current business environment — only centralizing, for example, those processes that are already global.

Global ERP systems can be deployed in several layers, for example, Global, Regional and Local. Each layer has separate systems, customizations, functionality and translation needs. Organizations can choose which business processes are placed in which layer.

Language Considerations

Every organization implementing software globally has to make some strategic decisions related to language and translation of systems or documentation.

The translation strategy should aim to find answers to the following questions:

Which languages do we need to cover and how do we prioritize?

What is the impact of language support on our system landscape?

What do we translate and when do we start?

Who will own the translation initiative and budget?

Who will be involved in the execution?

How can we minimize translation costs and still guarantee quality?

How do we organize and automate the translation process?