Translation, by the numbers
Source: www.cbc.ca
Author: Stephen Strauss
My son was recently vacationing in Quebec's Gaspé when one of his companions started to munch on one of those high-energy power bars that pick you up after a strenuous day's skiing.
Suddenly, she started to chortle. She pointed to text on the bar's wrapper and soon everyone was laughing. The cause: one of those translation absurdities that dot the unevenly bilingual landscape of Canadian packaging.
The Vancouver-based company had written in English that the bar was gluten free and dairy free. In French, this was turned into gluten libéré and laitier libre. Ha, ha, ha. While libéré does literally refer to free, it means free in the sense of liberated, often from something oppressive. Thus, the gluten reference would literally translate as "freed from gluten," as if gluten could be likened to a dictator.
More subtly, laitier doesn't mean dairy when it stands alone. Rather, you would have to say produits laitiers, as laitier would be like saying "milkness." At least the label didn't say gluten gratuit, one of my son's friends said. More guffaws. Gratuit also means free, but in the sense that it doesn't cost anything. However, there is an easy solution to the Vancouver company's problem: Let the internet be the translator – at least in part.
That doesn't mean go to Babelfish, the automatic language translation service on the Net. Type in libéré there and it tells you it doesn't know what you're talking about. Rather, simply assume that if a term or phrase has been used lots of times on the net by native speakers or good translators, it is correct. This blindly statistical approach is the essence of a kind of mini-revolution in the field of computer translation.
Instead of trying, as previous generations of computer scientists did, to teach a translating computer to think as a linguist, let the machine sift through existing translations of a phrase and statistically determine the likeliest usage.
Here is how you – and that poor misspeaking company in Vancouver – can use the principle of statistical machine translation today.
First, go to the Google search engine and type in "Government of Canada." You figure their translators probably have gotten things right. Next, type in "gluten free." The first reference is to an Innovation Canada website. It translates gluten free as sans gluten. This seems correct, but the internet is notorious for its accuracy potholes and gluten libéré also sounds right to the untrained ear. Go back to Google and type "sans gluten," both words within quotation marks. There are 118,000 references to sans gluten. Type in "gluten libéré" and one reference appears – in Spanish. Sheer numbers scream the correct choice.
Type in "dairy free" on the Government of Canada website and sans produits laitiers pops up first. Put that into Google and it is referenced 805 times. Type "laitier libre" and it pops up seven times, one of which is a blog laughing at what a ridiculous translation it is.
Type in "gluten gratuit" and there are two references.
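For readers who like to see the counting spelled out, here is a minimal sketch of the comparison in Python. It performs no live search; the hit counts are simply the ones quoted above, and the names HIT_COUNTS and likeliest_translation are made up for illustration. It is a toy version of the "let the numbers decide" idea, not how any real translation system is built.

```python
# Hit counts copied from the searches described in the article.
# Nothing here queries a search engine; the figures are fixed inputs.
HIT_COUNTS = {
    "gluten free": {
        "sans gluten": 118_000,
        "gluten libéré": 1,
        "gluten gratuit": 2,
    },
    "dairy free": {
        "sans produits laitiers": 805,
        "laitier libre": 7,
    },
}


def likeliest_translation(candidates):
    """Return (phrase, share of all hits) for the most frequent candidate."""
    total = sum(candidates.values())
    if total == 0:
        raise ValueError("no candidate has any hits")
    best = max(candidates, key=candidates.get)
    return best, candidates[best] / total


if __name__ == "__main__":
    for english, candidates in HIT_COUNTS.items():
        phrase, share = likeliest_translation(candidates)
        print(f"{english!r} -> {phrase!r} ({share:.1%} of hits)")
```

The share of hits is only a rough signal: a phrase can be popular and still wrong for a given context, which is why the human check discussed next still matters.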
Of course, this approach isn't going to replace translators, who are needed to catch gross miscues, such as the paragraph found on the website of a Quebec power-bar manufacturer that begins in English: "Intervening in more than 50 pays GARANTIE BIO ECOCERT is a certification society approved by public authorities that has been active daily for more than 15 years on the field."
But my demonstration suggests that, in about half an hour, a non-French-speaking employee at that Vancouver firm could have verified something as simple as that ingredient label and likely got the translation right – or at least have improved on the current mishmash.
There are other applications. I'm writing a letter in German to a longtime friend of my father's announcing his recent death. My German is at the third-grade level and I am not sure whether I should use the German word for "has," as in "father has died," or the word for "is," as in "my father is dead." I go to Google. Ist (is) beats hat (has) 1,060 to 0. It's little stuff but useful stuff and a sign of the times on the internet, where simply counting what people do is much more significant than anyone ever thought. It's how Google makes its money, and it's how government agencies might be able to separate liars, spinners, plotters and terrorists from the guileless rest of us.
