Integrity Languages


Why Speech Translation is About to Hit “The Wall”

By: Jonathan Downie    Date: July 9, 2019

It’s arguably the most exciting technology to arrive since the invention of the internet itself. The ability to converse effortlessly with anyone in any language is finally here, thanks to tiny in-ear devices or free apps on your phone. Or so the technologists say. The results so far might have been mixed at best, but Moore’s Law and lots of hand-waving tell us that we are close to the finish line of replacing humans, right?

Maybe not.

The Problem as Most Technologists See it

Up until very recently, the problem statement of speech translation was simple. Take in spoken language, turn it into written language, use machine translation to flip that into another language, use voice synthesis to speak out the result.

The key in that process was to hit 100% accuracy at each stage and suffer no loss at any single point. Hence the plethora of press releases proudly parroting figures like “97% accuracy” (the exact phrasing used by Tencent about their system before it fell down spectacularly in front of an audience).
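That cascaded design can be sketched in a few lines. The three stage functions below are hypothetical stand-ins for illustration, not any real API, but the shape of the pipeline is as described above, and it shows why per-stage accuracy figures mislead: even at “97% accuracy” per stage, errors compound across the cascade.

```python
# A minimal sketch of the cascaded pipeline described above.
# All three stage functions are hypothetical stand-ins; a real system
# would plug separate ASR, MT and TTS models into each step.

def speech_to_text(audio):
    """Stand-in for automatic speech recognition (ASR)."""
    return "where is the station"

def machine_translate(text):
    """Stand-in for the machine translation stage (English to French)."""
    lookup = {"where is the station": "où est la gare"}
    return lookup.get(text, text)

def text_to_speech(text):
    """Stand-in for voice synthesis; a real system would return audio."""
    return f"<audio:{text}>"

def speech_translation(audio):
    # The cascade: any error in an early stage propagates into every later one.
    return text_to_speech(machine_translate(speech_to_text(audio)))

# If each of the three stages were independently 97% accurate,
# the whole cascade would be right only about 91% of the time:
cascade_accuracy = 0.97 ** 3  # ~0.913
```

And the independence assumption there is itself generous: in practice a recognition error often derails the translation stage completely rather than merely adding to the error count.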

A Major Flaw Appears

Apart from the gigantic holes in their reasoning that are obvious to anyone who has ever performed or studied interpreting (and which will be discussed at length in my new book), there is one major flaw in their problem definition. No-one, not even the greatest expert in interpreting, not even the best machine translation researcher, has a solid, empirically-reliable and practically realistic definition of accuracy. Any attempt to do so quickly runs up against either real life or logical potholes the size of a small continent, as the video below illustrates:

Why “accuracy” is hard to define in interpreting.

Why this Leads to “The Wall” (at least for now)

To discuss all the difficulties caused by this problem would take a book, not a blog post. The main point to understand is that this problem with “accuracy” is a symptom of a wider problem: the makers of speech translation systems do not understand how communication works between people, never mind how interpreting works. This point was underlined by Prof Andy Way at the recent ITI conference, when he pointed out that there has recently been a trend for newcomers to attempt to solve machine translation without ever having learned a language or studied linguistics. This inevitably leads to embarrassing shocks.

In speech translation, the shocks are even worse. Without a basic knowledge of culture-specific pronoun and register use, the relationship of language to social context and how, for example, the functions of intonation in English are mirrored by sentence structure in French, any attempt at speech translation will never get past the stage of helping people find the toilet.

Haven’t Google Solved all that?

Google might just have found a way through some of the mess with its much-vaunted “Translatotron”, which claims to work directly from speech to speech, even to the point of keeping speech patterns in the interpreted version. If they are actually telling the complete truth, that would indeed be a real breakthrough, but that breakthrough also hides an uncomfortable fact.

Speech patterns don’t work the same in different languages. Where English uses intonation for emphasis, clarification and expressing attitude, other languages use word order, speed, noun declensions or even code-switching to do the same things. That means that making you sound the same in Spanish as you do in English is itself a pretty pointless goal.

The Coming Wall

This hints at a coming moment, which is likely to arrive sooner rather than later, when investment in speech translation begins to generate decreasing returns. While the current ways of doing speech translation are sufficient to make passable devices for tourists, the costs of doing so are still high. The ability to take this technology and turn it into either a replacement for human interpreters or even a consistently useful help for them seems out of reach. Why?

Quite simply, the current capability of speech translation isn’t limited by processor power or memory or programming but simply by that superficial understanding of language and communication I mentioned earlier. Pouring more money into speech translation might be a very good way to eventually make the devices cheaper or improve resistance to background noise but it won’t solve the underlying problems. In short, the weakest link in speech translation is the thinking of the engineers making the software.

Could this change? There is no reason why not. Anyone smart enough to build a system that can connect speech recognition, machine translation and voice synthesis is smart enough to pick up any book on interpreting or any book on spoken language and rewrite their algorithms accordingly.

That might be enough, assuming that there can be an algorithm that can fully understand not just words but meaning and intention. It might be enough if it is possible to make an algorithm that processes language as flexibly and quickly as the human brain and can detect new words and phrases and work out their meaning from context.

In the Meantime

As I haven’t seen any sign of speech translation makers moving away from the current faulty understanding of language and communication, I wouldn’t presume to predict the future of their work. I would suggest, however, that if current trends continue, the gains in quality in speech translation will soon slow to a crawl.

For businesses, this means relying on humans for all important communication and leaving finding your way to the nearest metro stop to the speech translation devices. For interpreters, this means keeping an eye on our “robot overlords” and keeping one step ahead. And if you want to know exactly how to do that, keep your eye out for a new book.

Skills to Learn before you Learn to Code

By: Jonathan Downie    Date: May 7, 2018

With Event Technology, Neural Machine Translation and Remote Simultaneous Interpreting all vying for publicity, we would be forgiven for thinking that the only choice is between jumping on the high-tech bandwagon and living in a shack on the plains. Many authorities have pleaded for all children to learn to code. The logic is simple: you are either learning how to handle data or you are just part of the data. But might there be a flaw in that logic?

Why Tech Fails

As much as the innovators would never admit it to their angel investors, history is littered with tech that went nowhere. To the well-known flops of laserdiscs and personal jetpacks can be added the expensive failures of nuclear-powered trains and boats and the hundreds of “instant translation” websites that promised to leverage the “power of bilinguals”. Just because a tech exists, that doesn’t mean it will actually make a meaningful difference; just ask the inventor of the gyrocopter.

While the stories of some technologies are unpredictable, there are others where it was clear that there was too wide a gap between what the engineers could do and what the market would actually accept. Take nuclear-powered ships. While nuclear submarines are an important part of many navies, the reticence of many ports to let a ship carrying several kilos of activated uranium dock (never mind refuel or take on supplies) spelled the end of that particular dream.

Other times, technology has flopped due to a simple failure to understand the dimensions of the problem. Take those “instant translation” or “interpreting on the go” websites. Almost always the brainchildren of monolinguals who have a severe case of phrasebook-aversion, they all crash and burn when the founders realise that “bilingual” is a very loose concept and those with actual interpreting expertise are highly unlikely to want to spend their time saying the Hungarian for “where is the toilet?” or the Spanish for “I have a headache and can’t take ibuprofen” forty times a day.

The Problem with Machine Translation

To this motley crew, it seems that we have to add more than a few denizens of modern machine translation. With some leading experts busy telling us that translation is just another “sequence to sequence problem” and large software houses claiming that managing to outdo untrained bilinguals is the same as reaching “human parity” (read that article for the truth behind Microsoft’s claim), it is becoming plain that the actual nature of translation is eluding them.

The most common measure of machine translation performance, the BLEU score, simply measures the extent to which a given translation looks like a reference text. The fact that these evaluations, and those performed by humans on machine-translated texts, are always done without any reference to any real-life context should make professional translators breathe more easily.
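To make that concrete, here is a toy version of BLEU’s core ingredient, modified n-gram precision. (This is a simplification: real BLEU combines several n-gram orders with a brevity penalty, and the example sentences are invented.) A translation that copies the reference wording scores perfectly, while a perfectly serviceable paraphrase is punished for every surface difference:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also appear in the reference,
    clipped so a repeated n-gram cannot be counted more often than it
    occurs in the reference."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

reference  = "press the red button to stop the machine"
verbatim   = "press the red button to stop the machine"
paraphrase = "hit the red switch to halt the machine"

print(modified_precision(verbatim, reference))    # 1.0
print(modified_precision(paraphrase, reference))  # 0.625
```

Both candidates would get a user to stop the machine, but only surface overlap with the reference is rewarded, which is exactly the gap between resemblance and purpose.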

Only someone who slept through translation theory class and has never actually had a paid translation project would be happy with seeing translation as just a sequence to sequence problem. On the most basic, oversimplified level, we could say that translators take a text in one language and turn it into a text in another. But that misses the point that every translation is produced for an audience, to serve a purpose, under a set of constraints.

The ultimate measure of translation quality is not its resemblance to any other text but the extent to which it achieves its purpose. If we really want to know how good machines are at translation, let’s see how they do at producing texts that sell goods, allow correct medical treatment, persuade readers, inform users, and rouse emotion, without any human going over their texts afterwards to sort out their mistakes.

Skills to Learn before you Learn to Code

All this shows is that there are key skills that you need to learn before you are set loose on coding apps and building social media websites. Before kids code, let them learn to listen so they can hear what the actual problem is. Before they form algorithms, let them learn how to analyse arguments. Before they can call standard libraries, let them learn to think critically. Let them learn and understand why people skills have to underpin their C skills and why asking questions is more important than creating a system that spoon-feeds you the answer.

I hope that, for our current generation of tech innovators, it isn’t too late. We absolutely need technology to improve but we also need there to be more ways for tech innovators to listen to what everyone else is saying. We could do with some disruption in how events are organised and run but the people doing it need to understand the reasons behind what we do now. Interpreting could do with a tech revolution but the tech people have to let interpreters, interpreting users and interpreting buyers sit in the driving seat.

If our time isn’t to be wasted with more equivalents of nuclear-powered trains, if we are to avoid Cambridge Analytica redux, we need monster coders and incredible listeners, innovators who are also thinkers, writers and macro ninjas. It may well be that one person cannot be both a tech genius and a social scientist but we need a world in which both are valued and both value each other.

It’s a world we can only build together.