Integrity Languages


Tag Archives: speech translation

Reassessing Speech Translation

By: Jonathan Downie    Date: February 24, 2020

Back in October, I wrote a blog post in which I admitted that I had jumped the gun in my assessment of remote interpreting. Now, after writing a book on speech translation and pointing out the flaws in a recent BBC article on the subject, it is time for me to go through the same process for speech translation. And the results are … slightly more complex.

Two Worlds

In my book, Interpreters vs Machines, I deliberately concentrated on the basic operating principles of all speech translation solutions and on the research that was available at the time. After that, I chose to focus on the claims of commercial speech translation solutions, since they were getting the most attention from the media and from professional interpreters. It turns out that my decision was actually pretty sound.

Academic research on speech translation is continuing quietly and is taking steps to deal with one or two of the major issues I discussed in the book, especially the losses that happen when you turn speech into text, run it through machine translation and then turn the results into speech again. Apart from the Google Translatotron, which has yet to be subjected to proper public assessment, there is an upcoming conference that seems to be pushing the idea of going directly from speech to speech.

The commercial world, however, continues to produce some quite remarkable claims. Take this one from a recent video by Waverly Labs:

Doing our research, we studied the tools used by professional interpreters, by taking inspiration and going one step further, we gave ambassador the capability to deliver natural, professional grade translation.

While we could quibble about the meanings of the terms here, the phrase “natural, professional grade translation” is a pretty bold claim. Either it means that they are claiming to have matched the quality of professional interpreters, which their own CEO admitted they haven’t, or they are claiming their system is good enough to be used by professionals. In the latter case, one might ask whether those professionals should be persuaded to switch from using professional human interpreters.

In either case, it is clear that the purveyors of commercial speech translation are making incredibly bold claims without citing any empirical evidence. How long it will be before we have a repeat of the embarrassing Tencent incident or even the Microsoft “human parity” blunder is anyone’s guess.

Language Access

But, despite questionable marketing practices, speech translation does have a place. Yet again, it helps to turn to the always thought-provoking Sarah Hickey of Nimdzi Insights and now Troublesome Terps. At the #Conf1nt100 conference in Geneva, she pointed out that speech translation is finding niches in places where professionals wouldn’t be used anyway.

If you run a library, speech translation can help you achieve basic communication with patrons from other countries; in emergency situations, it can allow for simple triage until a human can be found, and of course, speech translation is great for tourists and frequent business travellers.

In short, speech translation is providing language access and doing it in places where that access might not have been previously available. That can only be good news.

What does all this mean for human interpreters? You’ll have to buy my book to discover that but I will say that we need to look beyond the unfortunately flawed coverage and crazy claims of speech translation to spend more time thinking through what researchers are managing to achieve. Basically, for now at least, ignore the marketing waffle and trust the engineers.

Want to know more?

As ever, if you are looking for advice as to how to get the best out of interpreting in your business, looking to build an interpreting team for your next event, or if you are looking for a conference interpreter in the UK working between French and English, drop me an email.

Can interpreters beat the bots?

By: Jonathan Downie    Date: August 26, 2019

This is just a short post to share some exciting news. After around 18 months of writing (with some breaks), so many trips to the National Library of Scotland that I have a favourite seat in the Reading Room, and a few trips to try out some ideas in public, preorders for my new book: Interpreters vs Machines: Can Interpreters Survive in an AI-dominated World? are now open. (Click the title to go to the preorder page.)

This book deals with the biggest question in interpreting right now: do human interpreters stand a chance of professional survival, faced with the gathered might of the world’s biggest tech companies? It also deals with the second biggest question: which strategy offers our best hope?

After reading the research, taking an honest look at our profession and really thinking through what is actually going on right now, I have found some uncomfortable answers.

The answers are uncomfortable for everyone.

The answers are uncomfortable for interpreters as they force us to face up to some of the most difficult issues in our profession and practice. Failure to take seriously the challenges of machine interpreting leads inexorably to being replaced.

The answers are uncomfortable for makers of machine interpreting devices and apps as they force them to face up to the weaknesses in their understanding of interpreting. Failure to take seriously the need to actually understand what interpreters do will consign their best work to the world of geeky gadgets that never live up to their promise.

The answers are uncomfortable for the general public, even if they never think of buying a machine translation device, as they force us to face up to what it really means to live in a society where information is currency and where the technology we use to communicate might just control what we can and can’t say and how we say it.

This isn’t just a book about interpreting; it’s a book where I deliberately attempted to get to the truth of what it really means when people in the tech sector say they will replace people with machines. If you are an interpreter, a programmer with an interest in AI, deep learning or machine interpreting, or even just someone interested in the power and effects of technology, this book is for you.

As soon as the final release date is confirmed, I will bring you another update.

Why Speech Translation is About to Hit “The Wall”

By: Jonathan Downie    Date: July 9, 2019

It’s arguably the most exciting technology to arrive since the invention of the internet itself. The ability to converse effortlessly with anyone in any language is finally here, thanks to tiny in-ear devices or free apps on your phone. Or so the technologists say. The results might have been mixed at best so far but Moore’s Law and lots of hand-waving tell us that we are close to the finish line of replacing humans, right?

Maybe not.

The Problem as Most Technologists see it

Up until very recently, the problem statement of speech translation was simple. Take in spoken language, turn it into written language, use machine translation to flip that into another language, use voice synthesis to speak out the result.
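For the programmers reading, that chain can be sketched in a few lines of Python. This is only a toy under loud assumptions: the three stage functions, the `TOY_LEXICON` dictionary and the use of plain strings in place of audio are all hypothetical stand-ins invented for illustration, not any vendor's real system.

```python
# Toy sketch of the cascade: speech -> text -> translated text -> speech.
# Every function here is a hypothetical placeholder, not a real API.

# Assumed word-for-word English-to-French lexicon, for illustration only.
TOY_LEXICON = {"where": "où", "is": "est", "the": "la", "station": "gare"}


def recognise_speech(audio: str) -> str:
    """Stand-in for speech recognition: treats the 'audio' as already transcribed."""
    return audio.lower().strip()


def translate_text(text: str) -> str:
    """Stand-in for machine translation: naive word-for-word lookup."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def synthesise_speech(text: str) -> str:
    """Stand-in for voice synthesis: wraps the text to mark it as 'spoken'."""
    return f"<spoken>{text}</spoken>"


def speech_translate(audio: str) -> str:
    """Run the three stages in order, each consuming the previous stage's output."""
    return synthesise_speech(translate_text(recognise_speech(audio)))


print(speech_translate("Where is the station"))
```

Notice that each stage only ever sees the output of the one before it, so anything lost or mangled early on stays lost for the rest of the chain.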

The key in that process was to hit 100% accuracy at each stage and suffer no loss at any single point. Hence the plethora of press releases proudly parroting figures like “97% accuracy” (the exact phrasing used by Tencent about their system before it fell down spectacularly in front of an audience).
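A quick back-of-the-envelope calculation shows why no loss at any single point matters so much. The sketch below assumes, purely for illustration, that each of the three stages has its own accuracy figure and that errors at each stage are independent, so the figures multiply. Neither assumption comes from any vendor, and the 97% used per stage is borrowed from the headline figure above only as an example input.

```python
from math import prod


def end_to_end_accuracy(stage_accuracies):
    """Multiply per-stage accuracies, assuming errors at each stage are independent."""
    return prod(stage_accuracies)


# Hypothetical figures: recognition, translation and synthesis each at 97%.
print(round(end_to_end_accuracy([0.97, 0.97, 0.97]), 4))  # prints 0.9127
```

Under those assumptions, three stages that each sound impressive on their own already lose nearly nine in every hundred items by the end of the chain, and that is before anyone asks what “accuracy” is supposed to be measuring.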

A Major Flaw Appears

Apart from the gigantic holes in their reasoning that are obvious to anyone who has ever performed or studied interpreting (and which will be discussed at length in my new book), there is one major flaw in their problem definition. No-one, not even the greatest expert in interpreting, not even the best machine translation researcher, has a solid, empirically-reliable and practically realistic definition of accuracy. Any attempt to produce one quickly runs up against either real life or logical potholes the size of a small continent, as the video below illustrates:

[Video: Why “accuracy” is hard to define in interpreting.]

Why this Leads to “The Wall” (at least for now)

To discuss all the difficulties caused by this problem would take a book, not a blog post. The main point to understand is that this problem with “accuracy” is a symptom of the wider problem that the makers of speech translation systems do not understand how communication works between people, never mind how interpreting works. This point was underlined by Prof Andy Way at the recent ITI conference, when he pointed out that there has recently been a trend for newcomers to attempt to solve machine translation without ever having learned a language or studied linguistics. This inevitably leads to embarrassing shocks.

In speech translation, the shocks are even worse. Without a basic knowledge of culture-specific pronoun and register use, the relationship of language to social context and how, for example, the functions of intonation in English are mirrored by sentence structure in French, any attempt at speech translation will never get past the stage of helping people find the toilet.

Haven’t Google Solved all that?

Google might just have found a way through some of the mess, with its much-vaunted “Translatotron”, which claims to work directly from speech to speech, even to the point of keeping speech patterns in the interpreted version. If they are actually telling the complete truth, that would indeed be a real breakthrough, but that breakthrough also hides an uncomfortable fact.

Speech patterns don’t work the same in different languages. Where English uses intonation for emphasis, clarification and expressing attitude, other languages use word order, speed, noun declensions or even code switching to do the same things. That means that making you sound the same in Spanish as you do in English is itself a pretty pointless goal.

The Coming Wall

This hints at a coming moment, which is likely to arrive sooner rather than later, when investment in speech translation begins to generate decreasing returns. While the current ways of doing speech translation are sufficient to make passable devices for tourists, the costs of doing so are still high. The ability to take this technology and turn it into either a replacement for human interpreters or even a consistently useful help for them seems out of reach. Why?

Quite simply, the current capability of speech translation isn’t limited by processor power or memory or programming but simply by that superficial understanding of language and communication I mentioned earlier. Pouring more money into speech translation might be a very good way to eventually make the devices cheaper or improve resistance to background noise but it won’t solve the underlying problems. In short, the weakest link in speech translation is the thinking of the engineers making the software.

Could this change? There is no reason why not. Anyone smart enough to build a system that can connect speech recognition, machine translation and voice synthesis is smart enough to pick up any book on interpreting or any book on spoken language and rewrite their algorithms accordingly.

That might be enough, assuming that there can be an algorithm that can fully understand not just words but meaning and intention. It might be enough if it is possible to make an algorithm that processes language as flexibly and quickly as the human brain and can detect new words and phrases and work out their meaning from context.

In the Meantime

As I haven’t seen any sign of speech translation makers moving away from the current faulty understanding of language and communication, I wouldn’t presume to predict the future of their work. I would suggest, however, that if current trends continue, the gains in quality in speech translation will soon slow to a crawl.

For businesses, this means relying on humans for all important communication and leaving finding your way to the nearest metro stop to the speech translation devices. For interpreters, this means keeping an eye on our “robot overlords” and keeping one step ahead. And if you want to know exactly how to do that, keep your eye out for a new book.