This is just a short post to share some exciting news. After around 18 months of writing (with some breaks), so many trips to the National Library of Scotland that I have a favourite seat in the Reading Room, and a few trips to try out some ideas in public, preorders for my new book, Interpreters vs Machines: Can Interpreters Survive in an AI-dominated World?, are now open. (Click the title to go to the preorder page.)
This book deals with the biggest question in interpreting right now: do human interpreters stand a chance of professional survival, faced with the gathered might of the world’s biggest tech companies? It also deals with the second biggest question: which strategy offers our best hope?
After reading the research, taking an honest look at our profession and really thinking through what is actually going on right now, I have found some uncomfortable answers.
The answers are uncomfortable for everyone.
The answers are uncomfortable for interpreters as they force us to face up to some of the most difficult issues in our profession and practice. Failure to take seriously the challenges of machine interpreting leads inexorably to being replaced.
The answers are uncomfortable for makers of machine interpreting devices and apps as they force them to face up to the weaknesses in their understanding of interpreting. Failure to take seriously the need to actually understand what interpreters do will consign their best work to the world of geeky gadgets that never live up to their promise.
The answers are uncomfortable for the general public, even if they never think of buying a machine translation device, as they force us to face up to what it really means to live in a society where information is currency and where the technology we use to communicate might just control what we can and can’t say and how we say it.
This isn’t just a book about interpreting; it’s a book where I deliberately attempted to get to the truth of what it really means when people in the tech sector say they will replace people with machines. If you are an interpreter, a programmer with an interest in AI, deep learning or machine interpreting, or even just someone interested in the power and effects of technology, this book is for you.
As soon as the final release date is confirmed, I will bring you another update.
It’s arguably the most exciting technology to arrive since the invention of the internet itself. The ability to converse effortlessly with anyone in any language is finally here, thanks to tiny in-ear devices or free apps on your phone. Or so the technologists say. The results might have been mixed at best so far, but Moore’s Law and lots of hand-waving tell us that we are close to the finish line of replacing humans, right?
The Problem as Most Technologists See It
Up until very recently, the problem statement of speech translation was simple: take in spoken language, turn it into written text, use machine translation to flip that text into another language, then use voice synthesis to speak out the result.
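That cascade can be sketched in a few lines. This is purely an illustrative sketch: the function bodies are stubs standing in for real speech recognition, machine translation and text-to-speech engines, and every name here is hypothetical, not a real API.

```python
# Illustrative sketch of the classic "cascade" approach to speech
# translation. Each stage is a stub; a real system would call a
# speech recognition engine, an MT engine and a TTS engine here.

def speech_to_text(audio):
    # Stage 1: automatic speech recognition (stubbed with a lookup).
    return {"audio_fr_greeting": "Bonjour, où est le métro ?"}[audio]

def translate(text, source="fr", target="en"):
    # Stage 2: machine translation (stubbed with a lookup).
    return {"Bonjour, où est le métro ?": "Hello, where is the metro?"}[text]

def text_to_speech(text):
    # Stage 3: voice synthesis (stubbed as a labelled string).
    return f"<synthesised audio: {text}>"

def cascade_translate(audio):
    # The whole pipeline: any error at one stage propagates to the next.
    return text_to_speech(translate(speech_to_text(audio)))

print(cascade_translate("audio_fr_greeting"))
# → <synthesised audio: Hello, where is the metro?>
```

The structure itself makes the weakness visible: each stage consumes only the output of the one before it, so anything lost early (intonation, context, speaker intent) is gone for good.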
The key in that process was to hit 100% accuracy at each stage and suffer no loss at any single point. Hence the plethora of press releases proudly parroting figures like “97% accuracy” (the exact phrasing used by Tencent about their system before it fell down spectacularly in front of an audience).
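Even taking a "97%" figure at face value, per-stage accuracy compounds. A quick back-of-the-envelope calculation (assuming, simplistically, that each stage's errors are independent and multiplicative, which real systems don't guarantee) shows why hitting a high number at each stage still falls short end-to-end:

```python
# Back-of-the-envelope: if each stage of the cascade is 97% accurate
# and errors compound independently, end-to-end accuracy is the product.
per_stage = 0.97
stages = 3  # speech recognition, machine translation, voice synthesis

end_to_end = per_stage ** stages
print(f"{end_to_end:.3f}")  # → 0.913, i.e. roughly 91% end-to-end
```

In other words, three stages at 97% each already drop below 92% overall, before any of the deeper problems with defining "accuracy" are even considered.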
A Major Flaw Appears
Apart from the gigantic holes in their reasoning that are obvious to anyone who has ever performed or studied interpreting (and which will be discussed at length in my new book), there is one major flaw in their problem definition. No-one, not even the greatest expert in interpreting, not even the best machine translation researcher, has a solid, empirically reliable and practically realistic definition of accuracy. Any attempt to produce one quickly runs up against either real life or logical potholes the size of a small continent, as the video below illustrates:
Why this Leads to “The Wall” (at least for now)
To discuss all the difficulties caused by this problem would take a book, not a blog post. The main point to understand is that this problem with “accuracy” is a symptom of the wider problem that the makers of speech translation systems do not understand how communication works between people, never mind how interpreting works. This point was underlined by Prof Andy Way at the recent ITI conference, when he pointed out that recently, there has been a trend for newcomers to attempt to solve machine translation without ever having learned a language or studied linguistics. This inevitably leads to embarrassing shocks.
In speech translation, the shocks are even worse. Without a basic knowledge of culture-specific pronoun and register use, the relationship of language to social context and how, for example, the functions of intonation in English are mirrored by sentence structure in French, any attempt at speech translation will never get past the stage of helping people find the toilet.
Haven’t Google Solved All That?
Google might just have found a way through some of the mess with its much-vaunted “Translatotron”, which claims to work directly from speech to speech, even to the point of keeping speech patterns in the interpreted version. If they are actually telling the complete truth, that would indeed be a real breakthrough, but that breakthrough also hides an uncomfortable fact.
Speech patterns don’t work the same in different languages. Where English uses intonation for emphasis, clarification and expressing attitude, other languages use word order, speed, noun declensions or even code switching to do the same things. That means making you sound the same in Spanish as you do in English is itself a pretty pointless goal.
The Coming Wall
This hints at a coming moment, which is likely to arrive sooner rather than later, when investment in speech translation begins to generate decreasing returns. While the current ways of doing speech translation are sufficient to make passable devices for tourists, the costs of doing so are still high. The ability to take this technology and turn it into either a replacement for human interpreters or even a consistently useful help for them seems out of reach. Why?
Quite simply, the current capability of speech translation isn’t limited by processor power or memory or programming but simply by that superficial understanding of language and communication I mentioned earlier. Pouring more money into speech translation might be a very good way to eventually make the devices cheaper or improve resistance to background noise but it won’t solve the underlying problems. In short, the weakest link in speech translation is the thinking of the engineers making the software.
Could this change? There is no reason why not. Anyone smart enough to build a system that can connect speech recognition, machine translation and voice synthesis is smart enough to pick up any book on interpreting or any book on spoken language and rewrite their algorithms accordingly.
That might be enough, assuming that there can be an algorithm that can fully understand not just words but meaning and intention. It might be enough if it is possible to make an algorithm that processes language as flexibly and quickly as the human brain and can detect new words and phrases and work out their meaning from context.
In the Meantime
As I haven’t seen any sign of speech translation makers moving away from the current faulty understanding of language and communication, I wouldn’t presume to predict the future of their work. I would suggest, however, that if current trends continue, the gains in quality in speech translation will soon slow to a crawl.
For businesses, this means relying on humans for all important communication and leaving finding your way to the nearest metro stop to the speech translation devices. For interpreters, this means keeping an eye on our “robot overlords” and keeping one step ahead. And if you want to know exactly how to do that, keep your eye out for a new book.