Five years ago, I wrote a cheeky post for the LifeinLINCS blog detailing some off-the-wall inventions that interpreters would love. Now, however, with the advent of machine interpreting software that can manage a basic conversation, I think it’s time to properly ask those who know their vector spaces from their z-tests if they could build us a single invention that would ease the work of a lot of conference interpreters.
The idea is very simple: one tricky problem in conference interpreting is that, even with the best research in the world, speakers seem able to find terms that the interpreters have never come across before. Worse, in the heat of a detailed speech, an interpreter could easily falter in finding that important term that they really did memorise but have suddenly forgotten.
Since Automatic Speech Recognition (ASR) is now at the point where I no longer have to pass the phone to my wife when a phone system asks me to describe my issue, and our smartphones are fast enough to do basic speech translation, surely we have the tech to fix that issue once and for all.
Here’s how I imagine it working.
While the speaker is talking, an ASR system scans the words and word clusters they use. Any words that are in the top 1,000 most used words in the source language can be ignored but, if a word is rare, the system should automatically search the interpreter’s term bank for it. In fact, in an ideal world, the interpreter should be able to tell the system which domains they are working in (say, engineering, finance, HR) so it would prioritise rare words from those domains.
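To make that concrete, here is a minimal sketch of the filtering step in Python. The common-word set and term bank are tiny hypothetical stand-ins (a real system would use an actual frequency list and the interpreter’s own glossary), but the logic is the one described above: skip frequent words, look the rest up, and put domain matches first.

```python
# Hypothetical stand-ins: a real system would load the top-1,000 frequency
# list and the interpreter's actual term bank.
COMMON_WORDS = {"the", "of", "and", "to", "in", "a", "is", "that"}

# Term bank entries tagged by domain so matches in the active domain win.
TERM_BANK = {
    "shot-blasting": {"domain": "engineering", "translation": "grenaillage"},
    "amortisation": {"domain": "finance", "translation": "amortissement"},
}

def lookup_rare_terms(words, active_domains):
    """Return (term, translation) pairs worth projecting, domain matches first."""
    hits = []
    for word in words:
        if word.lower() in COMMON_WORDS:
            continue  # frequent words are ignored entirely
        entry = TERM_BANK.get(word.lower())
        if entry:
            priority = 0 if entry["domain"] in active_domains else 1
            hits.append((priority, word, entry["translation"]))
    hits.sort()  # terms from the interpreter's active domains come first
    return [(word, translation) for _, word, translation in hits]

print(lookup_rare_terms(["the", "amortisation", "shot-blasting"], {"engineering"}))
# → [('shot-blasting', 'grenaillage'), ('amortisation', 'amortissement')]
```

Even this toy version shows why the domain setting matters: the engineering term jumps ahead of the finance one because the interpreter said they were in an engineering meeting.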
Since interpreters don’t want to be distracted, the system should then simply project the original term and its term bank version onto a rugged, travel-proof Heads-Up Display that the interpreter has placed in front of them.
But what if the term is rare and the interpreter hasn’t stored it already? In those cases, the system should be able to run parallel searches for it across the interpreter’s favourite term bases (think IATE, online and offline technical dictionaries, etc.) and project the one or two most likely candidate translations.
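Those parallel searches could be sketched like this. The two lookup functions are pure placeholders (a real version would call IATE’s services, offline dictionaries, and so on), but the fan-out-and-collect pattern is the point:

```python
# Sketch: query several term bases in parallel, keep the best candidates.
from concurrent.futures import ThreadPoolExecutor

def search_iate(term):
    """Placeholder for a real IATE query."""
    return {"shot-blasting": "grenaillage"}.get(term)

def search_local_glossary(term):
    """Placeholder for an offline technical dictionary."""
    return {"shot-blasting": "décapage à la grenaille"}.get(term)

TERM_BASES = [search_iate, search_local_glossary]

def candidate_translations(term, max_candidates=2):
    """Run every term-base search in parallel; return up to two candidates."""
    with ThreadPoolExecutor(max_workers=len(TERM_BASES)) as pool:
        results = pool.map(lambda search: search(term), TERM_BASES)
    return [r for r in results if r is not None][:max_candidates]

print(candidate_translations("shot-blasting"))
# → ['grenaillage', 'décapage à la grenaille']
```

Capping the output at two candidates is deliberate: the whole point of the display is to help, not to flood the interpreter with options.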
There are a few technical headaches with this. In terms of language processing, teaching the system when to look up a single-word term and when to treat a cluster as a term would be tricky. There is, after all, a big difference between “shot” and “shot-blasting” and between “road” and “middle of the road”.
I’m no expert but it does seem that some kind of neural model and the ability to use the same system for semi-automated term mining beforehand might help the system “learn” what units count as a term in each domain. Possibly.
The second challenge is getting the user interface right. While experts in interpreter cognition, like Prof Kilian Seeber, have argued that interpreters simultaneously process information from multiple sources, there does still seem to be a point at which interpreters get overloaded. Add to this Prof Daniel Gile’s argument that there can be a gap between interpreters hitting an issue and it actually affecting their performance, and you leave interface designers with the tricky task of ensuring that the system provides help when it’s needed but doesn’t distract interpreters when they’re doing fine.
There has been some research to try to fix that but it is still a challenge, and it might take more research on problem triggers and performance drops to fix it for good. For the moment, using Heads-Up Displays, which let interpreters keep the speaker in view through the display instead of asking them to look down, would at least reduce the issue.
So, is anyone up for building one of those systems or testing one?