Recent research shows that more and more people use automatic speech recognition (ASR) as an interpretation platform on a daily basis to perform voice searches, send text messages, or interact with voice assistants. ASR technology is also a tool that augments professional performance in interpreting, post-editing, translation, and subtitling, and it underwent a revolution with the advent of deep neural networks.
Today we will discuss how ASR as an interpretation platform can increase interpreters’ productivity, the main challenges interpreters may face when decoding human speech, and the results of investigations into speech technologies and post-editing.
Increased productivity for interpreters
- Computer-assisted translation (CAT) integration with ASR systems. CAT tools are traditionally based on keyboard and mouse input. A report on computational terminology and machine translation from the University of Vienna notes that commercially available CAT tools offer integration with ASR interpretation platforms, such as memoQ combined with Apple’s speech recognition service or Matecat combined with Google Voice, to increase productivity during the interpretation process.
- Benefits of using ASR. Interpreters are more productive when using ASR: it lets them interpret faster and search the web or draft emails more quickly. With ASR, input speed rises from a typing rate of around 40 words per minute to up to 150 words per minute, which makes the interpretation platform more flexible and interpreter-centered. It also opens up the translation industry to blind and visually impaired professionals.
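The productivity claim above can be made concrete with a back-of-the-envelope sketch: time to draft a given word count at the quoted typing rate (~40 wpm) versus dictation with ASR (~150 wpm). The rates come from the article; the function name and the 3,000-word figure are illustrative assumptions.

```python
# Illustrative sketch: the rates (40 and 150 wpm) are quoted in the
# article; the word count and function name are hypothetical.

def minutes_to_draft(words: int, words_per_minute: float) -> float:
    """Minutes needed to produce `words` at a given input rate."""
    return words / words_per_minute

doc_words = 3000  # a hypothetical day's output
typing = minutes_to_draft(doc_words, 40)      # 75.0 minutes by keyboard
dictation = minutes_to_draft(doc_words, 150)  # 20.0 minutes by dictation
print(f"typing: {typing:.0f} min, dictation: {dictation:.0f} min")
```

At these rates, dictation cuts drafting time to roughly a quarter, which is where the reported productivity gains come from.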
Challenges of decoding human speech
Decoding human speech (DHS) can be challenging for several reasons. Interpreters need to double-check their ASR-generated output, which requires a certain level of interpreting experience and an understanding of the interpretation platform to integrate it successfully into their practice.
Here are six main factors that can affect the success of DHS:
- Homophones. Different words that sound the same (for example, “write” and “right”) and require more than the sound alone to be understood.
- Code-switching. Rapidly switching between dialects or languages is extremely common in normal human conversation around the world.
- Variability in the volume, speed, or quality of someone’s voice. Changes in the amplitude of the first harmonic correspond to increases in perceived vocal breathiness, and the relationship to phonatory characteristics appears to be speaker-dependent.
- Ambient sounds. Sounds such as echoes or road noise can cause the ASR system to mishear speech and take the interpretation in the wrong direction.
- Transfer influences from one’s first language(s) to second languages. These influences appear across all areas of language learning, from vocabulary and grammar to function and punctuation; when speaking a second language, the first language typically shapes the accent.
- Paralinguistic features. Pace, tone, and intonation can confuse the ASR interpretation platform.
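The homophone problem above is usually resolved by context: an ASR system scores acoustically identical candidates against a language model and keeps the more plausible one. Here is a minimal sketch with a toy bigram model built from a tiny hand-written corpus; the corpus, counts, and function names are purely illustrative, not a real ASR component.

```python
# Toy illustration of context-based homophone disambiguation.
# The "corpus" and bigram counts are hypothetical stand-ins for a
# real language model inside an ASR system.
from collections import defaultdict

corpus = [
    "please write the report",
    "turn right at the corner",
    "write an email",
    "the right answer",
]

# Count word-pair (bigram) occurrences in the toy corpus.
bigrams = defaultdict(int)
for sentence in corpus:
    words = sentence.split()
    for a, b in zip(words, words[1:]):
        bigrams[(a, b)] += 1

def score(sentence: str) -> int:
    """Sum of bigram counts: higher means more plausible in this toy model."""
    words = sentence.split()
    return sum(bigrams[(a, b)] for a, b in zip(words, words[1:]))

# Two candidates that sound exactly the same:
candidates = ["please write the answer", "please right the answer"]
best = max(candidates, key=score)
print(best)  # context ("please write") favors the first candidate
```

A production system would use a neural language model rather than raw bigram counts, but the principle is the same: the sound alone cannot separate “write” from “right”; the surrounding words can.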
Speech technologies and post-editing
The potential of using ASR for post-editing purposes is reasonably high: using speech instead of typing can speed up the work of the interpreter even in the context of post-editing.
Here are three main results from investigations of the interpretation platform from different perspectives:
- Post-editing with the aid of a speech recognition system is faster, less tiresome, and more ergonomic.
- The interpreters who took part were open to trying speech-based post-editing as a new translation workflow.
- Voice input is more productive than typing alone because it adds a dimension to the post-editing task, allowing interpreters to alternate between different input modes depending on task difficulty.
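The errors that post-editors correct in ASR output are commonly quantified with word error rate (WER): the word-level edit distance between the ASR hypothesis and a reference transcript, divided by the reference length. This sketch is a standard textbook implementation, not taken from the investigations above; the example sentences are hypothetical.

```python
# Standard word error rate via Levenshtein distance over words.
# Example inputs are illustrative, not from the cited studies.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("right" for "write") in a four-word reference:
print(wer("please write the answer", "please right the answer"))  # 0.25
```

A lower WER means fewer corrections for the post-editor, so tracking it over time is one simple way to judge whether a speech-based workflow is actually paying off.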
Automatic speech recognition (ASR) has revolutionized interpreters’ performance in interpreting, post-editing, translation, and subtitling. By leveraging speech instead of typing, interpreters increase their input speed from around 40 up to 150 words per minute. This comes with challenges, since mistakes can arise from paralinguistic features, ambient sounds, or similar-sounding words, and these factors call for double-checking by experienced interpreters. Nevertheless, this speech technology has become key to productivity and is certainly here to stay.