A Speech Technology Evolution

Designing Speech Recognition Technology

March 20, 2019

Speech Recognition in Product Development Today

Speech recognition technology has come a long way. As previously explored in our post tracing its history, what started out as systems that could recognize audio fingerprints and simple arithmetic commands has evolved into sophisticated technology that can comprehend full speech patterns and sentences, and respond accordingly. Today, more developments are being pursued that could potentially expand the use of speech recognition software and devices in various business sectors.

Data Entry Automation

Developers are currently experimenting with an ambient listening system that can transcribe a conversation between a doctor and a patient. The same software is also programmed to upload key details of the conversation to a digital medical record system.

Stat News reports that the product comes equipped with 16 microphones and a wall-mounted motion-detection camera. If development succeeds, the technology would produce instant documentation and eliminate computer interference in a doctor-patient session.

Voice Command Technology in Classrooms

Teachers deal with large volumes of data every day. Not only do they have to access lesson plans and materials, but they also need to refer to student records from time to time. In this area, voice command technology may help ease the burden on teachers.

With this in mind, Amazon Web Services (AWS) has partnered with an edtech company to offer Alexa-enabled devices that can assist educators in gaining on-demand access to critical data and actionable insights. The devices will be equipped with administrative solutions carrying data on a large number of educational organizations, schools, students, and other educators. With this kind of technology at hand, efficiency in the educational workplace may improve.

Voice-activated Assistants in Automobiles

Automakers are aiming to produce smarter cars for the future, and advanced systems for them will most likely include voice-activated assistants among their features. The Engineer's Stuart Nathan discussed how these systems can control car functions such as air conditioning and navigation using only the driver's voice.

Currently, users can rely on voice apps on their handheld devices for assistance, but those apps need a stable, consistent internet connection to work. With a system built into the car, drivers can control more of their vehicle's functions through voice prompts. This technology can also promote road safety, since there is no need to look at a device while driving.

Speech Translation

Language barriers may soon be a thing of the past with the help of speech translation technology. Nationwide once demonstrated the potential of the technology in a two-day seminar that involved 30 speakers. Audio from the special microphones was fed into translation software to generate captions and subtitles in a dozen languages. There was also the option to use a text-to-speech feature so that audiences could listen to the subtitles rather than read them.

If perfected, speech translation will not only be useful in presentations where audience members come from different countries; it will also be extremely useful for the travel industry in general. Travelers could use handheld speech translation devices to communicate better with locals.

Design Considerations

Offline voice recognition: As noted earlier for built-in car systems, voice recognition technology will be able to serve more people if it no longer depends on a network connection. Recognition systems can then be built into a microchip and integrated into a device or a larger system such as a vehicle.

Localization: English may be considered a universal language, but certain words and phrases have different meanings depending on which country you're in. For instance, in the UK, the word "chips" typically refers to what Americans call "fries," not the packaged snack. This should be taken into account when designing speech recognition systems for specific regions or groups. A whitepaper on voice search by Ayima notes that 22% of people using voice search look for local information, which suggests that localization shouldn't just mean language context; it should also cover content.
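To make the "chips" example concrete, regional term handling can be sketched as a simple post-processing step on a transcript. This is a toy illustration only; the locale codes and term map below are assumptions for demonstration, not part of any real speech product.

```python
# Toy sketch: locale-aware normalization of recognized words.
# REGIONAL_TERMS maps each locale to region-specific vocabulary
# and its canonical equivalent (hypothetical entries).

REGIONAL_TERMS = {
    "en-GB": {"chips": "fries", "crisps": "chips"},
    "en-US": {},  # already canonical in this sketch
}

def localize(transcript: str, locale: str) -> str:
    """Replace region-specific words in a transcript with canonical terms."""
    mapping = REGIONAL_TERMS.get(locale, {})
    words = [mapping.get(w.lower(), w) for w in transcript.split()]
    return " ".join(words)

print(localize("order chips", "en-GB"))  # a UK speaker means fries
```

A real system would do this with locale-specific language models rather than word substitution, but the principle is the same: the interpretation of a recognized word depends on where the speaker is.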

Data privacy: When designing voice user interfaces, transparency is a priority. Users have the right to choose whether voice-activated systems may upload their recordings to the cloud or share them with others. The companies selling the technology should also be clear with users about what their data will be used for.
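The opt-in principle above can be sketched as a consent gate that every recording passes through before anything leaves the device. The `ConsentSettings` structure and outcome labels are hypothetical, chosen only to illustrate the decision flow.

```python
# Toy sketch: gating cloud upload and sharing on explicit user consent.
# All names here (ConsentSettings, outcome strings) are illustrative.

from dataclasses import dataclass

@dataclass
class ConsentSettings:
    upload_to_cloud: bool = False      # opt-in: defaults to off
    share_with_partners: bool = False  # opt-in: defaults to off

def handle_recording(audio: bytes, consent: ConsentSettings) -> str:
    """Decide what happens to a finished recording based on user consent."""
    if not consent.upload_to_cloud:
        return "processed_locally"     # audio never leaves the device
    if consent.share_with_partners:
        return "uploaded_and_shared"
    return "uploaded_private"
```

The key design choice is that both flags default to off, so a user who never touches the settings gets the most private behavior.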
