Speech Recognition - Human Car Interaction

Speech Recognition (In-car voice recognition system)

What is Speech Recognition?

Speech recognition is a process of converting spoken words into text.
It is also known as Automatic Speech Recognition (ASR).
The technology gained acceptance and shape in early 1970s due the research funded by Advance Research Project Agency in U.S Department of Defense .
It has been widely in use since 1970s in various domains such as automotive industry, health care, military, IT support centers, telephone directory assistance, embedded applications, automatic voice translation into foreign languages etc.

It consists of:

At the heart of the software we have the translation part:

Breaks down the spoken words into phonemes
“Phonemes” are analyzed to see which units fits best, which can be derived from its dictionary
System has to be trained to recognize factors associated with the users voice e.g.. Speed, pitch

Classification of Speech Recognition Systems

Types of Speech Utterance
- Isolated Words
- Connected Words
- Continuous Speech
- Spontaneous Speech
Types of Speaker Model
- Speaker dependent models
- Speaker Independent models
Types of Vocabulary
- Small vocabulary –tens of words
- Medium vocabulary –hundreds of words
- Large vocabulary –thousands of words
- Very-large vocabulary –tens of thousands of words
- Out –of –Vocabulary –Mapping a word from the vocabulary into the unknown world

Factors affecting accuracy of speech recognition systems

Acoustic Modeling

Establishing statistical representations for the feature vector sequences computed from the speech waveform.
Pronunciation modeling. How fundamental speech units represent larger words.
Feedback information - reshapes the speech vectors and gives noise robustness.
Taking audio recordings of speech and their transcriptions and then compiling them into statistical representations of the sounds for words.
E.g.: HMM, Neural Networks.

Language Modeling

Probabilities of sequences of words.
Provides context to distinguish between words and phrases that sound similar. e.g “recognize speech” and “Wreak a nice beach”
Offers grammatical support for the words processed E.g.: “The milk is the cat”

Boom and the Technology

Hidden Markov Models, n-gram models and neural networks are widely used in speech recognition systems.
Since 2010, deep neural networks are being used for learning algorithms to train the system
A survey by Statista shows the twenty percent growth in cars with speech recognition system installed in 2012 and 2019
The share of voice recognition system equipped cars is predicted to be increased from 37 percent in 2012 to 55 percent in 2019 .Read more on http://www.statista.com/statistics/261685/voice-recognition-in-new-cars /

Deep delve into the technology

Neural Network

Hidden Markov Model

Language Models

N-gram model -is a contiguous sequence of n items from a given sequence of text or speech.
Unigram model –an n-gram of size 1 is referred to as “unigram”.

In-car speech recognition

“Speech Interactive mode” is interaction with the car.
Automotive industry fast paced to come up with efficient systems to perform the below in all upcoming cars:
- Hands-free use of mobile phone handsets in the car e.g. “Dial office”
- Speech instructions to navigation systems e.g. GPS-connected digital maps: “How far is it from highway?”
- In-car system interaction e.g. “Turn on the radio to the travel reports channel.”
- In-car steering systems – leading to the concept of “Autonomous” cars .

Contemporary Speech recognition systems (in car)

Speech recognition (a solution and a cause of Driver Distraction)

Most of the people hold a perception that speech recognition technology is safer because the driver does not need to take eyes off the road or hands off the steering wheel.
Speech recognition is assumed to be a panacea for driver distraction offering an alternative to the visual/manual demand of the system thus;
Use of handsets while driving is illegal in 14 states
Use of “hands-free” voice controls is being encouraged

In contrast

Speech-based systems can demand attention resulting in increased cognitive load and distracting driver just as visual displays and manual controls do(A comparison of Ten 2015 In-Vehicle Information Systems). Read more on https ://www.aaafoundation.org/sites/default/files/strayerIII_FINALREPORT.pdf
The study by AAA foundation in June 2013 showed that the mental workload from performing complicated tasks slows reaction times, whatever the driver is doing with his or her hands. Read more on https://www.aaafoundation.org/sites/default/files/MeasuringCognitiveDistractions.pdf

Research by AAA

More research and findings

Speech interface requires cognitive demand, which can interfere with the primary driving task .Read more on http://umich.edu/~driving/publications/LoIJVT2013.pdf
Research analyzed the difference in reaction time, when no email system is provided in car with the reaction time when provided with speech based in-vehicle email system. The results showed 30% increase in reaction times when used speech-based system. Read more on http://journalistsresource.org/studies/environment/transportation/distracted-driving-voice-activated-systems-reaction-times and http ://user.engineering.uiowa.edu/~csl/publications/pdf/leecavenhaakebrown00.pdf
In cars, the environment has lot of disturbances, which makes speech recognition system inefficient. Read more on http ://www.bcs.org/upload/pdf/ewic_hci09_paper59.pdf

Latest research says

In last 2014, A J.D. Power and Associates executive announced that automotive voice-recognition technology should receive a “failing grade”. Read more on http://www.pcmag.com/article2/0,2817,2462627,00.asp
Unlike voice recognition on portable devices, the technology has to contend with lots of road and engine noise inside a moving car hence not upto the mark. Read more on http://erskine-mcmahon.com/files/cell.phone.driving.distraction.article.10.pdf
Cognitive work-load increases leads to distracted brain. Read more on https://www.aaafoundation.org/sites/default/files/strayerIII_FACTSHEET.pdf

Speech to text-most cognitive distracting(AAA foundation for traffic safety-2011)

Future analysis and take on Speech recognition in cars