Introduction To Voice & Speech Recognition
There are billions of people
around the world speaking different languages, and yet we are able to
recognize someone by listening to his or her conversation or speech,
as long as we understand the language. We can usually also
recall someone's voice even if we have not seen that person for years.
In the movies, we have all seen robots and computers that understand
human voice commands and even whole speeches; in some cases they
speak our language and hold intelligent, interactive
conversations with us.
That is the movies. In
reality, we are still far from achieving that kind of speech recognition in
computing. This is not so much for lack of effort as for lack of the
technological breakthroughs we need to realize the dream: a
dream that will one day change the world and make computing
applications far more widely available to the rest of
mankind, in all kinds of applications limited only by
our imaginations.
So what are the basics of
voice and speech recognition?
The subjects of voice and
speech recognition are often confused, and the terms are often
misused. Knowing what is being spoken is very different from knowing
who is speaking. Voice recognition focuses on who is speaking (here
"voice" is a synonym for the speaker: a way to tell one speaker from
another), while speech recognition concentrates on what is being
spoken.
Within voice recognition, there is a further
difference between speaker verification and speaker identification.
In 2020, John walks up to an ATM
and speaks to it to verify that he is the owner of a checking
account, so that he can withdraw money from the account and
transfer it into his “wallet” via an EHF (Extremely High Frequency)
signal. The “wallet” looks like one of today's handheld devices and also
acts as his multi-function voice-command communicator, as
his remote control for all things electronic including the garage
door, and as the key that starts the
zero-emission flying vehicle that came out last
year. Speaker verification is typically used in access-control
applications like this one. In this case, there is a known voice print to
compare against.
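To make the one-to-one nature of verification concrete, here is a minimal Python sketch. It assumes each voice has already been reduced to a numeric feature vector (the hypothetical `enrolled_print` and `sample`); extracting such vectors from real audio is not shown. Cosine similarity and the 0.8 threshold are illustrative choices, not a description of how any actual ATM or product works.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two non-zero feature vectors, from -1.0 to 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled_print, sample, threshold=0.8):
    """Speaker verification (1:1): compare the sample against the ONE
    voice print on file for the claimed identity. The threshold is a
    hypothetical value chosen only for illustration."""
    return cosine_similarity(enrolled_print, sample) >= threshold


# Hypothetical feature vectors standing in for real voice prints.
johns_print = [0.9, 0.1, 0.4]
print(verify_speaker(johns_print, [0.88, 0.12, 0.41]))  # close to John's print
print(verify_speaker(johns_print, [0.1, 0.9, 0.2]))     # a different speaker
```

The key point is that verification only ever asks a yes/no question about a single, already-known voice print, which is why it is the simpler of the two tasks.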
Meanwhile, in Interpol’s data center
in New York, Mary is feeding a voice file, just intercepted from the
Internet and decrypted, into the supercomputer in the hope of
matching it against one of the millions of criminals’ voices stored in
the system. Speaker identification is typically used to match an
unknown speaker against a collection of voices that may or may not
include that speaker. In contrast to
speaker verification, speaker identification requires complex
voice-processing algorithms to produce a match.
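Identification, by contrast, is a one-to-many search. The minimal Python sketch below assumes the same kind of hypothetical pre-computed feature vectors, stored in a dictionary keyed by a made-up speaker ID, and uses cosine similarity with an illustrative 0.8 threshold. Because the speaker may not be in the system at all, it returns `None` when no stored voice is close enough.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two non-zero feature vectors, from -1.0 to 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify_speaker(database, sample, threshold=0.8):
    """Speaker identification (1:N): compare the sample against EVERY
    stored voice print and return the ID of the best match, or None if
    no print reaches the (hypothetical) threshold."""
    best_id, best_score = None, threshold
    for speaker_id, voice_print in database.items():
        score = cosine_similarity(voice_print, sample)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


# Hypothetical database; a real one could hold millions of prints.
voices = {"alice": [0.9, 0.1, 0.4], "bob": [0.1, 0.9, 0.2]}
print(identify_speaker(voices, [0.12, 0.85, 0.22]))  # closest to bob
print(identify_speaker(voices, [0.1, 0.1, -0.9]))    # nobody on file
```

A linear scan like this grows with the size of the database, which hints at why identification against millions of voices is so much harder than verifying a single claimed identity.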
Today, we have good success with
voice and speech recognition in controlled environments. However, a
technology that works only in controlled environments has very
limited applications.
At issue are environmental
noise, multiple people speaking at the same time, and training the
system to understand us. At a high level, we are really talking about
very fine-grained pattern recognition that works in any
environment at very high speed.
We have not yet seen either
voice or speech recognition used widely because we
still have not solved the issues stated above. Even
speaking to a voice-command cell phone can be a trying
experience at times. I believe the first issue to solve is how to
obtain quality input data that is free of environmental
interference. I would go so far as to say that today's microphone-based
input devices will not work well outside
controlled environments.
I
dream of a day when typing, reading and writing skills are optional
for using a computer. Not that those skills are unnecessary; they are
very important, but they should not be hurdles that keep
anyone from using a computer. Keep in mind that not everyone speaks
or writes English. Try typing a short email in Chinese on your handheld
today: it is doable, but how long does it take? More importantly,
computers and handheld devices would not need a keyboard to be
useful; they could truly be everywhere and speak all languages (which,
to a computer, are just patterns).
By Benson Yeung, Senior Partner