Triware Networld Systems 

19 Years Of Around The Clock Superior Network Systems Service & Support!

 

Introduction To Voice & Speech Recognition

There are billions of people around the world speaking different languages, and yet we can recognize someone just by listening to his or her conversation or speech, as long as we understand the language. We can usually recall someone’s voice even if we have not seen that person for years. In the movies, we have all seen robots and computers that understand human voice commands and even speeches; in some cases they can speak our language and hold an intelligent, interactive conversation with us.

That is the movies. In reality, the computing world is still far from achieving that kind of speech recognition, not so much for lack of effort as for lack of the technological breakthroughs we need to realize the dream. It is a dream that will one day change the world and make computing useful to the rest of mankind, in all kinds of applications limited only by our imagination.

So what are some of the basics of voice and speech recognition?

The subjects of voice and speech recognition are often confused, and the terms are often misused. Knowing what is being spoken is very different from knowing who is speaking. Voice recognition focuses on who is speaking (the voice as a signature of the speaker, a way to tell one speaker from another), while speech recognition concentrates on what is being spoken.

Within voice recognition, we can further distinguish between speaker verification and speaker identification.

In 2020, John goes to an ATM and speaks to it to be verified as the owner of a checking account, so that he can withdraw money and transfer it into his “wallet” via an EHF (Extremely High Frequency) signal. The “wallet” looks like a handheld device of today and also serves as his multi-function voice-command communicator, as his remote control for all things electronic including the garage door, and as the key that starts his zero-emission flying vehicle, which came out just last year. Speaker verification is typically used in access control applications like this one. Here there is a known voiceprint to compare against: the system only has to confirm that the voice matches the person John claims to be.

Meanwhile, in Interpol’s data center in New York, Mary feeds a voice file, just intercepted on the Internet and decrypted, into their supercomputer in the hope of matching it against one of the millions of criminals’ voices stored in the system. Speaker identification is typically used to match an unknown speaker against a collection of voices that may or may not include that speaker. In contrast to speaker verification, speaker identification requires more complex voice processing algorithms to produce a match, because the unknown voice must be compared against many candidates rather than against a single claimed identity.
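The difference between the two can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions, not how any production system works: it assumes each voice has already been reduced to a fixed-length feature vector (a hypothetical "voiceprint"), uses cosine similarity as the matching score, and picks an arbitrary threshold of 0.9. The names `verify` and `identify` and the sample vectors are made up for this example.

```python
import math
from typing import Dict, List, Optional

# Hypothetical stand-in for a voiceprint: a fixed-length feature vector
# assumed to have been extracted from recorded speech elsewhere.
Voiceprint = List[float]


def cosine_similarity(a: Voiceprint, b: Voiceprint) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(probe: Voiceprint, enrolled: Voiceprint, threshold: float = 0.9) -> bool:
    """Speaker verification (1:1): does the voice match the one claimed identity?"""
    return cosine_similarity(probe, enrolled) >= threshold


def identify(
    probe: Voiceprint, database: Dict[str, Voiceprint], threshold: float = 0.9
) -> Optional[str]:
    """Speaker identification (1:N): return the best-matching enrolled speaker,
    or None when nobody reaches the threshold (the speaker may be unknown)."""
    if not database:
        return None
    # Score the probe against every enrolled voiceprint and keep the best one.
    best_name, best_score = max(
        ((name, cosine_similarity(probe, vp)) for name, vp in database.items()),
        key=lambda pair: pair[1],
    )
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    database = {
        "john": [0.9, 0.1, 0.3],
        "suspect_a": [0.1, 0.8, 0.5],
        "suspect_b": [0.2, 0.2, 0.9],
    }
    probe = [0.88, 0.12, 0.31]  # a new recording of John's voice

    print(verify(probe, database["john"]))       # True: a single comparison
    print(identify(probe, database))             # john: every entry was scored
    print(identify([0.5, -0.5, 0.0], database))  # None: no enrolled match
```

Verification makes one comparison against the claimed identity, while identification must score the probe against every enrolled voice and still decide whether the best match is good enough, which is why it becomes much harder as the collection grows to millions of speakers.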

Today, voice and speech recognition work well in controlled environments. However, a technology that works only in controlled environments has very limited applications.

The issues are environmental noise, multiple speakers talking at the same time, and training the system to understand us. At a high level, we are really talking about very fine-grained pattern recognition that works in any environment at very high speed.

We have not yet seen voice or speech recognition used widely because we have not solved the issues above. Even speaking to a voice-command cell phone can be a trying experience at times. I believe the first issue to solve is how to obtain quality input data free of environmental interference; I would argue that today’s microphone-based input devices will not work well outside controlled environments.

I dream of a day when typing, reading and writing skills are optional for using a computer. Not that those skills are unnecessary; they are very important, but they should not be hurdles that keep anyone from using a computer. Keep in mind that not everyone speaks or writes English. Try typing a short email in Chinese on your handheld today: it is doable, but how long does it take? More importantly, computers and handheld devices would no longer need a keyboard to be useful, so they could truly be everywhere and speak every language (to a computer, languages are just patterns).

By Benson Yeung, Senior Partner

Benson Yeung Biography

Since 1991, Mr. Yeung has consulted on IT and business issues for over 300 small, medium, and large Bay Area organizations. He also contributes articles to the Loral Computer Special Interest Group, Microsoft Project, and Silicon Valley Computer Society monthly newsletters and to other nationwide publications. Over the past 20 years, he has spent a significant amount of time in the IT security field, where he has gained a deep understanding of current IT security issues and developed frameworks and best-practice methodologies.

Mr. Yeung’s client list includes Flextronics, HP, Levi Strauss, Loral, NeXT Computer, New York Life, Stanford University, Symantec and many other companies. Mr. Yeung also works closely with various VC firms and startups in the Bay Area as a Technology Advisor and IT & Operations Consultant. Mr. Yeung has a B.S. in Computer Science from Arkansas State University. He is also a Microsoft Certified Trainer (MCT) and Microsoft Certified Systems Engineer (MCSE).


© Copyright Triware Networld Systems, L.L.C. ® 1991-2010