Et in Arcadia Ego: A not-so-simple answer

From the annotations to Irregular Webcomic:

Here's my not-so-simple approach. A breakthrough in speech recognition will occur when computers and A.I. systems learn to recognize speech the same way that humans learn it: within the larger context of language learning, and with plenty of help from social conditioning.

Let's look at an off-the-cuff theory of language learning and use among human infants. Starting at about two months, infants begin transitioning from crying to cooing. At its most basic level, cooing is the infant making all of the possible vowel sounds that lie within the capacity of the human vocal apparatus. Within eight to ten months, cooing will be supplemented with babbling, the same process done with consonant sounds. During this period, the infant is exploring all the sounds he or she can make.
Next begins a process of phonetic elimination. The infant begins to remove unproductive sounds from their repertoire, molding the range of phonetic production to the sounds the infant hears in the environment. The greatest influence on this process is the language use of adult humans. Infants learn most of their language behavior at this stage by mimicking the patterns of the sounds produced by adults. If the adults in the environment are speaking one language with a consistent set of phonemes, the infant will reduce their vocalizations to that set of phonemes.

Later stages of babbling begin to mold into proto-language use. This is the time when social conditioning becomes important. Over the course of babbling, the infant will tend to produce phonemes from the ambient language set in more or less random order, to a certain degree mimicking the phoneme patterns of adults. Most of this babbling will be rewarded at a constant rate, enough to encourage the continuance of babbling. However, in an application of Shakespeare's infinite monkeys, eventually the infant will stumble upon a phoneme pattern that more or less matches an intelligible word of the ambient language, at a time and place where it will be overheard and understood by an attending adult: a child's 'first word'. Frequently, this first word will result in the child being rewarded in some fashion.
Over the next months and years, the child will continue to be rewarded when making sounds appropriate to the ambient language and the social context, and will go unrewarded when making inappropriate sounds; making the sound 'milk' may be rewarded with the desired food item, while making the sound 'ilkm' goes unrewarded. In this way, the child's basic vocabulary is built until more complex language mechanisms come into use.
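The reward loop described above can be sketched as a toy simulation. This is only an illustration under loud assumptions, not a model of real language acquisition: the candidate utterances, the starting weight of 1.0, the `boost` size, the trial count, and the function names `babble` and `condition` are all made up for the example. The learner picks utterances at random in proportion to their weights, and only rewarded utterances gain weight.

```python
import random

def babble(weights, rng):
    """Pick one utterance at random, in proportion to its current weight."""
    utterances = list(weights)
    return rng.choices(utterances, weights=[weights[u] for u in utterances], k=1)[0]

def condition(utterances, rewarded, trials=500, boost=1.0, seed=0):
    """Simulate conditioning: rewarded utterances become more likely over time.

    All parameters are hypothetical knobs chosen for illustration.
    """
    rng = random.Random(seed)                 # seeded so runs are repeatable
    weights = {u: 1.0 for u in utterances}    # every sound starts equally likely
    for _ in range(trials):
        utterance = babble(weights, rng)
        if utterance in rewarded:             # the attending adult responds
            weights[utterance] += boost
        # unrewarded utterances are simply left alone
    return weights

weights = condition(["milk", "ilkm", "klim", "limk"], rewarded={"milk"})
for utterance, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{utterance}: {weight:.0f}")
```

In this sketch 'milk' steadily crowds out 'ilkm' and the other scrambles, which never gain weight because nobody ever rewards them.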
...
Learning to hear language is probably helped by a similar social conditioning process. When one hears utterances, one responds with behavior (whether speech or action). When the responsive behavior is appropriate, one is rewarded. When the responsive behavior is inappropriate, one goes unrewarded.
Children, too, undergo this conditioning process. When they correctly understand speech, they learn to respond appropriately and are rewarded. When speech is not correctly understood, they are unable to initiate the correct responsive behavior and go unrewarded. I feel this conditioning is integral to the process of learning to hear language.
...
Computers at this stage of technology are immune to social conditioning. Outside of various academic AI labs, computers are pretty much incapable of modifying their own behavior. Certainly, it is difficult to program a computer to have needs, or to recognise its needs. The presence and internal recognition of needs, together with the desire and ability to have those needs met, constitute the basis for the reward process of social conditioning. Only once it becomes possible to reward computers do I feel great strides will be made in the quest for decent speech recognition.
...
At the same time, one of the difficulties of computer speech recognition is the contextual dependency of language and the human language process. Quite a bit of error correction occurs in human speech by reference to the context of the utterance. If I'm talking to my doctor, and I make some sort of reference to 'elbeny', I may be speaking of my 'elbow' or my 'knee'. If I'm discussing New York geography, the same set of sounds is likely to be interpreted as 'Albany'.
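As a rough sketch of how context could tip the scales between candidate words, the snippet below weights how closely each candidate matches what was heard by how likely that word is in the current setting. Everything here is an assumption made for illustration: the prior numbers in `CONTEXT_PRIORS` are invented, the function names are hypothetical, and spelling similarity (via Python's standard `difflib`) is only a crude stand-in for acoustic similarity.

```python
from difflib import SequenceMatcher

# Hypothetical chances of each word coming up in a given setting (made-up numbers).
CONTEXT_PRIORS = {
    "doctor's office": {"elbow": 0.45, "knee": 0.45, "Albany": 0.10},
    "New York geography": {"elbow": 0.05, "knee": 0.05, "Albany": 0.90},
}

def similarity(heard, word):
    """Spelling similarity between 0 and 1, used here in place of real acoustics."""
    return SequenceMatcher(None, heard.lower(), word.lower()).ratio()

def interpret(heard, context):
    """Return the candidate with the best (similarity x context prior) score."""
    priors = CONTEXT_PRIORS[context]
    return max(priors, key=lambda word: similarity(heard, word) * priors[word])

for context in CONTEXT_PRIORS:
    print(f"{context}: 'elbeny' -> {interpret('elbeny', context)}")
```

With these made-up numbers, the same garbled 'elbeny' comes out as 'elbow' in the doctor's office and as 'Albany' in the geography discussion, even though 'Albany' is the closest spelling match in both cases.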

Computers, at this point, are mostly incapable of making such judgements. They are less able to refer to context, and are unaware of all of the different contexts upon which a native human language user is capable of drawing. This is another barrier to computer speech recognition.

-------
Googlebombing for a cause: www.minnesotangos.org

3 comments:

Lord Carnifex said...: Although the human speech recognition apparatus doesn't always work that well, either.; November 27, 2009 at 5:32 PM
Anonymous said...: Especially if you are deaf in one ear and can't hear out of the other.
There is a very nice video, unfortunately posted on Facebook, of your niece saying the word 'turtle'. Too bad I can't share it with you.
Have you read about recent research that shows that newborns cry in the rhythm of the language they heard in utero?; November 30, 2009 at 8:51 AM
phaedrus said...: Which reminds me that it's time to restart on trying to learn a bit of conversational German to help the next biological "computer" in the household have a bit more to work with.; December 1, 2009 at 2:17 PM

Friday, November 27, 2009
