Meet the Neuroscientist Translating Brain Activity Into Speech

The average human speaks at a rate of up to 150 words per minute, making spoken conversation one of the most effective ways to communicate. “We take for granted how effortless it is to convey so much information in such a short amount of time,” says Edward Chang, a neurosurgeon at the University of California, San Francisco. “That is, until you lose this ability from an injury.” 

Brain injuries such as stroke and neurological disorders like amyotrophic lateral sclerosis (ALS) can destroy vocal communication, socially isolating patients or requiring them to use prostheses. The best of these prostheses are essentially brain-controlled typewriters: A person moves a computer cursor with brain signals detected by a neural implant, painstakingly selecting one letter at a time. Eight words per minute is fast. (Perhaps the most famous speech prosthetic belonged to the late physicist Stephen Hawking, who, with muscle twitches, typed each word for a speech synthesizer to read.)

To emulate speech at a more natural speed, some researchers have tried going a step further, literally reading people’s minds by measuring neural activity in the brain’s speech center to drive an artificial voice synthesizer. But success has been limited to monosyllabic utterances. Turns out the brain is pretty complicated.

(Credit: Noah Berger/UCSF)

Chang wondered whether an indirect approach would be better. Observing that fluid speech depends on fine motor coordination of the vocal tract (including the lips, tongue, jaw and larynx), he reasoned that the neural activity commanding these muscle movements could control the articulations of a synthesizer. “Patterns of activity in the brain’s speaking centers are specifically geared to precisely coordinate the movements of the vocal tract,” he explains. “We figured out how neural activity there directly controls the precise movements when we speak.”

To test his idea, Chang enlisted five people undergoing treatment for epilepsy, whose therapy already included surgically placed electrodes resting on the surface of the brain. He monitored their brain activity while they spoke hundreds of sentences aloud, and used the data to train artificial intelligence software. The AI learned to decode the brain signals into whole sentences, and it kept working even when volunteers merely mimed speaking. When the brain-AI-speech system was tested, the sentences it synthesized were understood with 70 percent accuracy.

In addition, as Chang reported in April in Nature, the patients’ desired intonation was preserved. “Intonation allows us to stress specific words, express emotion or even change a statement into a question,” Chang says. His group discovered that the crucial pitch changes are achieved by adjusting tension in the vocal folds of the larynx, and that the corresponding brain signals could be monitored precisely enough for the synthesizer to impart the emotional subtext of patients’ speech.

Chang cautions that his technology will not address all conditions — such as injuries to brain areas responsible for controlling the larynx and lips — and he’s only now starting clinical trials in people with stroke and ALS. These patients can’t train the AI with spoken sentences as the subjects of his study did, since their ability to speak aloud is already gone. However, Chang found that speech-related brain activity was very similar across all five of his study volunteers, so individual training may not be necessary.

In the future, the gift of gab may be plug-and-play.
