A Device to Tune In or Out

How humans speak, by taking turns, forms the basis for automated AI-driven proactive hearing assistance.

Noise-canceling headphones are great, but they’re a binary, all-or-nothing option. People who are hard of hearing don’t need to eliminate all noise; instead, they simply want to focus on the sound or conversation of their choice.

Researchers at the University of Washington have developed a set of headphones that uses AI to automatically decipher whom the user is talking to and helps them hear the conversation better.

The nuance of conversation

Individuals with hearing loss don’t have much use for basic noise-canceling devices. Instead, they need to be able to “program an acoustic scene,” said Shyam Gollakota, the Thomas J. Cable Endowed Professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington, Seattle. Users want to choose to whom to listen, while ignoring crying babies or other noises and conversations.

Gollakota and his team developed an intuitive way for AI to learn to favor particular conversations and tune out others: by paying attention to the cadence of human conversations and the turn-taking that is their salient feature. Such an approach infers user intent and adjusts which sounds are favored accordingly. “The question we asked is, ‘can the AI in a headset proactively figure out who the users are and who they are in a conversation with?’” Gollakota said. The prototype for the approach uses regular off-the-shelf headphones.

Such proactive hearing assistance is a step up from earlier approaches, in which a user would manually indicate, through buttons or other interfaces on a device, whom they would like to listen to (or tune out). Putting the onus of choice on the user is particularly challenging because people who are hard of hearing often don’t want to call attention to themselves, so any automatic functionality is a win in that regard.

Understanding conversational dynamics

In a typical human conversation, people take turns. While there may be some back channel talk and overlap, it’s fairly easy, simply by observing the turn-taking behavior, to infer which individual is speaking and which is listening. And such an approach can apply even when people join or leave a conversation.

“People who are not part of a conversation are not going to take turns,” Gollakota pointed out. As a result, in a noisy environment, it’s easier to pinpoint which noises the AI should filter out (as in chatter at a neighboring restaurant table), and which ones it should keep.

“If I’m wearing the device the AI knows what I sound like so it can then track the conversation dynamics of everyone else in the vicinity automatically and figure out who is following conversational turns,” Gollakota said.
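
The turn-taking idea can be illustrated with a few lines of Python. This is only a rough sketch and not the researchers' model, which is a trained AI system working on raw audio. It assumes each speaker's talking time has already been reduced to (start, end) segments in seconds, and every name and threshold here (`looks_like_partner`, `max_gap`, `min_handoffs`) is hypothetical. A speaker is treated as a likely conversation partner when they rarely talk over the wearer and often start speaking right after the wearer stops, or the other way around.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s) of one stretch of speech


def overlap_seconds(a: List[Segment], b: List[Segment]) -> float:
    """Total time during which both speakers are talking at once."""
    total = 0.0
    for a_start, a_end in a:
        for b_start, b_end in b:
            total += max(0.0, min(a_end, b_end) - max(a_start, b_start))
    return total


def overlap_fraction(wearer: List[Segment], other: List[Segment]) -> float:
    """Share of the other speaker's talking time that collides with the wearer's."""
    other_time = sum(end - start for start, end in other)
    if other_time == 0:
        return 0.0
    return overlap_seconds(wearer, other) / other_time


def handoffs(wearer: List[Segment], other: List[Segment], max_gap: float = 1.0) -> int:
    """Count segments that are followed, within max_gap seconds, by the other person starting."""
    count = 0
    for first, second in ((wearer, other), (other, wearer)):
        for _, end in first:
            if any(0.0 <= start - end <= max_gap for start, _ in second):
                count += 1
    return count


def looks_like_partner(
    wearer: List[Segment],
    other: List[Segment],
    max_overlap: float = 0.3,  # assumption: tolerate up to 30 percent overlap
    min_handoffs: int = 2,     # assumption: require at least two turn exchanges
) -> bool:
    """Heuristic: little overlap with the wearer plus repeated turn exchanges."""
    return (
        overlap_fraction(wearer, other) <= max_overlap
        and handoffs(wearer, other, 1.0) >= min_handoffs
    )


# Made-up timelines for illustration only.
wearer = [(0.0, 2.0), (4.5, 6.0), (9.0, 10.0)]
partner = [(2.2, 4.4), (6.1, 8.8)]   # speaks in the wearer's pauses
stranger = [(0.5, 3.5), (5.0, 9.5)]  # talks regardless of the wearer

print(looks_like_partner(wearer, partner))
print(looks_like_partner(wearer, stranger))
```

With these made-up timelines, the speaker who fills the wearer's pauses is flagged as a partner, while the one who talks straight through the wearer's turns is not. A real system would also have to cope with backchannel remarks and people joining or leaving, which this sketch ignores.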

AI processing in stages

Because small headphones might not always have the processing power for AI algorithms, the proactive hearing approach relies on a dual-model architecture. The on-device algorithm, running every 10 milliseconds, extracts the conversation and relays it to the larger model on a connected smartphone.

This larger model, which in turn runs every second or two, figures out who the conversation participants are and sends the information back to the smaller model. Using that information, the smaller model filters out all extraneous noise and streams only the relevant conversation back into the ear.
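
The two-speed arrangement can be sketched as a simple loop. The Python below is a toy illustration under heavy assumptions, not the team's implementation: audio is represented as 10-millisecond frames in which each speaker already has a separate level (real headphones receive a mixed signal), the slow model is any function supplied by the caller, and the names `run_pipeline` and `choose_participants` are hypothetical. The 10-millisecond period comes from the article, and the slow period is set to one second, the low end of "every second or two."

```python
from typing import Callable, Dict, List, Set

FAST_PERIOD_MS = 10    # on-device model runs every 10 ms
SLOW_PERIOD_MS = 1000  # assumption: larger model runs once per second

Frame = Dict[str, float]  # hypothetical: speaker label -> level in one 10 ms frame


def run_pipeline(
    frames: List[Frame],
    choose_participants: Callable[[List[Frame]], Set[str]],
    slow_every: int = SLOW_PERIOD_MS // FAST_PERIOD_MS,
) -> List[Frame]:
    """Filter every frame on the fast path; refresh the participant set on the slow path."""
    participants: Set[str] = set()  # nobody passes until the slow model reports
    buffered: List[Frame] = []
    output: List[Frame] = []
    for i, frame in enumerate(frames, start=1):
        # Fast path: keep only speakers currently marked as participants.
        output.append(
            {name: level for name, level in frame.items() if name in participants}
        )
        buffered.append(frame)
        # Slow path: once per slow period, hand the buffered frames to the
        # larger model and adopt its decision for the frames that follow.
        if i % slow_every == 0:
            participants = choose_participants(buffered)
            buffered = []
    return output


# Stand-in for the phone-side model; it always picks the same speaker.
def always_partner(history: List[Frame]) -> Set[str]:
    return {"partner"}


two_seconds = [{"partner": 1.0, "stranger": 1.0} for _ in range(200)]
filtered = run_pipeline(two_seconds, always_partner)
print(filtered[0], filtered[150])
```

In this sketch nothing passes through during the first second, because the fast path has no participant list yet. That start-up behavior is a choice made for the example, not something the article describes.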

Accommodating real-world challenges

Since not everyone waits for one person to completely finish talking before chiming in, the proactive hearing approach accounts for an overlap of 20 to 30 percent. In addition to the user, the system can handle one to four additional conversation participants.

Gollakota understands that large headphones might not be easily adopted, which is why the system has been tested to work on smaller hearing aids as well, for periods of six to eight hours.

In addition, the approach takes cultural differences into account. Japanese speakers, for example, don’t have much overlap in conversations and the cadence for intonation might be different for different languages. The proactive hearing research has thus far been tested and found to work for English, Mandarin, and Japanese.

Future research will likely focus on moving beyond human conversations. “We’d like to see how we can expand this work to include all kinds of sounds, including bird and animal sounds. How do you know what humans want to hear apart from just conversations?” Gollakota said.

Poornima Apte is a technology writer based in Walpole, Mass.