UW-Madison researchers have discovered a way to mimic people's voices using household items that, when sized to the right dimensions, can fool voice security technology.

Professor of electrical and computer engineering Kassem Fawaz and doctoral student Shimaa Ahmed have developed a mathematical model that could allow almost anyone to imitate the resonance of another person's voice with a simple tube. As long as a person spoke through a PVC pipe whose width and length, as calculated by the algorithm, matched the frequency of the target voice, they could trick the security technology 60% of the time.

The researchers' attempts to dupe the security technology were 10 times more successful with the PVC pipe than when people tried to impersonate someone else with their voice alone.

The research suggests the security systems built into voice recognition software aren't as secure as people have been led to believe.

About 60% of Americans use voice recognition technology daily, often for innocuous tasks. It's commonly used through voice assistants on devices that activate on hearing "hey" followed by "Siri" or "Google," or simply "Alexa," and it is embedded in TV remotes and vehicle infotainment systems.

But voice recognition also holds the keys to some of people's most sensitive information. Large financial institutions such as Chase, BMO Harris and First Citizens Bank all offer voice identification software to their customers, claiming it offers as much security as a fingerprint scan.

Voice recognition systems include anti-spoofing safeguards that can detect whether a voice is coming from a person's throat in an "analog" fashion or from a digital speaker, a distinction that is vital to security as artificial intelligence makes it easier to clone a person's voice. But those safeguards can't detect the presence of a PVC pipe or any other cylindrical item that fits the algorithm, Fawaz said.

"The defenses that rely on distinguishing a human versus a machine are fundamentally flawed, because they can put anything between you and the microphone, and they can break these technologies," Fawaz said. "Other people are doing this kind of research and are trying to trick the system for nefarious purposes."

Research project

Ahmed and Fawaz's research focused on finding the holes in voice identification software alongside peers from the University of Toronto's Vector Institute and London's Alan Turing Institute, both of which study artificial intelligence. They had already found that voice assistants reacted differently when people took steps to alter their own voices, such as cupping their hands around their mouths.

Ahmed first started to experiment with the tube concept in March 2021, when her lab had not yet reopened to in-person research and all she had to work with was an empty paper towel tube in her kitchen.

As the research progressed and Ahmed returned to the lab, noise interference and tubes that were not perfectly round posed challenges. Ahmed and Fawaz turned to PVC pipes to ensure they had a material that would keep its shape while still allowing sound to vibrate through it. When they needed sizes outside the standard dimensions commonly found at hardware stores, the pair 3D-printed them in the UW Makerspace lab.

Once they cracked the algorithm, 14 people lent their voices to the research, reading off a list of 50 phrases, each a few seconds long, both out loud and into the tubes. Depending on the speaker and the starting pitch of their voices, success rates ranged from 40% to 70%, Ahmed said.

When those same 14 participants were invited back and asked to mimic a celebrity after watching that celebrity speak for 30 minutes, only two managed to trick the security software at all.

"Our goal with this research is just to tell the public, don't rely on one method of security," Ahmed said. "Use voice ID if you want to use it, but also you should have two-factor authentication. So passwords are very, very important."

