‘Hey CSIRO's Data61, how can I prevent spoofing attacks on my voice assistant?’
Good question – if you’re one of the 2.9 million smart speaker or voice assistant owners in Australia (that’s 14% of the population), ensuring your smart device isn’t used to impersonate you and trick third parties into releasing your personal information is crucial.
Siri, Alexa, Bixby and Google Home all allow people to use voice commands to quickly shop online, make phone calls, send messages, control smart home appliances, and access banking services. Because of these personal and security-critical commands, voice assistants are lucrative targets for attackers to exploit.
By identifying the differences between a live human voice and a voice replayed through a loudspeaker, a system developed by researchers at Data61 can detect when hackers are attempting a spoofing attack.
Known as Void, the system uses signal processing techniques and parameter optimisation to compare the frequency characteristics of live human voice commands with those of commands relayed through a loudspeaker. This lets it determine the ‘liveness’ of a voice, improving the accuracy with which it detects where a command originated.
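Void's actual feature set and classifier are described in the research paper and are not reproduced here. As a minimal sketch of the general idea of frequency-based liveness checking, the snippet below computes a power spectrogram with NumPy and derives one hypothetical feature: the share of spectral power below a cutoff frequency. The function names, the 1 kHz cutoff, the 0.3 threshold, and the assumption that loudspeaker playback loses low-frequency power are all illustrative assumptions, not details taken from Void.

```python
import numpy as np


def spectrogram(signal, sample_rate, frame_len=1024, hop=512):
    """Power spectrogram from a Hann-windowed short-time FFT.

    Returns (freqs, power), where power has shape (n_frames, n_bins).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return freqs, power


def low_band_power_ratio(signal, sample_rate, cutoff_hz=1000.0):
    """Hypothetical feature: fraction of total spectral power below cutoff_hz."""
    freqs, power = spectrogram(signal, sample_rate)
    spectrum = power.sum(axis=0)  # accumulate power over all frames
    return spectrum[freqs < cutoff_hz].sum() / spectrum.sum()


def looks_live(signal, sample_rate, threshold=0.3):
    """Toy liveness decision (hypothetical threshold, not Void's classifier):
    treat the voice as live if enough power sits in the low band."""
    return low_band_power_ratio(signal, sample_rate) >= threshold


# Synthetic stand-ins: a "live" signal with a strong 200 Hz component, and a
# "replayed" copy where that component is attenuated (assumed speaker effect).
sr = 16000
t = np.arange(sr) / sr
live = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)
replayed = 0.1 * np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

print(looks_live(live, sr), looks_live(replayed, sr))
```

A real detector would combine many such features in a trained classifier rather than rely on a single hand-set threshold; the threshold here only keeps the sketch short.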
When tested against a benchmark dataset of voice replay attacks, the approach successfully identified 94 to 99 percent of attacks.
As the adoption of smart speakers in Australia grows, privacy-preserving technologies are becoming increasingly important for consumer privacy and security, argues Muhammad Ejaz Ahmed, Cybersecurity Research Scientist at Data61 and lead author of the Void research paper.
"Voice spoofing attacks can be used to make purchases using a victim's credit card details, control Internet of Things connected devices like smart appliances and give hackers unsolicited access to personal consumer data such as financial information, home addresses and more," says Dr Ahmed.
"Although voice spoofing is known as one of the easiest attacks to perform as it simply involves a recording of the victim's voice, it is incredibly difficult to detect because the recorded voice has similar characteristics to the victim's live voice. Void is game-changing technology that allows for more efficient and accurate detection, helping to prevent people's voice commands from being misused."
Existing voice spoofing detection techniques typically employ multiple complex deep learning models and thousands of features, resulting in processing delays and high model complexity.
Void’s single classification model, spectrogram analysis and mere 97 features allow it to detect attacks eight times faster than deep learning methods while using 153 times less memory, making it a viable and lightweight solution for smart devices.
“Void requires only 2MB of memory space,” explains Dr Ahmed. “Because Void is so fast and lightweight, it reduces the computational burden of the server and can be implemented at device level.”
“With this type of on-device deployment, we even have the potential to use voice assistants without connecting to the internet.”