Date: 2022-06-13
Actor Val Kilmer lost his voice to throat cancer, yet in the new “Top Gun” movie, he does speak a line, thanks to an artificial intelligence program that recreated his voice.
That is a good use of audio “deepfakes,” computer-generated voices that sound human. Here’s a bad use of the evolving tech:
Bank robbers faked the voice of a company’s director in order to steal $35 million in a 2020 fraud case in the United Arab Emirates. An employee believed they were speaking on the phone with the executive, who was directing them to transfer funds. But the employee was speaking with a deepfake imitating the director.
Artificial voices are a booming industry. The movie studio DreamWorks worked with the AI company Cameo to allow users to make the animated character Boss Baby say what they typed into a website. Boss Baby, ever the entrepreneur, charged 20 bucks a pop.
The language school Berlitz uses synthetic voices to create teaching programs that would be expensive and time-consuming to record from humans reading scripts. An AI company that works with Berlitz on the programs, Hour One – with the ironic slogan, “Humanize your content” – urges customers to create voices – or an entire artificial humanoid character – on its website.
But what about the capacity for fraud? The UAE case was not the only example of executive voices being faked to mislead employees. The CEO of a company in the United Kingdom fell for the same sting in 2019, The Wall Street Journal reported, costing his company a quarter-million dollars.
Deepfake videos have been used to mislead voters and consumers on YouTube for years. Last month scammers created a video of Tesla founder and Twitter troll Elon Musk supposedly endorsing a cryptocurrency. Doctored videos (known as “cheapfakes”) of House Speaker Nancy Pelosi stuttering and slurring her words have been posted several times.
But deepfake audio fraud is different, cybersecurity experts say. This is not a YouTube stunt designed to go viral. It is a targeted and crafty fraud representing an evolution of phishing, the scam in which malicious links are dropped into office emails and used to defraud companies. The FBI reported last month that scams using compromised business emails stole $43 billion from companies over the past five years, making this the costliest category of cyber crime.
One favorite trick of criminals is “spoofing” emails, making them look like they are coming from a top executive who is directing the transfer of funds. Employees have gotten better at spotting fake emails, by looking at the sender’s email address, or strange language. But employees don’t think to listen for faked instructions from executives. We trust our ears too much, experts say.
Pindrop, an Atlanta company, wants to change that. At the RSA cybersecurity conference last week in San Francisco, the company demonstrated how its AI can hear flaws in the criminals’ AI.
The company is one of several providing an artificial gatekeeper to companies. Pindrop AI analyzes incoming calls, assessing risk by evaluating 1,300 tiny aspects of calls.
Pindrop CEO Vijay Balasubramaniyan praises uses such as giving Val Kilmer his voice back. There are many good uses of computer-generated voices, he says. But imitating people in business is “the scariest thing,” he says.
“They're actually able to capture the accents of the CFO or the CEO. So the CFO calls you on a Zoom call and says, ‘Hey, I've just had a bad day, so I'm not getting on video.’” Then criminals use the artificial voice to instruct a transfer of funds. Most people aren’t looking for fraud in that scenario. So how does technology listen for it?
Deepfake audio doesn’t do emphasis well, as you might have heard in the somewhat monotone recorded speech of customer service bots. Human voices have evolved over time to carry nuances of emphasis, dialect, and other quirks that deepfakes can’t yet match. And while humans may not pick up on the clues in a fake voice, AI can, Pindrop says.
“We did a study where we actually asked a set of humans to differentiate between a deepfake and a human voice,” Balasubramaniyan says. “The accuracy of humans was 57%. So humans can't do it. This is where you need technology.”
So AI is used to imitate human voices, and other AI is used to listen for flaws. Humans are on the sidelines. It’s an AI conversation with millions and perhaps billions at risk.
Patrick Murphy is a San Francisco investor at the venture capital firm Tapestry, which backs Hour One, the startup helping Berlitz create language courses with synthetic voices. Murphy believes that, in the end, Boss Baby will beat fraud bosses. Legal uses will far outnumber scams, he says.
“With the right guardrails, I think these will be a lot more useful for society than harmful,” he told me in a phone call.
At least, I think that was a human on the line.