Reality Defender
Stopping deepfakes before they become a problem
In April, password management firm LastPass announced it had foiled a phishing attack that involved the impersonation of its CEO, Karim Toubba. Just a few weeks later, scammers impersonated the CEO of WPP, the world’s largest group of ad agencies, getting themselves a virtual meeting with the targeted employee. Both attacks were orchestrated via text messages from spoofed WhatsApp accounts; more concerningly, both involved convincing deepfake audio messages generated by AI systems trained on YouTube footage of the targeted executives speaking at public events.
Both schemes fell apart before significant damage was done, but that hasn’t always been the way these things go. The proliferation of cheap and easy-to-use generative AI tools is powering a new wave of synthetic identity fraud that could result in hundreds of billions of dollars in losses annually.
At the leading edge of technology to address this problem is Reality Defender, whose Series A funding round DCVC led last fall. The company recently launched the latest upgrade to its industry-leading multimodal deepfake detection platform: the ability to scan audio in real time, with output that gives clients a fine-grained understanding of the likelihood that any particular recording has been digitally generated or altered. The output also includes information about precisely which elements of a recording have been flagged as synthetic, a necessity in an environment where the fake is often strategically mixed with the real.
We caught up with Reality Defender CEO Ben Colman fresh off the company’s win of “Most Innovative Startup” at the prestigious RSA Conference’s Innovation Sandbox contest. This interview is 100% real.
DCVC: According to surveys, a third of businesses have already experienced attempts at deepfake voice fraud. Scary as that is, are we really only seeing the tip of the iceberg here, with most scams still going undetected?
Colman: Unfortunately, yes. As the generative AI tools that create these voices become more accessible, cheaper, faster, and more convincing, high- and low-profile instances of fraud that use this technology will grow in kind. If anyone, regardless of technical prowess or device access, can use a cheap Android phone to create deepfakes that defeat, say, investments in multi-billion-dollar voice biometric systems, then the ease of access and lack of preventative regulation mean we should expect exponential growth here. That is, of course, until more enterprises, institutions, and governments act on implementing deepfake detection measures.
DCVC: Your clients can now use your platform to detect audio deepfakes in real time in their networks and call centers. Why did you consider it such an urgent need to get this tech into your clients’ hands?
Colman: If I pick up the phone with my financial institution right now, I say “my voice is my password.” Once that’s done, I can do anything I want: transact here and there, send money to any new account, and so on. Voice biometrics are so ingrained in everyday processes that getting rid of them would cause complete chaos. That said, I can easily make a deepfake of my voice for less than a penny and bypass these systems to transact at will. Considering this gaping hole in our financial system’s security, and how much of an impact leaving it open would have on that system, we opted to deploy our on-demand audio detection models, pair them with models trained on in-house-generated telephony data, and run them in real-time environments. This not only plugs the security hole; it ensures that existing systems can keep working in the new and ever-changing environment of voice-driven deepfakes.
DCVC: Unlike many AI systems, Reality Defender’s platform doesn’t just deliver a verdict with no indication of how it arrived at its determination; it offers users a peek inside the black box. What are the advantages of explainable results in AI?
Colman: It’s one thing to know whether something is likely real or likely fake. It’s another thing to know why. With explainable AI (or XAI), we’re able to visibly indicate which parts of a file our models weighed most heavily in making their final decision. It’s not to point to something and say, “This is what gave it away.” It’s to say, “This portion of a frame, image, or what have you influenced the final decision.” We aim to offer as much data as possible when presenting clients and users with results that, in turn, can help inform important decisions.
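To make the idea concrete, here is a minimal, hypothetical sketch of occlusion-based saliency, one common way to surface which portions of an input most influenced a classifier’s decision. The fake_score and segment_saliency functions below are illustrative placeholders, not Reality Defender’s model or API.

```python
# Illustrative sketch only: a generic occlusion-saliency pass over an audio clip,
# in the spirit of the XAI output described above. The detector is a stand-in.
import numpy as np

def fake_score(waveform: np.ndarray) -> float:
    """Placeholder detector returning a probability that the clip is synthetic.
    In practice this would be a trained deepfake-detection model."""
    return float(np.clip(np.abs(waveform).mean() * 10, 0.0, 1.0))  # toy heuristic

def segment_saliency(waveform: np.ndarray, n_segments: int = 20) -> np.ndarray:
    """Mute each segment in turn and measure how much the detector's score shifts.
    Segments whose removal changes the score most influenced the decision most."""
    base = fake_score(waveform)
    bounds = np.linspace(0, len(waveform), n_segments + 1, dtype=int)
    saliency = np.zeros(n_segments)
    for i in range(n_segments):
        occluded = waveform.copy()
        occluded[bounds[i]:bounds[i + 1]] = 0.0    # mute one segment
        saliency[i] = base - fake_score(occluded)  # score shift = influence
    return saliency

# Usage: flag the segments that contributed most to a "likely fake" verdict.
clip = np.random.default_rng(0).normal(0, 0.05, 16000)  # 1 s of dummy audio @ 16 kHz
scores = segment_saliency(clip)
print("Most influential segments:", np.argsort(scores)[::-1][:3].tolist())
```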
DCVC: In just a few short years, we’ve seen generative AI go from text to still images to audio, and now with the launch of systems like Sora, video has arrived. Given the imminent arrival of rich and believable multimedia deepfakes, what’s next for Reality Defender’s capabilities?
Colman: We are expressly focused on the betterment of our detection models: upgrading, updating, iterating, and improving in lockstep with, if not a step ahead of, generation models’ advancement. We want to cement our place as the leading solution for detecting AI-generated content. We want to respond to new threat vectors quickly as they emerge, but also preempt some before they can get started.
DCVC: You wrote recently about the danger of becoming cynical about deepfakes and too readily paying out the “liar’s dividend.” What are some things that make you optimistic about society’s potential for maintaining a commitment to the true and the real?
Colman: There is no room for pessimism when working toward a better future, full stop. Years ago, computer viruses were on everyone’s mind, and people obsessively scanned and rescanned all of their media locally to make sure their PC wasn’t infected with a “worm.” That fear was largely mitigated by the move to server-side scanning, along with a bevy of protections and standards, some driven by ingenuity and some by legislation, that pushed the problem into the background for large entities to handle, offloading it from average users. I truly believe the detection, mitigation, and other actions taken against potentially dangerous AI-generated content will follow suit, both through legislation and through companies recognizing the immense risks, hopefully before they become tangible. This is, after all, why we are not a consumer platform: consumers should never have unequal access to the truth (that is, detection for some and not for others). This should be a standard service provided in the background wherever content is consumed or business takes place.