Just how are deepfakes made, anyway?

Long story short: by making AIs fight each other.

December 29, 2020

Deepfakes — feared internet deception du jour — have inspired both amazement and alarm.

What is a deepfake? Essentially, it’s really, really good fake data.

Capable of rendering lifelike facsimiles, deepfakes can be used to make videos of essentially anyone saying anything — up to and including, oh, Barack Obama. The implications are awesome, and potentially frightening.

Deepfakes could be used to create fake stories about politicians, celebrities, scientists — imagine a realistic-enough Fauci imploring us not to take a COVID-19 vaccine.

But deepfakes don’t have to be about deception. “Welcome to Chechnya,” a film documenting the lives of gay and lesbian Chechens, used deepfakes to protect the anonymity of its sources without scrubbing them of their humanity.

Or the South Park creators can give their fictional TV reporter Donald Trump’s face. Just endless possibilities, really.

So, if deepfakes are here to stay, we should learn to understand them — and that starts with the obvious: how are deepfakes made?

The Forger and the Inspector

Deepfakes are made using a type of deep learning AI called a “generative adversarial network,” or GAN. The name says it all: these networks generate an output by pitting two AIs against each other.

Sharon Zhou, a deepfakes instructor at Stanford and Coursera, explains it like this: picture two programs, one for an art forger and the other for an art inspector.

Naturally, the forger is attempting to forge a piece of art, and the inspector is trying to catch the telltale signs. The inspector is shown the real piece of art, as well as the fake, but it doesn’t know which is which.

The two AIs then pass the fake back and forth, as the forger tries to tweak the counterfeit until the inspector can’t tell the difference between the real and the fake.

We do need to give the GAN some rules to guide whatever output we want. If I wanted to deepfake a tarantula (for some unfathomable reason), I would give the inspector a list of guardrails: it needs a certain number of eyes, two fanged chelicerae, eight legs, fur, etc. The more of these parameters I can give the GAN upfront, the better this horrifying deepfake will become.

Initial rules in hand, the forger sends round after round of hairy hell-spiders to the inspector, whose feedback then informs the forger.

“The art forger realizes ‘oh, you think this one looks realistic?’” Zhou told me. “‘I’m gonna keep drawing like this until it looks like Mona Lisa.’”

(Or, in the tarantula case, my worst nightmare.)

Locked in battle, the two networks sharpen every output, homing in on what the real McCoy looks like until the inspector declares the fake to be real. What the GAN turns out at that point usually looks pretty damn real to us, too.
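
For the curious, here is roughly what that back-and-forth can look like in code. This is a toy sketch, not how production deepfake software works: the "art" is just a number drawn from around 4.0, the forger has a single dial (an offset added to random noise), and the inspector is a simple logistic classifier. Every name in it (inspect, forge, inspector_step, forger_step, train) and every setting (learning rate, batch size, the 4.0 target) is an assumption made up for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def inspect(x, w, c):
    """Inspector (discriminator): estimated probability that x is real."""
    return sigmoid(w * x + c)


def forge(z, offset):
    """Forger (generator): turn random noise z into a fake sample."""
    return z + offset


def inspector_step(real, fake, w, c, lr):
    """One gradient step that makes the inspector better at spotting fakes.

    Minimizes -log D(real) - log(1 - D(fake)), averaged over each batch.
    """
    grad_w = grad_c = 0.0
    for x in real:
        miss = 1.0 - inspect(x, w, c)  # how far the verdict was from "real"
        grad_w -= miss * x / len(real)
        grad_c -= miss / len(real)
    for x in fake:
        fooled = inspect(x, w, c)  # how "real" the fake looked
        grad_w += fooled * x / len(fake)
        grad_c += fooled / len(fake)
    return w - lr * grad_w, c - lr * grad_c


def forger_step(noise, offset, w, c, lr):
    """One gradient step that makes the forger's fakes look more real.

    Minimizes -log D(G(z)), using the inspector's verdicts as feedback.
    """
    grad = 0.0
    for z in noise:
        x = forge(z, offset)
        grad -= (1.0 - inspect(x, w, c)) * w / len(noise)
    return offset - lr * grad


def train(steps=200, batch=64, lr=0.05, real_mean=4.0, seed=0):
    """Alternate inspector and forger updates; return (w, c, offset)."""
    rng = random.Random(seed)
    w = c = offset = 0.0
    for _ in range(steps):
        real = [rng.gauss(real_mean, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [forge(z, offset) for z in noise]
        w, c = inspector_step(real, fake, w, c, lr)       # inspector studies both piles
        offset = forger_step(noise, offset, w, c, lr)     # forger adjusts to the verdicts
    return w, c, offset


if __name__ == "__main__":
    w, c, offset = train()
    print(f"forger's offset: {offset:.2f} (real data is centered on 4.0)")
```

The forger never sees the real data. All it gets is the inspector's verdict, and it nudges its offset in whatever direction makes the fakes score as more real. Toy setups like this one can wobble around the target rather than settle neatly on it, so treat the printed number as a rough indication rather than a guarantee.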

Deepfakes’ True Potential

GANs have a major advantage over CGI or just Photoshopping something.

Pitting two networks against each other allows GANs to rapidly create the most realistic simulated data possible — which is essentially what we’re talking about when we’re talking about deepfakes.

Seen through that frame, the true power of deepfakes comes into sharp relief: GANs can be harnessed to create incredibly realistic artificial data for anything, from Hillary Clinton speeches to particle physics experiments.

That’s a bit of foreshadowing for you.