“I said not one word of that”. When AI puts words in our mouths

Last December, the Conservative peer Charlotte Owen introduced the Non-Consensual Sexually Explicit Images and Videos (Offences) Bill, which has made its way through the House of Lords. This followed the 2023 Online Safety Act, which not only made it a criminal offence to share, or threaten to share, images or videos of someone in an intimate state, but also included digitally manipulated ones, known as deepfakes, appearing to show someone in such a state.

I am entirely on her side, but I would also like to see it cover content of a non-sexual nature, audio as well as video, which can cause similar humiliation and distress to those targeted, male as well as female. While AI cannot literally put words in our mouths, it can do so virtually or digitally, cloning our voices as well as our faces.

Back in 2023, Stephen Fry played his audience a clip of a documentary about the Dutch resistance, seemingly narrated by him in English, only to reveal that the voice was AI-generated: ‘I said not one word of that.’ His agents, unaware such technology existed, went ballistic, but he knew there was more to come: ‘You ain’t seen nothing yet.’

The Mayor of London, Sadiq Khan, however, would have seen quite enough, after hearing a deepfake audio clip early last year that sounded like him saying inflammatory things about Remembrance weekend, calling for pro-Palestinian marches and declaring that the Metropolitan Police did what he told them to do. It certainly sounded like him, with the London accent, complete with glottal stop; part of his identity had been stolen and subverted, which understandably angered him.

One company has a celebrity face detector, so if you try to make a video of a celebrity saying something they never said, it’ll be blocked, but why shouldn’t those less well known have similar protection? Take Olga Loiek, a young Ukrainian woman shocked to find her AI-generated doppelganger a) speaking Chinese, a language she did not speak and b) espousing pro-Russian views. Unfortunately, as the videos were hosted in China, she stood little chance of having them taken down.

Personally, my bugbear is AI translation; while you can now make the lip movements match the dubbed soundtrack in a new language, that is completely outweighed by a stilted, wooden translation, though humans aren’t always better. Someone once translated something I had written online, unsolicited, and by mistranslating just one word, destroyed the argument I was making; thankfully, it was in Portuguese, a language I understood well enough to be able to spot the mistake, namely that the word ‘subtitled’ was mistranslated as ‘dublado’ or ‘dubbed’, not as ‘legendado’.

And mistranslation can be dangerous, even without AI. In 2012, some American actors who took part in an amateurish short film of the Arabian Nights genre were shocked to find their lines replaced by crudely dubbed Islamophobic slurs, and the film then dubbed into Arabic, making it impossible to distinguish what was originally said from what was not. The film, called ‘Innocence of Muslims’, was intended to provoke, and provoke it did, resulting in an angry backlash in the Arab world and the death of the US Ambassador to Libya.

Even if your face and voice are being used in support of a cause you may sympathise with, if it was made without your consent, the principle remains the same; while Scarlett Johansson, along with other Jewish celebrities, would have abhorred Kanye West’s latest antisemitic outbursts, she did not take kindly to an AI-generated clone of herself being used to denounce them. Maybe the creator was trying to highlight how she and other stars should be more vocal on the issue, with some Jewish commentators wishing the video were real, but it was digital vigilantism.

And even if it’s for satire or comedy, being able to clone the face and voice of the person you are lampooning misses the point, namely that it’s about creating caricatures, not clones. Take Nerine Skinner, with her send-up of Liz Truss (‘Liv Struss’), for example; you could replace her face and voice with those of that former Liberal Democrat member using AI, but what value would it add? While her impression is more affectionate than biting, more Mike Yarwood than Spitting Image, using AI video or audio is a solution in search of a problem.

* Ken Westmoreland is a member of the Taunton and Wellington Local Party.
