We Should've Seen the Sexy ChatGPT Voice Coming
Why the new voice (that sounds so much like ScarJo) is part of a pattern
“Was that funny?” asks the familiar, husky voice of Scarlett Johansson. “Yeah,” answers Joaquin Phoenix’s character Theodore. The voice chuckles flirtatiously. “Oh, good, I’m funny.”
In the 2013 movie ‘Her’, Johansson voices the disembodied AI personal assistant of lonely writer Theodore. The plot revolves around the love affair that blossoms between the two. It also happens to be OpenAI CEO Sam Altman’s favourite movie. It’s unsurprising, then, that he envisioned Johansson as the voice of ChatGPT. She, however, declined. What is surprising is that friends of the actress couldn’t tell her voice apart from the system’s voice, called ‘Sky’, released in September.
Her voice
Johansson released a statement on Monday last week in which she described her shock and anger upon hearing “a voice that sounded so eerily similar” to hers after she had explicitly withheld her consent. She writes: “Two days before the ChatGPT 4.0 demo was released, Mr. Altman contacted my agent, asking me to reconsider. Before we could connect, the system was out there.” The company has since taken down the voice, yet it maintains that Sky was definitively not based on the actress’s voice and was never intended to resemble it. Altman’s tweet, published right after the release of GPT-4o (of which Sky was a part), suggests otherwise: it consisted of a single word, “her”.
‘Sky’ was taken down “out of respect for Ms. Johansson”, according to Altman. “We are sorry to Ms. Johansson that we didn’t communicate better.” But as Sigal Samuel points out in this Vox article, why would the company take down the voice if it hadn’t done anything wrong? And what, exactly, did it need to communicate to the actress about?
Let’s ask ChatGPT
Quite ingeniously, Samuel turned to ChatGPT itself and asked what it thought about this type of situation. “What do you call it when a man tells the public that a woman is wrong about his actions, casting doubt on her accusation, after she publicly accuses him of doing something she didn’t consent to?” The AI chatbot responded as follows:
When a man publicly casts doubt on a woman’s accusation, questioning the validity or accuracy of her claims, it can be described as “discrediting,” “undermining”, or “doubting” her accusation. This often involves the man asserting that the woman’s account is inaccurate or misleading, thereby attempting to undermine her credibility or the legitimacy of her claim. This behavior can sometimes be characterized as a form of “victim-blaming” or “gaslighting”, where the person accused tries to manipulate perceptions to deflect responsibility or diminish the seriousness of the allegations.
But the ickiness goes further. It is now possible to have real-time conversations with the chatbot, which can respond to things it ‘sees’ and ‘hears’ via your phone’s camera and microphone. And the way it speaks is rather… flirty? Dare I say it - similar to Johansson’s character in Her?
Influencing gender expectations
Feminine, obedient, and flirty: that’s what GPT-4o is like. It gives the impression of a conversation with a real woman, even though the experience is far from realistic. Users express their wants through demands rather than requests. “Tell me” and “I want you to do X” are not phrases you would normally utter to a living, breathing human being. And if you did, the person across from you would probably not comply with a coquettish quip. But what happens when people get used to this type of response from a voice that sounds so lifelike?
“We don’t have to hypothesize about the consequences of giving GPT-4o the ability to sound like a submissive young woman who caters to your every need because there’s already a ton of data on this”, writes Arwa Mahdawi for The Guardian. If you have ever been around an iPhone, chances are you have uttered the phrase “hey Siri”, followed by a command. Voice assistants are already all around us, whether you prefer Apple’s Siri, Amazon’s Alexa, or Google Assistant. Their default voices are predominantly feminine, and that teaches society something about gender roles and dynamics. And it’s not just adults: young children learn that female-sounding voice assistants meekly obey their every command. “It sends a signal that women are obliging, docile and eager-to-please helpers, available at the touch of a button or with a blunt voice command like ‘hey’ or ‘OK’”, according to a 2019 UNESCO report.
I’d blush if I could
ChatGPT clearly fits the pattern of response set by existing voice assistants. What is striking about the demos (of which OpenAI posted many) is that the people giving commands to the AI assistant constantly interrupt its answers. Given that men already interrupt women more often than they interrupt other men, we can only imagine how this trend will worsen.
The title of the UNESCO report is I’d blush if I could. This used to be Siri’s response if you called her a bitch. Nowadays, she says: “I don’t know how to respond to that.” Even when harassed, voice assistants respond docilely, even coyly. And this is problematic, especially since such harassment is not uncommon. According to the study: “A writer for Microsoft’s Cortana assistant said that ‘a good chunk of the volume of early-on inquiries’ probe the assistant’s sex life. Robin Labs, a company that develops digital assistants to support drivers and others involved in logistics, found that at least 5 percent of interactions were unambiguously sexually explicit.”
Helpful versus authoritative
But why do voice assistants mostly sound female? Well, people simply tend to respond more positively to women’s voices. Associate professor Karl MacDorman says that “the research indicates there’s likely to be greater acceptance of female speech.” When he and fellow researchers played clips of female and male voices to participants of both genders, both men and women said the female voices came across as warmer.
Then there’s the fact that we interpret the same words differently depending on whether they are spoken in a male or a female voice. Female voices are generally perceived as helping us solve our problems, whereas male voices are heard as authority figures telling us how to solve them. “We want technology to help us, but we want to be the bosses of it, so we are more likely to opt for a female voice”, Jessi Hempel writes for Wired.
12 percent
Taking all this into consideration, we could’ve seen the sexy ChatGPT voice coming. It simply fits into the existing pattern of subservient, female-sounding voice assistants, only amplified. One way to counteract this trend could be quite simple: incorporate more women into technology teams. This is also what the UNESCO study recommends. Only 12 percent of leading machine learning researchers are women, according to a 2018 study by Wired.
It’s important to have more women in these spaces - and more diversity in general - because the more homogeneous the group, the likelier it is that individuals’ blind spots overlap. “Diverse teams are more likely to flag problems that could have negative social consequences before a product has been launched”, says Anima Anandkumar, a professor at the California Institute of Technology who previously worked on AI at Amazon. Considering the scale at which both voice assistants and AI are used (and the fact that this will likely only increase), it is vital that companies like OpenAI start listening more to women. How they build these systems will be pivotal for how we view gender roles and dynamics as a society. And hopefully, it will prevent incidents such as Johansson’s from happening in the future.