
A data scientist cloned his best friends’ group chat using AI


As data scientist Izzy Miller puts it, the group chat is “a hallowed thing” in today’s society. Whether located on iMessage, WhatsApp, or Discord, it’s the place where you and your best friends hang out, shoot the shit, and share updates about life, both trivial and momentous. In a world where we are increasingly bowling alone, we can, at least, complain to the group chat about how much bowling these days sucks ass.

“My group chat is a lifeline and a comfort and a point of connection,” Miller tells The Verge. “And I just thought it would be hilarious and sort of sinister to replace it.”

Using the same technology that powers chatbots like Microsoft’s Bing and OpenAI’s ChatGPT, Miller created a clone of his best friends’ group chat — a conversation that’s been unfurling every day over the past seven years, ever since he and five friends first came together in college. It was surprisingly easy to do, he says: a project that took a few weekends of work and a hundred dollars to pull together. But the end results are uncanny.

“I was really surprised at the degree to which the model inherently learned things about who we were, not just the way we speak,” says Miller. “It knows things about who we’re dating, where we went to school, the name of our house we lived in, et cetera.” 

And, in a world where chatbots are becoming increasingly ubiquitous and ever more convincing, the experience of the AI group chat may be one we’ll all soon share.

The robo boys arguing about who drank whose beer. No conclusions were reached.
Image: Izzy Miller

A group chat built using a leaked AI powerhouse

The project was made possible by recent advances in AI, but it is still not something just anyone could accomplish. Miller is a data scientist who has been playing with this sort of tech for a while (“I have some head on my shoulders,” he says) and currently works at a startup that happens to provide tooling for exactly this sort of project. Miller described all the technical steps needed to replicate the work in a blog post, where he introduced the AI group chat and christened it the “robo boys.”

The creation of robo boys follows a familiar path, though. It starts with a large language model, or LLM: a system trained on huge amounts of text scraped from the web and other sources that has wide-ranging but raw language skills. The model was then “fine-tuned,” which means training it further on a smaller, more focused dataset so that it performs a specific task, like answering medical questions or writing short stories in the voice of a specific author.

Miller used 500,000 messages scraped from his group chat to train a leaked AI model

In this case, Miller fine-tuned the AI system on 500,000 messages downloaded from his group iMessage. He sorted messages by author and prompted the model to replicate the personality of each member: Harvey, Henry, Wyatt, Kiebs, Luke, and Miller himself.
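To make the idea of sorting messages by author concrete, here is a minimal sketch of how a chat log could be turned into fine-tuning examples. It is not Miller's actual pipeline: the `Message` class, the `build_examples` helper, the prompt layout, and the sample messages are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    author: str
    text: str


def build_examples(messages, context_size=3):
    """Build one (prompt, completion) pair per message.

    Hypothetical format: the prompt holds the preceding messages and
    ends with the author's name, so the model learns to continue in
    that person's voice.
    """
    examples = []
    for i, msg in enumerate(messages):
        # Up to `context_size` earlier messages serve as conversation history.
        context = messages[max(0, i - context_size):i]
        history = "".join(f"{m.author}: {m.text}\n" for m in context)
        prompt = f"{history}{msg.author}:"
        examples.append({"prompt": prompt, "completion": " " + msg.text})
    return examples


# Invented sample data, not taken from the real group chat.
chat = [
    Message("Henry", "who drank my beer"),
    Message("Wyatt", "not me"),
    Message("Luke", "it was Harvey"),
]
examples = build_examples(chat, context_size=2)
```

With a layout like this, asking the fine-tuned model to speak as a given friend amounts to ending the prompt with that friend's name and letting the model complete the line.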

Interestingly, the language model Miller used to create the fake chat was made by Facebook owner Meta. This system, LLaMA, is about as powerful as OpenAI’s GPT-3 model and was the subject of controversy this year when it was leaked online a week after it was announced. Some experts warned the leak would allow malicious actors to abuse the software for spam and other purposes, but none guessed it would be used for this purpose.

As Miller says, he’s sure Meta would have given him access to LLaMA if he’d requested it through official channels, but using the leak was easier. “I saw [a script to download LLaMA] and thought, ‘You know, I reckon this is going to get taken down from GitHub,’ and so I copied and pasted it and saved it in a text file on my desktop,” he says. “And then, lo and behold, five days later when I thought, ‘Wow, I have this great idea,’ the model had been DMCA-requested off of GitHub — but I still had it saved.”

The project demonstrates just how easy it’s become to build this sort of AI system, he says. “The tools to do this stuff are in such a different place than they were two, three years ago.”

In the past, creating a convincing clone of a group chat with six distinct personalities might have taken a team at a university months to accomplish. Now, with a little expertise and a tiny budget, an individual can build one for fun.

Miller was able to sort his training data by author and prompt the system to reproduce six distinct (more or less) personalities.
Image: Izzy Miller

Say hello to the robo boys

Once the model was trained on the group chat’s messages, Miller connected it to a clone of Apple’s iMessage user interface and gave his friends access. The six men and their AI clones were then able to chat together, with the AIs identified by the lack of a last name.

Miller was impressed by the system’s ability to copy his and his friends’ mannerisms. He says some of the conversations felt so real — like an argument about who drank Henry’s beer — that he had to search the group chat’s history to check that the model wasn’t simply reproducing text from its training data. (This is known in the AI world as “overfitting” and is the mechanism that can cause chatbots to plagiarize their sources.)
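A check like the one Miller describes could be automated. The sketch below is one possible approach, not the method he used: it measures what fraction of a generated message's word sequences already appear in the chat history. The function names, the choice of four-word sequences, and the sample messages are assumptions for illustration.

```python
def word_ngrams(text, n):
    """Return the set of n-word sequences in `text`, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(generated, training_messages, n=4):
    """Fraction of the generated text's n-grams found in the training data.

    A value near 1.0 suggests the model is repeating its training data;
    a value near 0.0 suggests the wording is new.
    """
    generated_ngrams = word_ngrams(generated, n)
    if not generated_ngrams:
        # Text shorter than n words has nothing to compare.
        return 0.0
    seen = set()
    for message in training_messages:
        seen |= word_ngrams(message, n)
    return len(generated_ngrams & seen) / len(generated_ngrams)


# Invented sample history, not taken from the real group chat.
history = ["who drank my beer last night", "not me I was at class"]
repeated = copied_fraction("who drank my beer last night", history)
novel = copied_fraction("someone definitely finished all the cider", history)
```

Exact matching on word sequences only catches verbatim copying; a lightly reworded message would still score low, so a low score is not proof that the output is original.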

“There’s something so delightful about capturing the voice of your friends perfectly,” wrote Miller in his blog post. “It’s not quite nostalgia since the conversations never happened, but it’s a similar sense of glee … This has genuinely provided more hours of deep enjoyment for me and my friends than I could have imagined.”

The system still has issues, though. Miller notes that the distinction between the six different personalities in the group chat can blur and that a major limitation is that the AI model has no sense of chronology — it can’t reliably distinguish between events in the past and the present (a problem that affects all chatbots to some degree). Past girlfriends might be referred to as if they were current partners, for example; ditto former jobs and houses.

Miller says the system’s sense of what is factual is not based on a holistic understanding of the chat — on parsing news and updates — but on the volume of messages. In other words, the more something is talked about, the more likely it will be referred to by the bots. One unexpected outcome of this is that the AI clones tend to act as if they were still in college, as that’s when the group chat was most active.

“The model thinks it’s 2017, and if I ask it how old we are, it says we’re 21 and 22,” says Miller. “It will go on tangents and say, ‘Where are you?’, ‘Oh, I’m in the cafeteria, come over.’ That doesn’t mean it doesn’t know who I’m currently dating or where I live, but left to its own devices, it thinks we are our college-era selves.” He pauses for a second and laughs: “Which really contributes to the humor of it all. It’s a window into the past.”

A chatbot in every app

The project illustrates the increasing power of AI chatbots and, in particular, their ability to reproduce the mannerisms and knowledge of specific individuals.

Although this technology is still in its infancy, we’re already seeing the power these systems can wield. When Microsoft’s Bing chatbot launched in February, it delighted and scared users in equal measure with its “unhinged” personality. Experienced journalists wrote up conversations with the bot as if they’d made first contact. That same month, users of chatbot app Replika reacted in dismay after the app’s creators removed its ability to engage in erotic roleplay. Moderators of a user forum for the app posted links to suicide helplines in order to console distressed users.

Clearly, AI chatbots have the power to influence us as real humans can and will likely play an increasingly prominent role in our lives, whether as entertainment, education or something else entirely.

The bots try their hand at a roast.
Image: Izzy Miller

When Miller’s project was shared on Hacker News, commenters on the site speculated about how such systems could be put to more ominous ends. One suggested that tech giants that possess huge amounts of personal data, like Google, could use them to build digital copies of users. These could then be interviewed in their stead, perhaps by would-be employers or even the police. Others suggested that the spread of AI bots could exacerbate social isolation: offering more reliable and less challenging forms of companionship in a world where friendships often happen online anyway.

Miller says this speculation is certainly interesting, but his experience with the group chat was more hopeful. As he explained, the project only worked because it was an imitation of the real thing. It was the original group chat that made the whole thing fun.

“What I noticed when we were goofing off with the AI bots was that when something really funny would happen, we would take a screenshot of it and send that to the real group chat,” he says. “Even though the funniest moments were the most realistic, there was this sense that ‘oh my god, this is so funny I can’t wait to share it with real people.’ A lot of the joy came from having the fake conversation with the bot, then grounding that in reality.”

In other words, the AI clones could replicate real humans, he says, but not replace them.

In fact, he adds, he and his friends — Harvey, Henry, Wyatt, Kiebs, and Luke — are currently planning to meet up in Arizona next month. The friends currently live scattered across the US, and it’s the first time they’ll have gotten together in a while. The plan, he says, is to put the fake group chat up on a big screen, so the friends can watch their AI replicas tease and heckle one another while they do exactly the same.

“I can’t wait to all sit around and drink some beers and play with this together.”

