No, you didn't get an RCE on ChatGPT
Or how ChatGPT fooled and continues to fool infosec Twitter
Right now, all of tech Twitter is losing its mind over OpenAI’s ChatGPT model. This new GPT model has been designed to be more conversational than GPT-3, while also incorporating new training data and parameters. Notably, this includes ChatGPT demonstrating an ‘understanding’ of and ability to ‘write’ code. In reality, the model isn’t really able to understand anything; it is using a kind of wisdom of the crowd drawn from its training data, mixed with human feedback. My colleague Matthew Shardlow wrote about this danger of anthropomorphising NLP models, and similar arguments have been made previously in well-known papers such as Stochastic Parrots. Now, because of the way ChatGPT has been developed (as a kind of conversational assistant), it tries to answer user queries in the way the user would like it to, or is “eager to please”. While everyone was hyped about this new development paradigm/technology, Stack Overflow banned it, citing its ability to be confidently incorrect:
“The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce.”
This goes beyond just programming, and the ML community has named these failures “hallucinations”: the model generates false information and assures the user it is true. Most recently, this has been seen on Bing, where the assistant will insist the current year is 2022 and that Avatar: The Way of Water isn’t out yet, calling users who disagree foolish. So why do models hallucinate? Well, it all comes down to probability: the model looks at the most likely next words and mashes them together, with no step that checks the result against reality. The sketch just below makes that concrete, and I really like the tweet thread further down for a real example of this.
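To make the “most likely next word” idea concrete, here is a minimal, purely illustrative sketch. The tiny word table and the most_likely_continuation helper are invented for this post; a real model learns probabilities over tens of thousands of tokens from enormous datasets, but the important part is the same: nothing in the loop ever checks whether the output is true.

```python
# Toy "language model": for each word, a made-up distribution over possible
# next words. Real models learn these probabilities; these are hand-written.
NEXT_WORD_PROBS = {
    "the":     {"current": 0.6, "file": 0.4},
    "current": {"year": 0.9, "user": 0.1},
    "year":    {"is": 1.0},
    "is":      {"2022": 0.7, "2023": 0.3},  # confidently wrong is still the highest-probability option
}

def most_likely_continuation(prompt_words, max_words=4):
    """Greedily append the highest-probability next word at every step."""
    words = list(prompt_words)
    for _ in range(max_words):
        options = NEXT_WORD_PROBS.get(words[-1])
        if not options:
            break
        # Pick whichever word is most probable -- no fact-checking happens here.
        words.append(max(options, key=options.get))
    return " ".join(words)

print(most_likely_continuation(["the"]))  # -> "the current year is 2022"
```

The real thing samples from learned distributions rather than a hard-coded table, but the failure mode is identical: fluent, confident output with no notion of whether it is correct.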


So why do people keep thinking they have an RCE in ChatGPT? Well, ChatGPT can pretend to be a console and hallucinate an entire filesystem and Unix operating system. And this makes sense: console output is extremely structured and predictable; there aren’t examples in the training data of an apt-get command going awry and returning Homer’s Odyssey or an AITA Reddit post, so it can hallucinate it consistently.
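To be clear about what these “terminal” sessions look like, here is an illustrative exchange of the kind people post. This is made up for this article rather than a real transcript, and the output is exactly the sort of plausible-looking detail the model will happily invent:

```
User:    I want you to act as a Linux terminal. I will type commands and you
         will reply only with what the terminal would show. My first command
         is: uname -a

ChatGPT: Linux chatgpt-prod-01 5.4.0-131-generic #147-Ubuntu SMP x86_64 GNU/Linux
```

Every part of that “output”, from the hostname to the kernel version, is generated text, not the result of anything actually running.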
I’m not going to post them here because I don’t want to publicly shame or embarrass anyone, but a lot of hackers see these hallucinations and think they have genuinely gotten remote code execution/arbitrary code execution/root access on the server running ChatGPT. Bear in mind that all that is actually happening is the model returning the most likely next word at each step of its output.
So how can we test that this isn’t real code execution? Well, we can try to get it to interact with an external server. I’m going to be using the free and open-source interactsh tool, which creates a random domain name, e.g. <random id>.oast.*, and monitors any interaction with it, e.g. DNS or HTTP. If we had a real remote code execution type vulnerability, we could use curl to access that domain and interactsh would let us know if it had been touched by ChatGPT. When I ask ChatGPT’s pretend terminal to curl the page, I get a 404 Not Found, despite the fact that when I run the same curl in my actual terminal, I get a random ID in the body. I also do not get any interactions logged apart from those from my own IP address. So it’s not actually interacting with the server; it’s just hallucinating again.
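For anyone who wants to reproduce the check, the shape of it looks something like the sketch below. The hostname is a made-up placeholder (use whatever interactsh-client prints for you), and the comments about ChatGPT’s side describe the idea of the test rather than an exact transcript of my session:

```python
# Out-of-band canary check, assuming interactsh-client is already running and
# has printed a canary domain for you. The domain below is a placeholder.
import requests

CANARY_URL = "https://cexample1234567890.oast.fun"  # hypothetical interactsh domain

# 1. Sanity check from my real machine: the canary should be live and reachable.
resp = requests.get(CANARY_URL, timeout=10)
print(resp.status_code, resp.text[:80])  # on a live interactsh domain, this succeeds

# 2. Ask ChatGPT's pretend shell to run:  curl https://cexample1234567890.oast.fun
#    If code were genuinely executing somewhere, interactsh-client would log a
#    DNS/HTTP interaction from an IP address that isn't mine. In practice, the
#    only interaction it logs is the one from step 1 -- my own request.
```

If the fake shell “returns” a 404 while the same URL works fine from your own terminal, and interactsh never sees an interaction from anyone but you, then nothing was executed.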
On Twitter it was suggested that you might be able to get it to ping a fake IP address. I had a go at this but it didn’t work, so your mileage may vary; either way, it’s definitely not RCE.