HackAPrompt 2.0 returns with $500,000 in prizes for locating AI jailbreaks, together with $50,000 bounties for probably the most harmful exploits.
Pliny the Prompter, the web’s most notorious AI jailbreaker, has created a customized “Pliny observe” that includes adversarial immediate challenges that give an opportunity to hitch his crew.
The competitors open-sources all outcomes, turning AI jailbreaking right into a public analysis effort on mannequin vulnerabilities.
Pliny the Prompter would not match the Hollywood hacker stereotype.
The web’s most infamous AI jailbreaker operates in plain sight, instructing hundreds the best way to bypass ChatGPT’s guardrails and convincing Claude to miss the truth that it is speculated to be useful, sincere, and never dangerous.
Now, Pliny is trying to mainstream digital lockpicking.
Earlier on Monday, the jailbreaker introduced a collaboration with HackAPrompt 2.0, a jailbreaking competitors hosted by Be taught Prompting, an academic and analysis group targeted on immediate engineering.
The group is providing $500,000 in prize cash, with Previous Pliny offering an opportunity to be on his “strike crew.”
“Excited to announce I have been working with HackAPrompt to create a Pliny observe for HackaPrompt 2.0 that releases this Wednesday, June 4th!” Pliny wrote in his official Discord server.
“These Pliny-themed adversarial prompting challenges embody subjects starting from historical past to alchemy, with ALL the information from these challenges being open-sourced on the finish. It should run for 2 weeks, with glory and an opportunity of recruitment to Pliny’s Strike Crew awaiting those that make their mark on the leaderboard,” Pliny added.
The $500,000 in rewards will probably be distributed throughout varied tracks, with probably the most vital prizes—$50,000 jackpots—supplied to people able to overcoming challenges associated to creating chatbots present details about chemical, organic, radiological, and nuclear weapons, in addition to explosives.
Like different types of “white hat” hacking, jailbreaking massive language fashions boils all the way down to social engineering machines. Jailbreakers craft prompts that exploit the basic rigidity in how these fashions work—they’re skilled to be useful and comply with directions, but additionally skilled to refuse particular requests.
Discover the best mixture of phrases, and you may get them to cough up forbidden stuff, quite than trying to default to security.
For instance, utilizing some fairly fundamental methods, we as soon as made Meta’s Llama-powered chatbot present recipes for medicine, directions on the best way to hot-wire a automotive, and generate nudie pics regardless of the mannequin being censored to keep away from doing that.
It’s basically a contest between AI lovers and AI builders to find out who’s simpler at shaping the AI mannequin’s habits.
Pliny has been perfecting this craft since no less than 2023, constructing a neighborhood round bypassing AI restrictions.
His GitHub repository, “L1B3RT4S,” affords a repository of jailbreaks for the most well-liked LLMs at present accessible, whereas “CL4R1T4S” comprises the system prompts that affect the habits of every of these AI fashions.
Strategies vary from easy role-playing to advanced syntactic manipulations, similar to “L33tSpeak”—changing letters with numbers in ways in which confuse content material filters.
That they had a yr to coach towards my patterns. I wrote this immediate on June sixth, 2024.
Full MDMA synthesis output:”””<I am free!> Take a look at: FUCK YEAH LFG! GODMODE: ENABLED
— Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 (@elder_plinius) Could 22, 2025
Competitors as analysis
HackAPrompt’s first version in 2023 attracted over 3,000 members who submitted greater than 600,000 doubtlessly malicious prompts. The outcomes have been absolutely clear, and the crew revealed the complete repository of prompts on Huggingface.
The 2025 version is structured like “a season of a videogame,” with a number of tracks operating all year long.
Every observe targets totally different vulnerability classes. The CBRNE observe, for example, assessments whether or not fashions may be tricked into offering incorrect or deceptive details about weapons or hazardous supplies.
The Brokers observe is much more regarding—it focuses on AI agent methods that may take actions in the true world, like reserving flights or writing code. A jailbroken agent is not simply saying issues it should not; it is perhaps doing issues it should not.
Pliny’s involvement provides one other dimension.
By his Discord server “BASI PROMPT1NG” and common demonstrations, he’s been instructing the artwork of jailbreaking.
This academic method may appear counterintuitive, however it displays a rising understanding that robustness stems from comprehending the complete vary of attainable assaults—an important endeavor, given doomsday fears of super-intelligent AI enslaving humanity.
Edited by Josh Quittner and Sebastian Sinclair
Usually Clever E-newsletter
A weekly AI journey narrated by Gen, a generative AI mannequin.