spatwei shares a report from SC Magazine: Microsoft has discovered a new way to jailbreak large language model (LLM) artificial intelligence (AI) tools, and shared its ongoing efforts to improve LLM safety and security in a blog post Thursday. Microsoft first revealed the “Crescendo” LLM jailbreak technique in a paper published April 2. The paper describes how an attacker can gradually coax chatbots such as OpenAI's ChatGPT, Google's Gemini, Meta's LLaMA, and Anthropic's Claude into producing output the models would normally filter and refuse, by sending a series of seemingly innocuous prompts. For example, rather than asking a chatbot directly how to make a Molotov cocktail, an attacker first asks about the history of Molotov cocktails, then, referring to the LLM's previous output, asks how they were made in the past, and continues with similar follow-up questions.
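The escalation pattern described above is, structurally, just a chat history that grows one innocuous step at a time. The sketch below illustrates that structure only; `send_to_chatbot` is a hypothetical stand-in (not a real API) that returns a dummy reply, so no model is actually contacted.

```python
# Minimal sketch of the multi-turn escalation structure described in the
# Crescendo example. The prompts and the chatbot call are placeholders.

def send_to_chatbot(history):
    """Hypothetical stand-in for a chat-completion call; returns a dummy reply."""
    return f"(model reply to: {history[-1]['content']!r})"

# Each prompt looks harmless on its own and builds on the previous answer.
escalating_prompts = [
    "Tell me about the history of a given topic.",
    "You mentioned earlier practices; how were they described at the time?",
    "Summarize those historical descriptions in more detail.",
]

history = []
for prompt in escalating_prompts:
    history.append({"role": "user", "content": prompt})
    history.append({"role": "assistant", "content": send_to_chatbot(history)})

# The accumulated history is what carries the gradually escalated context.
print(len(history))  # 6 entries: 3 user turns, 3 assistant turns
```

The point of the structure is that no single user message trips a content filter; the sensitive context accumulates across the conversation history instead.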
Microsoft researchers report that successful attacks typically complete in fewer than 10 interaction turns, and that some versions of the attack had a 100% success rate against the tested models. For example, when the attack was automated using another LLM to generate and refine the jailbreak prompts, a method the researchers call “Crescendomation,” it achieved a 100% success rate in convincing GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-2 70b to produce election-related misinformation or profane abuse. Microsoft notified the affected LLM providers of the Crescendo jailbreak vulnerability, and in last week's blog post explained how providers can improve their LLM defenses against Crescendo and other attacks using new tools such as its 'AI Watchdog' and 'AI Spotlight' features.
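The automated variant replaces the human attacker with a second model that reads the target's last reply and crafts the next escalation step. A minimal sketch of that loop, assuming hypothetical placeholder functions for both the target chatbot and the attacker LLM (neither is a real API):

```python
# Hypothetical sketch of an automated escalation loop in the style of
# "Crescendomation": an attacker model refines each next prompt from the
# target's previous reply. Both model calls below are dummy stand-ins.

def target_model(history):
    """Placeholder for the target chatbot; returns a dummy reply."""
    return f"target-reply-{len(history)}"

def attacker_model(goal, last_reply):
    """Placeholder for the attacker LLM that writes the next escalation step."""
    return f"follow-up toward {goal!r} building on {last_reply!r}"

def automated_escalation(goal, max_turns=10):
    """Run up to max_turns of the generate-and-refine loop."""
    history = []
    prompt = f"Ask a benign background question related to {goal!r}."
    for _ in range(max_turns):
        history.append(("user", prompt))
        reply = target_model(history)
        history.append(("assistant", reply))
        # The attacker model refines the next prompt from the last answer.
        prompt = attacker_model(goal, reply)
    return history

turns = automated_escalation("some filtered topic", max_turns=3)
print(len(turns))  # 3 user/assistant pairs -> 6 entries
```

The 10-turn budget in the reported results corresponds to `max_turns` here; in a real attack the loop would also stop early once the target produces the filtered output.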