‘Many-shot jailbreaking’: AI lab describes how tools’ safety features can be bypassed

Paper by Anthropic outlines how LLMs can be forced to generate responses to potentially harmful requests

The safety features on some of the most powerful AI tools that stop them being used for cybercrime or terrorism can be bypassed simply by flooding them with examples of wrongdoing, research shows.

In a paper from the AI lab Anthropic, which produces the large language model (LLM) behind the ChatGPT rival Claude, researchers described an attack they called “many-shot jailbreaking”. It is as simple as it is effective.

https://ift.tt/LI3y1WK

Originally published in The Guardian.