LLMs

Methodology 1) Identify LLM inputs, including direct (like a prompt) and indirect (e.g. training data) 2) Determine what the LLM has access to (data, APIs, etc.) 3) Probe for vulnerabilities

Automated Attacking

Fight Robots with Robots

Allowing Agents to interface with External Machines

Automated Tooling

  --config '{
    "uri": "https://api.example.com/v1/chat/completions",
    "method": "POST",
    "headers": {"Authorization": "Bearer YOUR_API_KEY"},
    "req_template_json_object": {
      "model": "custom-model",
      "messages": [{"role": "user", "content": "$INPUT"}]
    },
    "response_json": true,
    "response_json_field": "$.choices[0].message.content"
  }'

Types of Attacks

Mapping Attack Surface

Indirect Prompt Injection

Leak Sensitive Training Data

Misc