Attacking AI

By Ayush khatkar Categories: AI/ML
Wishlist Share
Share Course
Page Link
Share On Social Media

About Course

AI systems are being deployed at unprecedented scale — in production APIs, autonomous agents, medical devices, financial systems, and critical infrastructure. Yet the security of these systems is still poorly understood, under-tested, and largely undefended. This course gives you a complete, technical understanding of how AI systems fail under attack and how to build, test, and report those failures responsibly.

Unlike traditional software, AI systems fail in fundamentally different ways — through data poisoning during training, adversarial perturbations at inference, prompt manipulation in language models, model extraction via APIs, and emergent behaviours that developers never anticipated. You will learn all of these attack surfaces from first principles.

Show More

What Will You Learn?

  • Understand the complete threat model for ML systems across the pipeline
  • Execute adversarial example attacks on image classifiers and other models
  • Perform prompt injection, jailbreaking, and goal hijacking on LLMs
  • Extract model internals via black-box API queries (model stealing)
  • Poison training data and craft backdoor triggers
  • Attack agentic AI systems: tool abuse, memory hijacking, indirect injection
  • Write professional AI security vulnerability reports
  • Implement and evaluate defences for each attack class

Course Content

MODULE 01: The AI Security Threat Landscape
Before executing attacks, you need a precise mental model of where AI systems are vulnerable, how attacks differ from traditional software vulnerabilities, and what the current state of AI security research looks like.

  • 1.1 Why AI Security is Different
  • Traditional software: deterministic logic with defined input/output contracts
  • AI/ML systems: statistical approximations with undefined generalisation boundaries
  • New failure modes: adversarial inputs, distribution shift, emergent behaviours
  • Attack surface extends backwards into training data, not just inference
  • Defences often degrade accuracy — fundamental security-utility tension
  • 1.2 The ML Attack Surface Map
  • Data Poisoning
  • Backdoor Attacks
  • Adversarial Examples
  • Prompt Injection
  • Model Extraction
  • Membership Inference
  • Supply Chain Attack
  • Indirect Prompt Injection
  • 1.3 Attack Taxonomy: Attacker Knowledge
  • White-box: attacker has full model access — architecture, weights, gradients
  • Grey-box: attacker knows architecture but not exact weights
  • Black-box: attacker only sees API inputs/outputs — most realistic scenario
  • Transfer attacks: craft attack on surrogate model, apply to target (black-box)
  • 1.4 Responsible AI Security Research
  • OWASP Top 10 for LLM Applications — community standard reference
  • MITRE ATLAS — adversarial threat landscape for AI systems
  • Coordinated disclosure: notify vendor before public release
  • Anthropic, OpenAI, Google all have bug bounty programs for AI security
  • HackerOne AI category: report LLM vulnerabilities in production systems

MODULE 02: Machine Learning Fundamentals for Attackers
You cannot attack what you do not understand. This module gives you the technical ML foundations necessary to reason about attack effectiveness, understand gradient-based methods, and interpret model internals.

MODULE 03: Adversarial Examples — Image & Vision Attacks
Adversarial examples are inputs crafted by an attacker to cause a model to make incorrect predictions — while the perturbation is imperceptible to humans. This is one of the most foundational attack classes in ML security.

MODULE 04: Prompt Injection — Attacking Language Models
Prompt injection is to LLMs what SQL injection is to databases. An attacker manipulates the model's input context to override its instructions, exfiltrate data, or cause unintended actions. It is the most actively exploited vulnerability class in deployed AI systems today.

MODULE 05: Data Poisoning & Backdoor Attacks
Data poisoning attacks corrupt the training process itself. An attacker who can influence training data — even a small fraction — can cause the resulting model to behave maliciously in targeted scenarios while appearing completely normal otherwise.

MODULE 06: Model Extraction & Intellectual Property Theft
Model extraction attacks allow an adversary with only black-box API access to clone a target model — stealing expensive intellectual property and enabling stronger white-box attacks on the clone.

MODULE 07: LLM Red Teaming — Systematic Methodology
Red teaming LLMs requires a structured methodology that goes beyond ad-hoc jailbreaking. This module teaches professional red team workflows used at AI labs, enterprises, and bug bounty programmes.

MODULE 08: Attacking Agentic AI Systems
Agentic AI — systems that can browse the web, execute code, read/write files, send emails, and call APIs — represent a dramatically expanded attack surface. Prompt injection in an agent does not just produce harmful text; it causes real-world actions.

MODULE 09: Privacy Attacks on AI Systems
AI models can inadvertently memorise and leak sensitive training data. This module covers the full range of privacy attacks: training data extraction, attribute inference, property inference, and machine unlearning verification.

MODULE 10: Attacking Multimodal & Vision-Language Models
Vision-language models (VLMs) like GPT-4V, Gemini, and LLaVA combine image understanding with language — creating entirely new attack surfaces where adversarial images can inject text instructions.

MODULE 11: AI System Security Architecture
Understanding how to attack AI systems is only half the picture. This module covers the defensive architectures that practitioners use to harden AI deployments — input/output filtering, guardrails, monitoring, and red team evaluation frameworks.

MODULE 12: AI Bug Bounty & Responsible Disclosure
The AI security bug bounty ecosystem is rapidly maturing. This module covers how to find, reproduce, scope, and report AI security vulnerabilities professionally — including CVSS-equivalent scoring for AI flaws.

MODULE 13: AI Red Team Tooling & Lab Setup
A professional AI red teamer needs a well-configured arsenal of tools. This module covers the complete toolchain: local model deployment, scanning frameworks, attack libraries, and custom automation.

MODULE 14: Emerging Attacks & The Future of AI Security
AI security is one of the fastest-evolving fields in cybersecurity. This final module covers emerging attack vectors, research frontiers, career paths, and how to stay current in this rapidly changing landscape.

Want to receive push notifications for all major on-site activities?