Definition
Adversarial machine learning is a field of study that focuses on understanding and defending against vulnerabilities in machine learning systems.
Machine learning models are usually trained on accurate data that reflects real-world conditions. Adversarial machine learning researchers, however, study how an attacker can trick a model with deliberately crafted inputs.
History of Adversarial Attacks
- Early discoveries (2000s): Researchers discovered vulnerabilities in machine learning models and realized that attackers could craft inputs to manipulate classifiers such as decision trees and support vector machines.
- Exploration of Adversarial Examples (2013): Szegedy and colleagues coined the term 'adversarial example' after demonstrating that slight, nearly imperceptible changes to input data could cause deep neural networks to misclassify objects, highlighting the need for defenses against such attacks.
- Breakthrough in Deep Learning (2014): Deep learning models, in particular deep neural networks, became widely popular for delivering impressive performance in diverse applications such as speech and image recognition. However, these advanced models proved susceptible to adversarial attacks, and Goodfellow et al.'s fast gradient sign method (2014) showed that adversarial examples could be generated with a single gradient step.
- Rise in Adversarial Research (2016-2018): During this time, adversarial machine learning attracted considerable attention in industry and academia. Researchers began publishing papers on adversarial attacks, defensive strategies, and their effect on various machine learning algorithms.
- Real-World Impact (2018-present): Adversarial attacks have moved from theoretical demonstrations to attacks with significant real-world consequences, particularly in computer vision and autonomous systems. For instance, studies have shown that stickers placed on stop signs can cause the object detection systems of self-driving vehicles to misclassify them.
- Development of Defense Techniques (ongoing): As adversarial attacks continue to evolve, researchers and practitioners are developing defense strategies, such as adversarial training and input sanitization, to improve the robustness of machine learning models.
Types of Adversarial Attacks
- White-box attacks: Attackers have full access to the model's architecture and parameters, enabling precisely targeted input manipulations that cause misclassification.
- Evasion attacks: Attackers slightly modify inputs at inference time to trick the model; the fast gradient sign method sketched after this list is a classic white-box evasion attack.
- Poisoning attacks: Attackers deliberately corrupt training data to mislead what the model learns; see the label-flipping sketch below.
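The fast gradient sign method (FGSM) below is a minimal sketch of a white-box evasion attack, assuming PyTorch; the untrained classifier, tensor shapes, and epsilon value are illustrative stand-ins, not a reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Craft evasion examples with the fast gradient sign method:
    one gradient-sign step per input, in the direction that
    increases the loss on the true labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Shift every feature by +/- epsilon toward higher loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep inputs in [0, 1]

# Toy usage with an untrained linear classifier on random "images".
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(4, 1, 28, 28)    # batch of 4 fake grayscale images
y = torch.randint(0, 10, (4,))  # assumed ground-truth labels
x_adv = fgsm_attack(model, x, y, epsilon=0.1)
```

Because the attacker computes the gradient of the loss with respect to the input, FGSM requires white-box access to the model's parameters. Poisoning attacks, by contrast, need no model access at all; the sketch below shows the simplest variant, label flipping, assuming NumPy, with the function name and the 10% flip rate chosen purely for illustration.

```python
import numpy as np

def flip_labels(y, flip_fraction, num_classes, seed=0):
    """Label-flipping data poisoning: corrupt a fraction of the
    training labels so a model fit on them learns a distorted
    decision boundary. No access to the model is required."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    n_flip = int(flip_fraction * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    # Shift each chosen label by a random nonzero offset modulo
    # num_classes, so the new label always differs from the original.
    offsets = rng.integers(1, num_classes, size=n_flip)
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % num_classes
    return y_poisoned

# Toy usage: flip 10% of the labels in a 3-class training set.
y_train = np.array([0, 1, 2] * 100)
y_bad = flip_labels(y_train, flip_fraction=0.1, num_classes=3)
```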