Researchers from Qatar Computing Research Institute and Mohamed bin Zayed University developed DeBackdoor, a framework to detect hidden backdoor attacks in deep learning models used in critical systems like self-driving cars and medical devices.
All about DeBackdoor
In many cases, developers acquire deep learning models from third-party sources without access to the training data or the ability to inspect the model's internals, which makes backdoor detection difficult. Most existing methods require access to the model's architecture, its training data, or multiple instances of the model.
DeBackdoor overcomes these challenges with a deductive approach: it generates candidate triggers from a description of the suspected attack and uses a search technique to home in on the most effective ones. The search optimizes the Attack Success Rate (ASR), the fraction of trigger-stamped inputs the model classifies as the attacker's target label, which is the standard metric for evaluating backdoor attacks.
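Conceptually, the ASR for a candidate trigger can be estimated on a handful of clean inputs. The sketch below is illustrative only; `model`, `apply_trigger`, and the other names are placeholders, not DeBackdoor's actual API.

```python
import numpy as np

def attack_success_rate(model, apply_trigger, clean_inputs, target_label):
    """Fraction of trigger-stamped inputs classified as the target label."""
    stamped = np.stack([apply_trigger(x) for x in clean_inputs])
    preds = model(stamped).argmax(axis=1)  # model is assumed to return class scores
    return float((preds == target_label).mean())
```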
How it detects
DeBackdoor’s detection methodology involves defining a search space for potential trigger templates based on the attack’s description. It then applies Simulated Annealing (SA), a stochastic search technique, to iteratively generate and test candidate triggers.
SA is chosen for its ability to escape local optima, allowing a more thorough exploration of the trigger space than simpler methods like Hill Climbing.
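To make the search concrete, here is a minimal simulated-annealing loop for maximizing a trigger's score (for instance, its estimated ASR). The perturbation scheme, cooling schedule, and parameter values are assumptions for the sketch, not the paper's exact configuration.

```python
import math
import random
import numpy as np

def perturb(trigger, scale=0.1):
    """Propose a neighboring trigger by adding small random noise."""
    noise = np.random.uniform(-scale, scale, trigger.shape)
    return np.clip(trigger + noise, 0.0, 1.0)

def anneal_trigger(score, init_trigger, steps=1000, t0=1.0, cooling=0.99):
    """Maximize score(trigger) with simulated annealing."""
    current = best = init_trigger
    cur_score = best_score = score(current)
    temp = t0
    for _ in range(steps):
        candidate = perturb(current)
        cand_score = score(candidate)
        # Always accept improvements; accept worse candidates with a
        # temperature-dependent probability so the search can escape local optima.
        if cand_score > cur_score or random.random() < math.exp((cand_score - cur_score) / temp):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
        temp *= cooling  # lower temperature: uphill moves become rarer
    return best, best_score
```

Unlike hill climbing, the acceptance rule occasionally keeps a worse candidate early on, which is exactly what lets the search climb out of local optima before the temperature drops.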
By stamping these triggers onto a small set of clean inputs and measuring how often the model's predictions flip to a target label, DeBackdoor can decide whether the model is backdoored.
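Tying the two sketches together, a hypothetical decision rule flags the model if the best trigger found pushes the ASR above a threshold. The `stamp` helper, the threshold value, and the patch size are illustrative assumptions; `model`, `clean_inputs`, and `target_label` are placeholders.

```python
def stamp(x, trigger, y0=0, x0=0):
    """Paste a small patch trigger onto the corner of an image (HWC layout)."""
    out = x.copy()
    h, w, _ = trigger.shape
    out[y0:y0 + h, x0:x0 + w, :] = trigger
    return out

ASR_THRESHOLD = 0.9  # assumed value for illustration

best_trigger, best_asr = anneal_trigger(
    score=lambda t: attack_success_rate(model, lambda x: stamp(x, t), clean_inputs, target_label),
    init_trigger=np.random.rand(8, 8, 3),  # e.g. a random 8x8 RGB patch
)
is_backdoored = best_asr >= ASR_THRESHOLD
```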
The framework has shown high detection performance across a variety of attack scenarios, including different trigger types and label strategies such as All2One (every class redirected to a single target), All2All (each class remapped to another), and One2One (a single source class redirected to a single target), sketched below.
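A minimal sketch of the three label strategies as target-label mappings; the class count and specific class indices are examples, not the paper's setup.

```python
num_classes = 10
all2one = lambda y: 7                      # every class redirected to class 7
all2all = lambda y: (y + 1) % num_classes  # each class shifted to the next one
one2one = lambda y: 7 if y == 3 else y     # only class 3 redirected to class 7
```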
DeBackdoor outperforms existing detection baselines such as AEVA and B3D, which cover a narrower range of trigger types and attack settings.
Its adaptability makes it especially valuable in situations where the attack strategy is unknown or varies, offering a strong solution for securing deep learning models in critical applications.