Towards Adversarially Robust Deepfake Detection: An Ensemble Approach


Detecting deepfakes remains an open problem. Recent detection methods fail against an adversary who adds imperceptible adversarial perturbations to a deepfake to evade detection. We propose Disjoint Deepfake Detection (D3), to the best of our knowledge the first adversarially robust deepfake detector. D3 uses an ensemble of models over disjoint subsets of the frequency spectrum to significantly improve robustness beyond de facto defenses such as adversarial training. Our key insight is to exploit redundancy in the frequency domain and apply a saliency partitioning technique that distributes individual frequency components disjointly across multiple models. We formally prove that these disjoint ensembles reduce the dimensionality of the input subspace in which adversarial deepfakes lie. We then empirically evaluate D3 against white-box and black-box attacks and find that D3 significantly outperforms existing state-of-the-art ensemble defenses for deepfake detection against an adaptive adversary.
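The abstract does not specify the frequency transform, the saliency measure, or the rule used to assign components to models, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a 2-D orthonormal DCT as the frequency representation, a precomputed per-coefficient saliency map (hypothetically, for example, gradient magnitudes from a trained detector), round-robin assignment of saliency-ranked coefficients so that every model receives a similar share of salient components, and a majority vote over the ensemble. All function names (`dct_matrix`, `saliency_partition`, `ensemble_predict`) are hypothetical and are not taken from the D3 implementation.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the 1/sqrt(n) scaling
    return c


def dct2(x: np.ndarray, c: np.ndarray) -> np.ndarray:
    """2-D DCT of a square image x (assumption: single channel, n x n)."""
    return c @ x @ c.T


def idct2(y: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Inverse 2-D DCT; exact because C is orthonormal."""
    return c.T @ y @ c


def saliency_partition(saliency: np.ndarray, num_models: int) -> list[np.ndarray]:
    """Split frequency positions into disjoint boolean masks.

    Positions are ranked by saliency (high to low) and dealt round-robin,
    so each position lands in exactly one mask (assumed assignment rule).
    """
    order = np.argsort(saliency, axis=None)[::-1]  # flat indices, most salient first
    masks = [np.zeros(saliency.size, dtype=bool) for _ in range(num_models)]
    for rank, flat_idx in enumerate(order):
        masks[rank % num_models][flat_idx] = True
    return [m.reshape(saliency.shape) for m in masks]


def ensemble_predict(image, masks, models, c) -> int:
    """Majority vote (1 = fake); each model sees only its own frequency subset."""
    coeffs = dct2(image, c)
    votes = [model(idct2(np.where(mask, coeffs, 0.0), c))
             for mask, model in zip(masks, models)]
    return int(sum(votes) > len(votes) / 2)


# Usage with placeholder data: random image, random saliency, dummy models.
rng = np.random.default_rng(0)
c = dct_matrix(8)
image = rng.random((8, 8))
masks = saliency_partition(rng.random((8, 8)), num_models=3)
dummy_models = [lambda x: 1, lambda x: 1, lambda x: 0]  # stand-ins for trained detectors
print(ensemble_predict(image, masks, dummy_models, c))
```

Because the masks are disjoint and together cover every coefficient, the per-model inputs sum back to the original image by linearity of the inverse DCT; an adversarial perturbation confined to one model's frequency subset can only influence that model's vote, which is the intuition the sketch is meant to illustrate.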

arXiv preprint