Towards Adversarially Robust Deepfake Detection: An Ensemble Approach


Detecting deepfakes remains an open problem. Recent detection methods fail against an adversary who adds imperceptible adversarial perturbations to a deepfake to evade detection. We propose Disjoint Deepfake Detection (D3), which is, to the best of our knowledge, the first adversarially robust deepfake detector. D3 uses an ensemble of models, each operating on a disjoint subset of the frequency spectrum, to significantly improve robustness beyond de facto solutions such as adversarial training. Our key insight is to exploit redundancy in the frequency domain and to apply a saliency partitioning technique that distributes individual frequency components disjointly across multiple models. We formally prove that these disjoint ensembles reduce the dimensionality of the input subspace in which adversarial deepfakes lie. We then empirically validate D3 against white-box and black-box attacks, and find that it significantly outperforms existing state-of-the-art ensemble defenses in deepfake detection against an adaptive adversary.
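
As an illustration only, the idea of distributing frequency components disjointly across ensemble members might be sketched as below. The abstract does not specify the partitioning rule, so the round-robin assignment by saliency rank, the function names `partition_by_saliency` and `mask_coefficients`, and the representation of an input as a flat list of frequency coefficients are all assumptions made for this sketch, not the D3 implementation.

```python
from typing import List, Sequence


def partition_by_saliency(saliency: Sequence[float], num_models: int) -> List[List[int]]:
    """Split frequency indices into disjoint subsets, one per model.

    Assumption: indices are ranked by decreasing saliency and dealt out
    round-robin, so each model receives a mix of salient and less salient
    components. This is a hypothetical rule chosen for illustration.
    """
    if num_models < 1:
        raise ValueError("num_models must be at least 1")
    # Rank frequency indices from most to least salient.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    subsets: List[List[int]] = [[] for _ in range(num_models)]
    for rank, freq_index in enumerate(ranked):
        subsets[rank % num_models].append(freq_index)
    return subsets


def mask_coefficients(coeffs: Sequence[float], subset: Sequence[int]) -> List[float]:
    """Zero every frequency coefficient that is not assigned to this model."""
    keep = set(subset)
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


# Hypothetical saliency scores for six frequency components, three models.
subsets = partition_by_saliency([0.9, 0.1, 0.5, 0.7, 0.3, 0.2], num_models=3)
# Each model sees only its own components of the (hypothetical) spectrum.
views = [mask_coefficients([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], s) for s in subsets]
```

Because the subsets are disjoint, a perturbation confined to one frequency component changes the input of only one ensemble member in this sketch.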

ArXiv Pre-Print