[Deep.人. Article] KoDF: A Large-scale Korean Deepfake Detection Dataset

A variety of effective face-swap and face-reenactment methods have been publicized in recent years, democratizing face synthesis technology to a great extent. Videos generated this way have come to be collectively called deepfakes, a term that carries a negative connotation because of the various social problems they have caused.

Developing algorithms that detect deepfake content requires manipulated (synthesized) data. However, the currently available public data (FaceForensics++, DeepFaceLab, DFDC, etc.) are biased toward Caucasian faces, lack diversity and transparency in their synthesis models, and in some cases do not disclose the synthesis techniques used.

DeepBrain AI has established a Korean deepfake detection dataset (KoDF) to solve this problem.
KoDF is a “Deepfake Detection Dataset” that includes 175,776 fake clips and 62,166 real clips for 403 subjects.

 

Table 1. Quantitative comparison of KoDF to existing public deepfake detection datasets

The deepfake samples were generated with six different synthesis models. Among them, FaceSwap, DeepFaceLab, and FSGAN are face-swapping models. First Order Motion Model (FOMM) is a video-driven face-reenactment model. The remaining two, Audio-driven Talking Face Head Pose (ATFHP) and Wav2Lip, are audio-driven face-reenactment models.

 

Figure 1. Proportion of each synthesis model in KoDF

Postprocessing
All the methods listed above produce a sequence of image frames matched to the facial region cropped during the preprocessing step. Because most models fail to reconstruct accurate details around the facial boundaries, the synthesized output must be blended back into the original frame.

Using the same facial landmark detection algorithm as in the preprocessing stage, we create a facial mask from the synthesized image frame. The border of the mask region undergoes Gaussian blurring to reduce artifacts, and the blurred images are blended into the original video frames at the corresponding temporal positions.
This postprocessing procedure significantly reduces jitter while preserving details around the facial borders.
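The blending step can be illustrated with a minimal sketch. It is not the pipeline used to build KoDF: it assumes the blur is applied to a binary face mask, which is then used as per-pixel blending weights (a common way to feather a mask border), and it uses plain Python lists of grayscale values in place of real image arrays so that it has no dependencies. All function names, the kernel radius, and sigma are hypothetical choices made for illustration.

```python
import math

def gaussian_kernel(radius, sigma):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(img[0])
    out = []
    for row in img:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, radius=2, sigma=1.0):
    """Separable Gaussian blur: blur the rows, then the columns."""
    kernel = gaussian_kernel(radius, sigma)
    horizontal = blur_rows(img, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))

def blend(original, synthesized, mask, radius=2, sigma=1.0):
    """Feather the binary mask and use it as per-pixel blending weights."""
    soft = gaussian_blur(mask, radius, sigma)
    return [
        [a * s + (1.0 - a) * o for o, s, a in zip(o_row, s_row, a_row)]
        for o_row, s_row, a_row in zip(original, synthesized, soft)
    ]

# Toy example: a dark "original" frame, a bright "synthesized" face,
# and a 3x3 face mask in the middle of a 9x9 frame.
size = 9
original = [[0.0] * size for _ in range(size)]
synthesized = [[1.0] * size for _ in range(size)]
mask = [[1.0 if 3 <= y <= 5 and 3 <= x <= 5 else 0.0 for x in range(size)]
        for y in range(size)]
blended = blend(original, synthesized, mask)
```

In the toy example, pixels far from the mask keep the original value, the mask center is dominated by the synthesized value, and the border pixels fall in between, which is the soft transition that hides seams around the face.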

 

We evaluate the overall quality of the synthesized output with Structural Similarity Index Measure (SSIM) and Average Keypoint Distance (AKD). The former compares the structural similarity between the target clip and the generated video, and the latter represents the accuracy of the synthesized clips’ facial expressions given the target video as a ground truth.
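For reference, the two metrics are commonly defined as follows. The notation is introduced here for illustration and is not taken from the KoDF paper: x and y are two image windows with means \mu_x, \mu_y, variances \sigma_x^2, \sigma_y^2, and covariance \sigma_{xy}; C_1 and C_2 are small stabilizing constants; p_k and \hat{p}_k are the k-th of K facial keypoints detected in the target frame and in the synthesized frame, respectively. Higher SSIM and lower AKD indicate better quality.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}

\mathrm{AKD}(P, \hat{P}) =
  \frac{1}{K} \sum_{k=1}^{K} \lVert p_k - \hat{p}_k \rVert_2
```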

 

Table 2.  Comparison of FF++ and KoDF by average SSIM and AKD.

 

To evaluate the overall quality of the dataset, we randomly choose 500 real clips and 500 corresponding synthesized clips. From each of the fake samples, 100 frames are uniformly extracted, and their real matches are taken from the identical temporal positions. For these 100 pairs, SSIM and AKD are computed and averaged to obtain the final values, which are summarized in Table 2.
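The sampling and averaging procedure can be sketched as below. This is an illustrative outline under stated assumptions, not the evaluation code behind Table 2: the function names are hypothetical, the keypoints are assumed to have already been extracted by some facial landmark detector, and only AKD is shown (SSIM would be averaged over the same frame pairs in the same way).

```python
import math

def uniform_frame_indices(num_frames, num_samples=100):
    """Pick num_samples frame indices spread evenly over the clip."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        return list(range(num_frames))
    if num_samples == 1:
        return [0]
    # Integer arithmetic keeps the first and last frame and avoids duplicates.
    return [(i * (num_frames - 1)) // (num_samples - 1)
            for i in range(num_samples)]

def average_keypoint_distance(kps_real, kps_fake):
    """Mean Euclidean distance between corresponding (x, y) keypoints."""
    if not kps_real or len(kps_real) != len(kps_fake):
        raise ValueError("keypoint lists must be non-empty and equal in length")
    return sum(math.dist(p, q) for p, q in zip(kps_real, kps_fake)) / len(kps_real)

def mean_over_pairs(real_items, fake_items, metric):
    """Average a per-frame metric over aligned real/fake frame pairs."""
    values = [metric(r, f) for r, f in zip(real_items, fake_items)]
    return sum(values) / len(values)

# Frame positions for a hypothetical 300-frame clip.
indices = uniform_frame_indices(300, 100)

# Two toy frame pairs with two keypoints each.
real_kps = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (0.0, 10.0)]]
fake_kps = [[(3.0, 4.0), (10.0, 0.0)], [(0.0, 0.0), (0.0, 10.0)]]
clip_akd = mean_over_pairs(real_kps, fake_kps, average_keypoint_distance)
```

Because the same indices are used for the fake clip and its real counterpart, each metric compares frames from identical temporal positions, as described above.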

The ultimate goal of a deepfake detection dataset would be to help develop a general detection model that performs well against a variety of real-world deepfake cases. Most studies on deepfake detection are designed to measure how their proposed detection models perform on a particular deepfake detection dataset. The premise here is that the target deepfake detection dataset is a good approximation of the distribution of real-world deepfake instances.

In the subsequent experiments, we investigate whether existing deepfake detection datasets meet this assumption, i.e., whether they guarantee a sufficient level of generality.

 

ROC curves of the DFDC winning detection model. The model is trained on FF++, DFDC, KoDF, and their union respectively, and then evaluated on each of the three single datasets.
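As a reminder of how such curves are obtained, the sketch below computes an ROC curve and its area under the curve (AUC) from a detector's scores. It is a generic, dependency-free illustration rather than the evaluation code used in the experiments; the function names, the label convention (1 = fake, 0 = real), and the toy scores are assumptions made for the example.

```python
def roc_curve(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping the threshold downward.

    labels: 1 for fake, 0 for real. scores: higher means "more likely fake".
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one fake and one real sample")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
               for i in range(1, len(fpr)))

# Toy scores from a hypothetical detector on six clips.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
fpr, tpr = roc_curve(labels, scores)
toy_auc = auc(fpr, tpr)
```

In a cross-dataset study, the same computation would be repeated for every combination of training set and test set, so that in-domain and out-of-domain performance can be compared side by side.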

From the experimental results, we can deduce that the deepfake detection task is strongly prone to overfitting, much more so than regular image classification tasks, where models learn diverse, naturally recurring signals (i.e., local patterns and global structures). Deepfake detection models, in contrast, focus on artifacts that arise during the generation process, which inevitably vary with the synthesis method. An ideal deepfake detection dataset should thus incorporate examples of a maximal variety of deepfake methods and a wide range of real videos. No standalone deepfake dataset published so far seems to achieve sufficient generality to meet these conditions on its own, so a practical solution would be to combine multiple datasets.

In conclusion, we have presented a new large-scale dataset to help researchers develop and evaluate deepfake detection methods. KoDF focuses on Korean subjects, a demographic that tends to be underrepresented in other major deepfake detection databases. It expands the range of employed deepfake methods, regulates the quality of the real clips and the synthesized clips, manages the distribution of subjects according to age, sex, and speech content, and simulates possible adversarial attacks. While KoDF is an extensive database, our expectation is that it will work even more effectively as a complement to existing and future deepfake detection databases, including the two milestone datasets FF++ and DFDC. We experimentally demonstrate the benefit of combining datasets for in-the-wild deepfake detection. We hope KoDF will serve as a stepping stone for future studies in the field of deepfake detection.

※For more information on the above, please refer to the paper below and the released KoDF dataset.

※The content above is based on a paper published by DeepBrain AI (KoDF: A Large-scale Korean DeepFake Detection Dataset, https://arxiv.org/abs/2103.10094), which was accepted to the 2021 International Conference on Computer Vision (ICCV).

※The Korean deepfake detection dataset (KoDF), built by DeepBrain AI, was released for research purposes on AI HUB, which is operated by the National Information Society Agency (NIA). (https://aihub.or.kr/aidata/8005)

※This dataset was developed with the support of the 2020 AI learning data construction project. (Participating organizations: Seoul National University, Crowdworks)