Deepfake Text Detection: Limitations and Opportunities
Author: Zain Sarwar (former UG student and Research Assistant at SIA LUMS, currently a Ph.D. student at the University of Chicago)
Recent advances in deep learning have enabled language models to produce high-quality text that is often indistinguishable from human-written text. While this reflects significant progress in generative modeling, bad actors can misuse these technologies in a variety of ways, such as generating misinformation and fake news, compromising the integrity of online review systems, and launching phishing campaigns. It is therefore imperative that we develop tools that can distinguish between human-written and AI-written text.
While researchers have developed several tools for this purpose, their robustness and performance in practical settings were unknown. The goal of our research was therefore to find out how well these AI-text detection schemes work in a realistic setting. We approached this in two ways. First, since the existing tools had only been evaluated on datasets curated by the researchers themselves, we collected datasets from online services that provide AI-written text. Second, we developed attacks against these tools to understand how robust they are against an attacker who deliberately generates AI-written text designed to evade detection.
Our experiments revealed several insights into the performance of these tools. First, we observed that most tools did not perform well on AI-written text from unknown generation techniques, nor were they able to withstand an adaptive adversary. This was because these tools were trained to recognize low-level statistical artifacts present in their training data, which could easily be changed to evade detection. On the other hand, we observed that one detection tool, which relied on higher-level semantic features of a piece of text to classify it as human-written or AI-written, generalized significantly better and withstood several of the attacks we had developed. We investigated this tool in depth and suggested several ways to improve the detection of AI-written text.
This blog post summarizes our research paper “Deepfake Text Detection: Limitations and Opportunities”, which was presented at the IEEE Symposium on Security and Privacy in May 2023.