Towards a Robust Detection of Language Model Generated Text: Is ChatGPT that Easy to Detect?

Authors: Wissam Antoun, Virginie Mouilleron, Benoît Sagot, Djamé Seddah
Venue: 30e Conférence sur le Traitement Automatique des Langues Naturelles
Date: June 2023
Repository link: https://hal.science/hal-04130146/
Peer reviewed: Yes
Open access: Yes
Abstract

Recent advances in natural language processing (NLP) have led to the development of large language models (LLMs) such as ChatGPT. This paper proposes a methodology for developing and evaluating ChatGPT detectors for French text, with a focus on investigating their robustness on out-of-domain data and against common attack schemes. The proposed method involves translating an English dataset into French and training a classifier on the translated data. Results show that the detectors can effectively detect ChatGPT-generated text, with a degree of robustness against basic attack techniques in in-domain settings. However, vulnerabilities are evident in out-of-domain contexts, highlighting the challenge of detecting adversarial text. The study emphasizes caution when applying in-domain testing results to a wider variety of content. We provide our translated datasets and models as open-source resources.
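The methodology above boils down to training a binary classifier on pairs of human-written and ChatGPT-generated French text. The following is a minimal sketch of that idea only, not the paper's implementation (the authors fine-tune transformer models on a translated dataset); the scikit-learn pipeline, the toy French examples, and their labels are all invented for illustration.

```python
# Minimal sketch of a generated-text detector: a binary classifier over
# human vs. model-generated text. NOT the paper's method (which fine-tunes
# transformer models on translated French data); the tiny corpus below is
# hypothetical and purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training examples: label 1 = model-generated, 0 = human-written.
texts = [
    "Les avancées récentes en traitement automatique des langues sont notables.",
    "franchement j'ai pas trop aimé ce resto, service lent",
    "En conclusion, il est important de noter que plusieurs facteurs entrent en jeu.",
    "trop bien le concert hier soir!!! on y retourne quand?",
]
labels = [1, 0, 1, 0]

detector = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams
    LogisticRegression(),
)
detector.fit(texts, labels)

# Scoring unseen text. As the abstract cautions, strong in-domain accuracy
# does not guarantee robustness on out-of-domain or adversarial text.
pred = detector.predict(["Il convient de souligner que les résultats obtenus varient."])
print(pred[0])
```

A classifier this small is of course not a usable detector; it only shows the train/predict shape of the task. The paper's point is precisely that such detectors must also be stress-tested out of domain and under attack.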
