Open Source Dataset for Sensitive Classification for Radicalization Detection


The Counter Project consortium announced the release of the Counter DataSet, an open-source pseudonymized dataset aimed at facilitating research on radicalization detection. The dataset, titled “Cloaked Classifiers: Pseudonymization Strategies on Sensitive Classification Tasks,” is now available.

Developed by researchers Arij Riabi, Menel Mahamdi, Virginie Mouilleron, and Djamé Seddah, the Counter DataSet offers a collection of annotated data for sensitive classification tasks, with a specific focus on Named Entity Recognition (NER) for radicalization detection. The dataset includes examples across multiple languages and is designed to support the development of more robust and privacy-preserving machine learning models.

The dataset contains content that may be considered offensive, including racist, sexist, homophobic, and other discriminatory material. Researchers interested in accessing the training and test sets must complete a form on the project’s official website. Following submission, instructions for downloading the data will be provided via email.

This initiative is part of a broader effort funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101021607. The project underscores the importance of balancing the need for sensitive data in research with the imperative of protecting individual privacy through pseudonymization.

For more information, please visit the Counter Project’s official website or contact the dataset maintainers via email at djame.seddah@inria.fr or arijriabi96@gmail.com.

