The massive spread of social networks has provided a plethora of new possibilities to communicate and interact worldwide. On the other hand, it has introduced negative phenomena related to social media addiction, as well as additional tools for cyberbullying and cyberterrorism activities. Therefore, monitoring the posted content and the users’ behavior has become essential to guarantee a safe and correct use of the network. This task is even more challenging in the presence of borderline users, namely users who appear risky according to their posts, but not according to other perspectives.
In this context, this paper contributes to the automated identification of risky users in social networks. Specifically, we propose a novel system, called SAIRUS, that solves node classification tasks in social networks by exploiting and combining the information conveyed by three different perspectives: the semantics of the textual content generated by users, the network of user relationships, and the users’ spatial closeness, derived from the geo-tagging data associated with the posted content. Contrary to existing approaches, which typically inject features built from one perspective into another, we learn three separate models that exploit the peculiarities of each kind of data, and then learn a further model that fuses their contributions through a stacked generalization approach.
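To make the fusion step concrete, the following is a minimal sketch of stacked generalization over three views. It is not the SAIRUS implementation: the nearest-centroid base learners, the logistic-regression meta-learner, the synthetic two-dimensional features, and every function and variable name are assumptions chosen only for illustration. The meta-learner is trained on out-of-fold base scores, so it never sees a score that a base model produced for a user it was fitted on.

```python
import math
import random

def fit_centroids(X, y):
    """Base learner (assumed stand-in for a per-view model): one mean vector per class."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def margin(cents, x):
    """Score in (-1, 1); positive means x is closer to the risky (class 1) centroid."""
    return math.tanh(math.dist(x, cents[0]) - math.dist(x, cents[1]))

def out_of_fold(X, y, k=5):
    """Score each user with a base model fitted on the other k-1 folds."""
    n = len(X)
    scores = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):
            scores[i] = margin(cents, X[i])
    return scores

def predict_proba(w, b, z):
    return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))

def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = predict_proba(w, b, z) - t
            gw = [g + err * zj for g, zj in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def make_view(labels, sep, rng):
    """Synthetic 2-D features; a larger sep makes the view more informative."""
    return [[rng.gauss(sep * t, 1.0), rng.gauss(sep * t, 1.0)] for t in labels]

rng = random.Random(0)
labels = [i % 2 for i in range(300)]  # 1 = risky, 0 = safe (synthetic)
views = {name: make_view(labels, sep, rng)
         for name, sep in [("text", 2.0), ("network", 1.5), ("spatial", 1.0)]}
train, test = range(200), range(200, 300)
y_train = [labels[i] for i in train]

# Level 0: one model per view, plus out-of-fold scores for the meta-learner.
base_models, oof_columns = {}, []
for name, X in views.items():
    X_train = [X[i] for i in train]
    oof_columns.append(out_of_fold(X_train, y_train))
    base_models[name] = fit_centroids(X_train, y_train)

# Level 1: fuse the three per-view scores.
Z_train = [list(row) for row in zip(*oof_columns)]
w, b = fit_logistic(Z_train, y_train)

Z_test = [[margin(base_models[name], views[name][i]) for name in views] for i in test]
probs = [predict_proba(w, b, z) for z in Z_test]
accuracy = sum((p >= 0.5) == bool(labels[i]) for p, i in zip(probs, test)) / len(test)
print(f"meta weights: {w}, held-out accuracy: {accuracy:.2f}")
```

In this sketch, the learned weights `w` indicate how much the fused decision relies on each view, which is where a user who looks risky from one perspective only can be outweighed by the other two.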
Our extensive experimental evaluation, performed on two variants of a real-world Twitter dataset, showed that the proposed method outperforms 15 competitors based on one of the considered perspectives alone, or on a combination thereof. This superiority is also evident when focusing specifically on borderline users, confirming the applicability of SAIRUS to real-world social networks, which are potentially affected by noisy data.