Stop-Hate Project – GIE OCA – Observatory for Audiovisual Contents

Development and Evaluation of an online hate speech detector in Spanish

In Spain there is growing concern about online hate speech and about behaviors built on prejudices and stereotypes that can undermine social coexistence. It is a contentious issue, both because it touches on the debate about the limits of the right to freedom of expression and because of the emergence of new forms of communication. This concern is recent: hate crimes did not gain social relevance under that name until 2009, and institutional work began with the first action strategies in 2015.

Previous studies and initiatives have addressed the monitoring of online hate speech in different countries, but no articulated proposal for how to address online hate messages has yet emerged. Public institutions and private companies (mainly social networks) are making an effort to understand this phenomenon and combat it with a common approach, but they still lack an articulated framework of objectives and techniques to solve it. The STOP-HATE prototype aims to provide a tool that facilitates these tasks.

CONTEXT

In general terms, hate speech covers all forms of discourse that damage the image of a person or a group of individuals because of an inherent or acquired condition. This includes explicit hate messages, as well as any subtle or framed narrative that denigrates individuals with the goal of achieving some form of social control. However, illegal hate speech, as defined by the European Union in its Framework Decision 2008/913/JHA of 28 November 2008, refers more specifically to all intentional conduct “publicly inciting to violence or hatred directed against a group of persons or a member of such a group defined by reference to race, colour, religion, descent or national or ethnic origin”.

There is a growing body of evidence and research showing the relationship between online hate speech and hate crimes committed in the real world (Müller & Schwarz, 2018). In fact, the growth observed in Spain and other Western countries in this kind of online hateful content against minorities or vulnerable groups (migrants, refugees, Romani, LGTBI, women, Muslims, etc.) and its connection to real attacks have attracted special attention because of their social and scientific significance. This has also led to the first attempts to develop technology-based tools that help with the automatic detection of hate speech (Pereira Kohatsu, 2017), although currently no prototypes exist that can transfer these tools to the productive sectors.

Along these lines, in September 2018 the Government agreed to an institutional cooperation with the General Council of the Judiciary and the Attorney General's Office to fight against racism, xenophobia, LGTBIphobia and other forms of intolerance, renewing a framework decision of 2015. Within this legal framework, private companies are also making an effort to detect and eliminate hate speech. However, the growing quantity of data and information flowing through the Internet makes it harder to block this content consistently, and content that goes unblocked creates new victims. The STOP-HATE prototype therefore aims to provide a tool that facilitates these tasks.

WHAT IT CONSISTS OF

STOP-HATE will allow the identification and analysis of online hate speech against four vulnerable groups:

1. Racism or xenophobia and migrants or refugees;

2. Sexual orientation or identity;

3. Religious beliefs and praxis, including antisemitism;

4. Political ideology

through the automated collection and modelling of unstructured data using natural language processing techniques (bag of words and entities) and machine learning (sentiment analysis, topic modelling, classification algorithms, etc.).
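As an illustration of the kind of pipeline described above, the following is a minimal sketch of a bag-of-words text classifier. It is not the STOP-HATE implementation: the choice of a multinomial Naive Bayes model, the names `tokenize`, `NaiveBayes` and `TRAIN`, the labels `hate` and `neutral`, and the four toy sentences are all assumptions made for the example. A real system would need a large annotated corpus and a separate label for each of the four groups.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and extract word tokens, keeping Spanish letters."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (hypothetical helper)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus log likelihoods with add-one (Laplace) smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch only.
TRAIN = [
    ("no queremos a esa gente aquí, que se vayan", "hate"),
    ("esa gente sobra en nuestro barrio", "hate"),
    ("bienvenidos todos a la fiesta del barrio", "neutral"),
    ("hoy hace buen tiempo en Salamanca", "neutral"),
]

model = NaiveBayes().fit([t for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict("que se vayan de aquí esa gente"))  # → hate
print(model.predict("buen tiempo hoy en la fiesta"))    # → neutral
```

Bag-of-words features ignore word order and context, which is one reason the text also mentions entities, sentiment analysis and topic modelling as complementary signals.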

It will first be applied to the social network Twitter, with the possibility of extending it to other networks (such as Facebook) or to news and comments in digital media.

OBJECTIVES

The objective of this proof of concept is to develop and evaluate an online hate speech detector for Spanish that monitors this type of message using big data techniques. It thus provides private companies (consulting, technology, media, social networks), governments (local, regional, national) and non-governmental organizations with technological tools to counteract its effects and combat hate crimes (verbal and physical aggression and/or threats, etc.).

With this tool we seek to counteract the increase in hate messages in Spain directed at vulnerable groups in digital media and social networks, as well as the absence of an independent and articulated national strategy based on large-scale monitoring to prevent both hate speech and hate crimes.

The main objectives of the project are:

O1. Monitoring and large-scale identification of the sources of hate speech against vulnerable groups in Spain.
O2. Creation of an early alert system for hate speech in Spain.
O3. Evaluation and patenting of a prototype so it can be used by consulting companies, technology companies, media and governmental and non-governmental institutions.
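
The objectives do not specify how the early alert system in O2 would work. One simple possibility, sketched here purely as an assumption, is to count the messages flagged as hate speech in each hour and raise an alert when a count rises well above the recent baseline. The function name `find_alerts`, the parameters `window` and `k`, and the hourly counts are hypothetical.

```python
from statistics import mean, pstdev


def find_alerts(hourly_counts, window=6, k=3.0):
    """Return the indices of hours whose count exceeds
    mean + k * standard deviation of the preceding `window` hours."""
    alerts = []
    for i in range(window, len(hourly_counts)):
        history = hourly_counts[i - window:i]
        # With a perfectly flat history the deviation is 0, so any
        # increase triggers an alert; a real system might add a floor.
        threshold = mean(history) + k * pstdev(history)
        if hourly_counts[i] > threshold:
            alerts.append(i)
    return alerts


# Invented counts of flagged messages per hour, with a spike at index 7.
counts = [4, 5, 3, 4, 6, 5, 4, 21, 5]
print(find_alerts(counts))  # → [7]
```

A rolling baseline of this kind adapts to gradual changes in volume, but the window length and the multiplier `k` would have to be tuned on real data.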

INNOVATION AND CHALLENGES

The main contribution and most relevant innovation of the project is the detection of hate speech messages with an early alert system. The project will also provide a comprehensive definition and broader knowledge of online hate speech in Spain.

Covering four vulnerable groups and their corresponding types of hate (1. Racism or xenophobia and migrants or refugees; 2. Sexual orientation or identity; 3. Religious beliefs and praxis, including antisemitism; 4. Political ideology) ensures a more complete observation and analysis than in previous attempts.

Finally, the filtering and identification of hate speech will be tailored to the national Spanish context, overcoming the limitation of other works that take a purely linguistic approach and do not distinguish between the different realities of the different Spanish-speaking countries. This will also allow the tool to be replicated and adapted to other Hispanic countries.

Project funded by the General Foundation of the University of Salamanca, as a competitive proof of concept [PC-TCUE18-20_016].