How does Facebook detect hate speech?

Facebook has long struggled with moderating hate speech and other harmful content on its platform. With over 3 billion monthly active users, it can be challenging to monitor all the posts and comments happening across Facebook. Yet failing to limit hate speech can allow real-world harm, so Facebook invests heavily in technology and human reviewers to detect this type of abusive content.

What is considered hate speech on Facebook?

Facebook defines hate speech as “a direct attack against people on the basis of protected characteristics: race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and serious disease.” Some specific examples include:

  • Calling for segregation or exclusion of a group
  • Claiming a group is physically or mentally inferior
  • Celebrating or mocking a group’s suffering
  • Calling for violence against a group
  • Telling a group to “go back” to where they came from
  • Dismissing the existence of a group’s identity or experiences

However, context matters when evaluating speech. For example, simply stating a fact such as “Women have lower average muscle mass than men” would not qualify as hate speech; a post must directly attack people to violate the policy. Humor or social commentary that relies on harmful stereotypes to attack a group can still count as hate speech on Facebook.

How does Facebook find hate speech?

Facebook uses both automated technology and human content reviewers to detect hate speech. Here is an overview of how both methods work:

Automated detection

  • Machine learning models – Facebook trains machine learning models on hundreds of thousands of example posts to detect patterns in how hate speech is written. The models look at features like word choices, punctuation, and context (a simplified sketch of this approach follows this list).
  • Image recognition – Computer vision technology can scan images and videos for hate symbols, suggestive material, and text in different languages indicating hate speech.
  • User reports – When users report posts as offensive or as violating community standards, those posts are flagged to both the machine learning models and human reviewers.
  • Page monitoring – Facebook proactively monitors the posts, comments, and activity on pages that have a history of policy violations to quickly catch new hate speech.
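
Facebook has not published its production models, but the core idea of the machine learning approach above can be illustrated with a minimal, hypothetical text classifier: train on labeled examples, then flag posts that score above a confidence threshold for human review. The data, feature choices, and threshold below are illustrative assumptions, not Facebook’s actual system.

```python
# Simplified sketch of training a text classifier to flag possible hate speech.
# Everything here (data, features, threshold) is hypothetical and far simpler
# than Facebook's production systems, which use large multilingual models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples: 1 = violates hate speech policy, 0 = benign.
train_texts = [
    "members of group X are subhuman and should be expelled",  # direct attack
    "had a great time at the group X cultural festival",        # benign mention
    # ...in practice, hundreds of thousands of labeled posts
]
train_labels = [1, 0]

# Word and bigram features stand in for the richer signals mentioned above
# (word choice, punctuation, surrounding context).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

def flag_for_review(post: str, threshold: float = 0.8) -> bool:
    """Send a post to human review when the model is confident it violates policy."""
    prob_violation = model.predict_proba([post])[0][1]
    return prob_violation >= threshold
```

In practice the threshold trades recall against reviewer workload: a lower bar sends more borderline content to humans, a higher bar removes only the clearest violations automatically.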

Human content review

  • Review queues – Possible hate speech flagged by automated systems and user reports goes into categorized review queues for human evaluation (a rough prioritization sketch follows this list).
  • 24/7 coverage – With reviewers around the world, Facebook can monitor content in over 50 languages, 24 hours a day.
  • Context rules – Reviewers use detailed rules accounting for context to decide if something qualifies as hate speech or simply social commentary.
  • Quality control – Oversight mechanisms ensure reviewers stay accurate in judgments over time.
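
Facebook has not detailed exactly how its review queues are ordered, but a common design is to prioritize flagged items by predicted severity and reach rather than by arrival order. The sketch below is a hypothetical illustration of that idea; the scoring formula and weights are assumptions.

```python
# Hypothetical sketch of a prioritized review queue: items the automated systems
# are most confident about, and that are likely to be seen by the most people,
# are reviewed first. The scoring weights are illustrative only.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class ReviewItem:
    priority: float
    post_id: str = field(compare=False)

def priority_score(model_confidence: float, predicted_views: int) -> float:
    # heapq pops the smallest value first, so negate the score.
    return -(model_confidence * (1 + predicted_views / 10_000))

queue: list[ReviewItem] = []
heapq.heappush(queue, ReviewItem(priority_score(0.95, 50_000), "post_123"))
heapq.heappush(queue, ReviewItem(priority_score(0.60, 200), "post_456"))

print(heapq.heappop(queue).post_id)  # post_123 is routed to a reviewer first
```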

How effective is Facebook at removing hate speech?

By leveraging automation and human effort, Facebook estimates that it now proactively detects around 97% of the hate speech it removes before users report it. However, critics argue that Facebook leaves too much hate speech online and fails to catch harmful posts before they go viral. Here are some key statistics on Facebook’s track record:

Key figures for Q3 2021:

  • Hate speech removed: 19.2 million pieces
  • Increase over Q2 2021: 9%
  • Hate speech missed by automated systems and identified by user reports: 5 million pieces
  • Median time to action on user reports: 17 hours

While these figures seem high, on a platform with billions of users even millions of hate speech posts amount to less than 1% of all content, and critics argue that the share slipping through still reaches a very large audience. Some key challenges Facebook faces in improving its detection include:

  • Difficulty monitoring private groups and messages
  • Complexity of speech in many different languages
  • Hate groups evolving language and symbols to avoid bans
  • Understaffing for the huge volume of daily content

What techniques do human reviewers use?

Facebook’s content reviewers go through extensive training to accurately identify hate speech and minimize mistakes. Here are some of the top methods they use:

Look for attacks

Statements only qualify as hate speech if they directly attack others based on protected characteristics. Simply mentioning a group would not violate policy. Reviewers closely analyze if posts are intended to belittle, dehumanize, or incite harm against people.

Consider context and intent

Words can have different meanings depending on who says them and why. For example, members of a minority group using racial slurs satirically may be acceptable, while outsiders using them to insult would be hate speech. Reviewers look at the full context of statements.

Check for recidivism

If a user has a history of posting hate speech, new borderline content from them gets interpreted less charitably. However, reviewers never automatically assume repeat violators are guilty.

Get second opinions

Complex cases get escalated to managers and discussed by committee. Having multiple reviewers provide input improves accuracy on hard calls.

Stay up-to-date on issues

Reviewers receive regular training on emerging types of dangerous content, hate trends, coded language by groups, and regional political contexts that influence interpretations.

What tools and datasets are used for training?

For Facebook’s automated hate speech detection to work, the machine learning models require extensive training data. Here is an overview of where Facebook sources training data:

  • Internal databases – Billions of posts and comments by Facebook users provide examples of real hate speech as well as benign content.
  • Partner datasets – Facebook uses public academic datasets focused on hate speech detection in different languages.
  • Expert reviews – Additional training data is created by having experts review samples of labeled posts, both hate speech and benign content, and correct any labeling errors.
  • Evolving data – As new examples of hate speech emerge, they get added to databases to continuously improve models.

Most of Facebook’s hate speech detection models train on datasets containing hundreds of thousands to millions of labeled examples. The diversity, quality, and size of these training datasets have a major influence on the accuracy of the resulting machine learning models.
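
As a rough illustration of how such a dataset might be assembled and sanity-checked before training, the snippet below merges a hypothetical internal export with a public corpus and inspects label balance and language coverage; the file names and column layout are assumptions.

```python
# Hypothetical sketch of assembling labeled training data from several sources
# and checking its balance before training. File names and columns are assumed.
import pandas as pd

internal = pd.read_csv("internal_labeled_posts.csv")      # columns: text, label, language
public = pd.read_csv("academic_hate_speech_corpus.csv")   # same columns, external source

dataset = pd.concat([internal, public], ignore_index=True).drop_duplicates(subset="text")

# Heavily skewed labels or missing languages tend to hurt model accuracy,
# so inspect the distribution before training on the combined set.
print(dataset["label"].value_counts(normalize=True))
print(dataset.groupby("language").size())
```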

How are reviewers screened and tested?

To maintain high standards, Facebook has rigorous processes for selecting content reviewers and auditing their work. Here are some of the key procedures in place:

Application screening

Candidates submit written applications explaining their interest in content moderation. This provides writing samples to assess judgment, analytical skills, cultural awareness, and communication abilities.

Classroom learning

New reviewers complete over 30 hours of instruction on content policies, ethics, bias avoidance, and accuracy. Instruction continues regularly throughout their employment.

Testing and audits

Reviewers must pass exams testing their understanding of policies. Daily audits of their decisions check for errors or inconsistencies compared to other reviewers.
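
One straightforward way to run such an audit (a generic approach, not necessarily Facebook’s exact procedure) is to re-judge a sample of each reviewer’s decisions against a gold-standard set produced by senior reviewers and measure agreement:

```python
# Hypothetical daily audit: compare a sample of a reviewer's decisions against
# gold-standard judgments from senior reviewers. The 95% bar is an assumption.
def audit_reviewer(decisions: dict[str, str], gold: dict[str, str],
                   min_accuracy: float = 0.95) -> bool:
    """Return True if the reviewer's agreement with gold labels meets the bar."""
    audited = [post_id for post_id in decisions if post_id in gold]
    if not audited:
        return True  # nothing sampled for audit today
    agreement = sum(decisions[p] == gold[p] for p in audited) / len(audited)
    return agreement >= min_accuracy

decisions = {"post_1": "remove", "post_2": "keep", "post_3": "remove"}
gold      = {"post_1": "remove", "post_2": "keep", "post_3": "keep"}
print(audit_reviewer(decisions, gold))  # 2/3 agreement -> False at a 0.95 bar
```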

Psychological support

Given the challenging nature of the job, Facebook provides access to counselors, relaxation spaces, and wellness resources to support reviewers.

What are the broader challenges in enforcement?

While Facebook’s methods have improved, experts argue there are inherent challenges the company faces in completely stopping hate speech. Some of the key limitations include:

Encryption

End-to-end encrypted messaging apps like WhatsApp prevent Facebook from reading message content to monitor for hate speech.

Bad precedents

Overly aggressive takedowns risk limiting free expression and could set precedents that encourage censorship in repressive countries.

Algorithms have biases

If not carefully developed, machine learning models can replicate human biases around evaluating content from minority groups.
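
A simple check that researchers commonly apply to this problem (described here as a generic technique, not Facebook’s internal process) is to compare how often a model wrongly flags benign posts across different dialects or communities:

```python
# Hypothetical bias check: compare false-positive rates on benign content
# across groups. The groups and the example data are illustrative only.
from collections import defaultdict

def false_positive_rates(examples):
    """examples: iterable of (group, model_flagged, is_actually_benign) tuples."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, model_flagged, is_benign in examples:
        if is_benign:
            total[group] += 1
            flagged[group] += int(model_flagged)
    return {group: flagged[group] / total[group] for group in total}

rates = false_positive_rates([
    ("dialect_a", True, True), ("dialect_a", False, True),
    ("dialect_b", False, True), ("dialect_b", False, True),
])
print(rates)  # a large gap between groups suggests the training data needs rebalancing
```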

Whack-a-mole

Banning hate groups just makes them reorganize under different names and symbols in an endless game of “whack-a-mole.”

Defining hate

Policies try to create clear rules but human speech is messy. Borderline examples will always exist in gray areas open to interpretation.

Conclusion

Detecting and removing hate speech poses an immense challenge at Facebook’s scale. The company deploys extensive human and technical resources to identify and enforce against hateful attacks based on ethnicity, race, religion and other protected attributes. Yet flaws and criticisms remain in Facebook’s approach. Eliminating hate speech entirely may prove impossible given the complexity of human language and constant efforts by malicious groups to circumvent bans. By investing in policy, technology, training, and oversight, Facebook must continue working to minimize the amount of hate speech that slips through while avoiding undue censorship.