Digital Peace Talks Interviews (No. II)

Iwan Ittermann
4 min read · Apr 14, 2020

Series of expert interviews to evaluate the concept of DPT

Lucas Dixon

Lucas is an expert in text classification. He is interested in technology that enables good discussions at scale. This is his private opinion.

Lucas, you looked into classifying the toxicity of comments. What upsides and downsides did you find regarding the use of text classification for enabling healthy, large-scale online discussions?

It really depends on how you use text classifiers. They generally don't work very well as a black-box rejection filter for comments: like word blacklists, they have unintended biases in which comments get filtered. People also get frustrated when their contribution is wrongly rejected, especially if they can't appeal.

The uses I’ve seen that I think are most effective are:

  1. to assist human moderators, for example by highlighting the parts of a comment that might not meet the community guidelines (see the NYT use case), and by sorting comments so that the ones the model rates worst come first for the review team. This can make the human review team faster and more effective (a minimal sketch of this follows the list); and
  2. to nudge the user when they enter a comment that looks like it might not match the community guidelines (see for example the Coral Project's Talk tool). The wording of the nudge matters a lot here: telling people they are toxic doesn't help, but pointing them to the community guidelines and giving them a chance to rewrite their comment has been very helpful in the cases I've looked at.
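
To make these two uses concrete, here is a minimal sketch in Python. The `score_toxicity` function, the toy word list, and the threshold are placeholders invented for illustration; in practice the score would come from a trained classifier or a hosted scoring service.

```python
# Minimal sketch of the two uses above: a reviewer queue sorted worst-first,
# and a gentle nudge to the author. score_toxicity is a stand-in for a real
# trained classifier or scoring service.

def score_toxicity(text):
    """Placeholder: return a rough toxicity score in [0, 1]."""
    flagged = {"idiot", "stupid", "hate"}  # toy word list, illustration only
    words = [w.strip(".,!?").lower() for w in text.split()]
    return min(1.0, sum(w in flagged for w in words) / 3)

def moderation_queue(comments):
    """Sort comments so the ones the model rates worst come first for reviewers."""
    return sorted(((score_toxicity(c), c) for c in comments), reverse=True)

def nudge(comment, threshold=0.6):
    """Return a gentle prompt instead of rejecting the comment outright."""
    if score_toxicity(comment) >= threshold:
        return ("Parts of your comment may not match the community guidelines. "
                "Would you like to review it before posting?")
    return None

if __name__ == "__main__":
    drafts = [
        "Thanks for the thoughtful reply!",
        "You are an idiot and I hate this stupid take.",
    ]
    for score, text in moderation_queue(drafts):
        print(f"{score:.2f}  {text}")
    print(nudge(drafts[1]))
```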

Diverse viewpoints are, among other things, a necessity for healthy discourse. Which promising technical approaches do you see or know of for increasing the diversity of arguments in a discussion?

Less toxic conversations help people to express a wider range of views without fear of being attacked.

Machines can help us see when what we say might be perceived as toxic by others. By giving us the chance to reflect, we can make our communication more inclusive and thus help the communities we communicate in become more open-minded. There are two common hypotheses about who writes toxic language and what can be done about it.

  • One is that it’s a small number of toxic users, and they can only be stopped by blocking them. I call this the troll hypothesis.
  • The other is that toxic language is the result of many people having a bad day, and sometimes they write something that, upon reflection, they later wish they had not. I call this the bad-day hypothesis.

Research suggests that there is some truth in both of these hypotheses, but that the majority of toxic language comes from people who are usually not toxic (the bad-day hypothesis). The good news is that we can have a big impact on online discourse just by helping people reflect on what they write! This won't stop abuse from trolls, but it can still have a big effect on the majority of toxic language that is pushing people out of conversations.

When we talked about toxicity, Thomax from the DPT community quoted Heinz von Förster: “The recipient determines the meaning of the message”. While there is probably consensus on the majority of classifications of toxic language, what approaches or solutions do you see for specific terms that are perceived as toxic “only” by a minority or, to hyperbolize, even just one person?

Broadly speaking, there are three categories of tools that can be built for moderation: tools for authors, tools for moderators, and tools for readers/viewers.

Tools for readers, like Tune, provide some functionality to assess the content they read. However, the social challenge posed by toxic language that affects smaller groups, or is understood only by a smaller group, remains open and difficult.

Advances in few-shot learning might make automated detection possible with much less data. Still: how do you turn that into practical products and tools? Many solutions end up being a frustrating tradeoff between upsetting some users and upsetting others.
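
As a rough illustration of what few-shot detection could look like, the sketch below combines a pretrained sentence encoder with a nearest-centroid rule over a handful of labelled examples. The choice of the sentence-transformers library, the model name, and the example phrases are assumptions made for illustration, not part of the interview.

```python
# Sketch: few-shot detection with a pretrained sentence encoder and a
# nearest-centroid rule. Assumes the sentence-transformers package; any
# text-embedding model would do.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# A handful of labelled examples is all the "training data" we have.
examples = {
    "ok":    ["Thanks, that was helpful.", "I see your point, interesting."],
    "toxic": ["Nobody wants to hear your garbage.", "You people are worthless."],
}

# One mean embedding (centroid) per label.
centroids = {
    label: np.mean(encoder.encode(texts), axis=0)
    for label, texts in examples.items()
}

def classify(text):
    """Assign the label whose centroid is closest in cosine similarity."""
    v = encoder.encode([text])[0]
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(centroids, key=lambda label: cos(v, centroids[label]))

print(classify("What a worthless comment."))
```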

I suspect most progress can be made by focusing on tools that help people have good conversations and really understand each other, rather than tools that try to filter content.

With the Digital Peace Talks, we are aiming to build a graph that shows how both sides perceived one-on-one communications. Could such data be used to train a neural network to predict how the author of text A will perceive a text B written by another person? If so, could you briefly elaborate on the technical approaches you would try?

Labelling specific communications in terms of how they are perceived is exactly the kind of data that is used to train today's toxicity classifiers. If such data were made available under a Creative Commons license, it would certainly help training.
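
To illustrate the shape of such a dataset and one simple baseline, here is a sketch that pairs something the reader has written (as a proxy for their perspective) with the message they read and the perception label they gave it. The toy rows, the separator format, and the TF-IDF plus logistic-regression baseline are assumptions for illustration; a stronger approach would more likely fine-tune a pretrained transformer on both texts.

```python
# Sketch: learning perception from pairwise labels. Each row is
# (text the reader wrote earlier, text they read, how they perceived it).
# The data is invented toy data purely to show the shape of the task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

pairs = [
    ("I enjoy calm debate",   "Your argument is nonsense", "hurtful"),
    ("I enjoy calm debate",   "I disagree, here is why",   "fine"),
    ("I like blunt feedback", "Your argument is nonsense", "fine"),
    ("I like blunt feedback", "I disagree, here is why",   "fine"),
]

# Encode each pair as one string; a separator keeps the two sides distinct.
X = [f"READER: {a} || MESSAGE: {b}" for a, b, _ in pairs]
y = [label for _, _, label in pairs]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(X, y)

# Predictions on such tiny toy data are only meant to show the interface.
print(model.predict(["READER: I enjoy calm debate || MESSAGE: That is nonsense"]))
```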

In terms of how to improve such processes: being able to tag texts in several ways, ideally with multiple labels per text, makes it possible to train classifiers and build better text-understanding tools for the more difficult nuances of disrespectful language.
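
As a sketch of what tagging "in several ways" can look like on the modelling side, the snippet below trains a multi-label classifier. The tag names and example texts are invented purely to show the mechanics; they are not from the interview.

```python
# Sketch: tagging the same text with several labels at once
# (multi-label classification) on invented toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

texts = [
    "Nobody cares what you think, just stop posting.",
    "That is a naive take, but let's unpack it.",
    "You are an embarrassment to this forum.",
    "Interesting point, though I read the data differently.",
]
tags = [["dismissive", "insulting"], ["dismissive"], ["insulting"], []]

# Turn the tag lists into a binary indicator matrix, one column per tag.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

clf = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(LogisticRegression()))
clf.fit(texts, Y)

# Predictions on toy data are illustrative only.
pred = clf.predict(["Stop posting, nobody cares."])
print(mlb.inverse_transform(pred))
```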

Thanks for this interview, Lucas!


Iwan Ittermann

Founder of DigitalPeaceTalks.com. Former founder of Warumverlag.de. Looking to enable free speech without hate.