Twitter Pressures Users to Self-Censor 'Offensive' Replies to Tweets

The Twitter team behind new "prompts" shown to users whose replies its artificial intelligence moderator deems offensive said the prompts were so effective that a third of users either edited or deleted their posts.

Twitter has introduced a new feature to pressure users to self-censor their replies to posts on the social media network.

In a blog post titled "tweeting with consideration", the Twitter product design director, Anita Butler, and a product manager, Alberto Parrella, said the new warning messages shown to users before they tweet their replies would come into force beginning Wednesday, targeting English-language accounts first.

"People come to Twitter to talk about what's happening, and sometimes conversations about things we care about can get intense and people say things in the moment they might regret later," they wrote.

Twitter began testing "prompts" last year to "encourage" users to "pause and reconsider a potentially harmful or offensive reply — such as insults, strong language, or hateful remarks — before Tweeting it."

Butler and Parrella said the artificial intelligence algorithms at first flagged posts unnecessarily because they "often didn't differentiate between potentially offensive language, sarcasm, and friendly banter."

The effect on users was striking. The team found that 34 percent of users either edited their replies or didn't send them at all in response to the warning messages. And after their first warning, users wrote 11 percent fewer "offensive" replies.

The Twitter employees claimed that those who self-censored were less likely to get "offensive and harmful" replies in return.

The pair said Twitter was refining its system to monitor users' relationships, based on how much they interact with each other, to fine-tune the trigger threshold.

It has also added adjustments to try and recognise "situations in which language may be reclaimed by underrepresented communities and used in non-harmful ways", possibly meaning the use by ethnic and other minorities of pejorative terms for themselves.

Fellow Big Tech social media giant Facebook has a similar feature in place. It hauls up users when they click 'post' on a comment that its automated monitoring system flags as potentially against "community standards". The poster is given three options: "Edit Comment", "Ignore" or "Delete Comment".

Former US president Donald Trump, who was banned by both Twitter and Facebook earlier this year, threatened to remove the 'Section 230' protections enjoyed by social networking sites against being sued over libellous content posted by users, arguing their censorship rules made them more akin to edited publications than open-access platforms.