In my previous article, I talked about a simple profanity detector that I open sourced on GitHub. Since launching that code example, a lot of people have got in touch with suggestions for new features they wanted before making use of the library. There were some really good suggestions, so I have implemented them all. In this post, I will walk through what was requested and what I have added to the library.

Profanity Detector by Stephen Haunts on GitHub.

Using the Library via NuGet

The first suggestion was to have NuGet support for the library as some people don’t want to clone repositories and deal with the source directly, so I have made the compiled Profanity Detector library available.

Profanity Detector library by Stephen Haunts available for .NET developers on NuGet.

You can include the library directly from your package manager in Visual Studio, Visual Studio for Mac, VS Code, or Rider. The documentation for using the library is available on the Profanity Detector GitHub page.

The Scunthorpe Problem

The next request was to fix the Scunthorpe Problem, which is a common problem with profanity detectors. This is where you get a false-positive result because a profanity pattern is found inside a non-profane word. For example, “Scunthorpe” (a town in the United Kingdom) gets reported as containing the word “c*nt”. This library lets you guard against the problem in two ways. The first is a whitelist of words that are excluded from detection. This is covered in the Whitelisting section below.

The second solution is to be a bit more intelligent about how we check the string. In the Scunthorpe example, the library first detects the word “c*nt” in the string. It then seeks backwards and forwards in the string to identify whether that profanity is enclosed within another word. If it is, the enclosing word is checked against the profanity list. If the enclosing word is not in the list, which Scunthorpe isn’t, the match is ignored. If the enclosing word is in the profanity list, it is reported as a profanity.
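To make the seek-backwards-and-forwards idea concrete, here is a minimal sketch of the check. It is not the library's actual code: the class name EnclosedWordSketch, the Detect method, and the two-word stand-in list are all assumptions made for illustration, and it treats any run of letters as a word.

```csharp
using System;
using System.Collections.Generic;

public static class EnclosedWordSketch
{
    // Hypothetical stand-in for the library's profanity list.
    private static readonly HashSet<string> Profanities =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "wibble", "bibble" };

    // Returns the profanities found in the input, ignoring any match that is
    // enclosed in a longer word which is not itself on the list.
    public static List<string> Detect(string input)
    {
        var found = new List<string>();

        foreach (var profanity in Profanities)
        {
            int index = input.IndexOf(profanity, StringComparison.OrdinalIgnoreCase);
            while (index >= 0)
            {
                // Seek backwards to the start of the enclosing word.
                int start = index;
                while (start > 0 && char.IsLetter(input[start - 1])) start--;

                // Seek forwards to the end of the enclosing word.
                int end = index + profanity.Length;
                while (end < input.Length && char.IsLetter(input[end])) end++;

                string word = input.Substring(start, end - start).ToLowerInvariant();

                // Report the match only if the whole enclosing word is on the list.
                if (Profanities.Contains(word) && !found.Contains(word))
                {
                    found.Add(word);
                }

                // Continue searching after the enclosing word.
                index = input.IndexOf(profanity, end, StringComparison.OrdinalIgnoreCase);
            }
        }

        return found;
    }
}
```

With this stand-in list, the made-up place name “Swibblethorpe” plays the role of Scunthorpe: the match on “wibble” is ignored because the enclosing word is not on the list, whereas “wibble” standing on its own is reported.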

Whitelisting

The next request was whitelisting. If there is a word in the profanity list that you don’t consider a profanity, and you want to allow it through, you can add that word to a whitelist. If that word then appears in the input string, it will be ignored. In the example below we have the sentence, “You are a complete twat and a total tit.” Here we want to say that the word “tit” is acceptable, so it gets added to the whitelist. This means the only reported profanity for that sentence is the word “twat”.

var filter = new ProfanityFilter();
filter.WhiteList.Add("tit");

var swearList = filter.DetectAllProfanities("You are a complete twat and a total tit.", true);

Assert.AreEqual(1, swearList.Count);
Assert.AreEqual("twat", swearList[0]);

Adding and Removing Profanities

The next requested feature was manually adding and removing profanities. There are a huge number of words in the default list. The default list was put together from multiple lists online, so I, the author of this library, didn’t write the list myself. If you feel that a word or words in the list are not what you consider to be a profanity, you can remove them via code, as in the following example. In the example, we first check that “twat” is a profanity, and this returns true. Then we remove “twat” from the list and check if it is a profanity again. This time it returns false as we have removed it.

var filter = new ProfanityFilter();

Assert.IsTrue(filter.IsProfanity("twat"));
filter.RemoveProfanity("twat");

Assert.IsFalse(filter.IsProfanity("twat"));

There may also be an occasion where you want to add a word that is not on the default list. This is easily done, as in the following example. Here we have deemed the word “fluffy” to be a profanity. We first check if it is a profanity, which returns false. Then we add “fluffy” to the list of profanities and check again, which returns true.

var filter = new ProfanityFilter();
Assert.IsFalse(filter.IsProfanity("fluffy"));

filter.AddProfanity("fluffy");
Assert.IsTrue(filter.IsProfanity("fluffy")); 

You can also add an array of words to the list if you want to add them in one go. This is demonstrated by the following example. Here we are adding three new words to the list as an array.

string[] _wordList =
{
   "wibble",
   "bibble",
   "bobble"
};

var filter = new ProfanityFilter();
filter.AddProfanity(_wordList);

You can also directly add a List instead of an array.

string[] _wordList =
{
  "wibble",
  "bibble",
  "bobble"
};

var filter = new ProfanityFilter();
filter.AddProfanity(new List<string>(_wordList));

Replacing the Default Profanity List

While developing this library, I had many people reach out to me to say that their companies maintain a signed-off and curated list of profanities that they have to check for, and therefore they can’t use the default list built into this Profanity Detector. This is a great suggestion, so I have tweaked the library to allow completely overriding the default list and adding your own.

In this first example, we pass an array of words into the ProfanityFilter constructor. This stops the default list from being loaded, so the profanity filter contains only these three words: wibble, bibble, and bobble.

string[] _wordList =
{
  "wibble",
  "bibble",
  "bobble"
};

IProfanityFilter filter = new ProfanityFilter(_wordList);
Assert.AreEqual(3, filter.Count);

You can also insert the new word list as a List.

string[] _wordList =
{
  "wibble",
  "bibble",
  "bobble"
};

IProfanityFilter filter = new ProfanityFilter(new List<string>(_wordList));
Assert.AreEqual(3, filter.Count);

Another way you can do this is to construct the ProfanityFilter with the default constructor that loads the default list, but then manually clear the list and insert your own array or List.

string[] _wordList =
{
  "wibble",
  "bibble",
  "bobble"
};

IProfanityFilter filter = new ProfanityFilter();
filter.Clear();
Assert.AreEqual(0, filter.Count);

filter.AddProfanity(_wordList);
Assert.AreEqual(3, filter.Count);
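If your curated list lives in a text file rather than in code, a small helper can turn it into the array that the constructor accepts. The following is only a sketch under some assumptions: the WordListLoader class, its Load method, and the one-word-per-line file format are hypothetical and not part of the library.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WordListLoader
{
    // Reads one word per line, trims surrounding whitespace, and drops
    // blank lines and case-insensitive duplicates.
    public static string[] Load(string path)
    {
        return File.ReadAllLines(path)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}

// Usage with the library (the file name is a made-up example):
// IProfanityFilter filter = new ProfanityFilter(WordListLoader.Load("profanities.txt"));
```

The result is a plain string array, so it can be passed to the ProfanityFilter constructor or to AddProfanity in the same way as the _wordList arrays in the examples above.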

Frequently Asked Questions

To finish off, here are answers to some common questions that I have been asked.

(Q) Why does word (x) appear in the list? I don’t consider it a profanity.

(A) The default list is compiled from lists I found on the internet that are allegedly used by some social media companies. On my first inspection of the list, I removed some words that I thought were not profane (in my opinion). It is possible I have missed some as the list is HUGE. It could also be that what is profane to one person is not profane to another.

If you spot something that you want to challenge, raise an issue and I will take a look. In the meantime, if there is a word that you don’t agree with being on the list, you can manually whitelist it, as demonstrated above, or insert your own list.

(Q) Why have a profanity filter in the first place? Freedom of speech should not include censorship.

(A) I also believe in freedom of speech and don’t necessarily like censorship, except for content aimed at children or hate speech, but in a lot of organizations there are requirements to check for profanities in a user’s input. If you are working in this type of environment, and a lot of companies do this, then you have to implement it, which is why this library exists.

(Q) My company has its own signed-off list of profanities that need to be censored on our system, so I can’t use the default list. Can I use my own?

(A) Of course, many people asked for this, so you can insert your own array/list of profanities by passing them into the ProfanityFilter constructor. See the example earlier in this post.

(Q) What is the license for using the code in this library?

(A) The code in the Profanity Detector is released under the permissive MIT license. This means you can do what you like with the code. I am not charging for the code, and you are free to clone and modify it as you wish. This also means I am not liable for any of this code and it is provided as-is for you to use. Whilst I am not liable for the use of this code, if you do find an issue, please raise a GitHub issue and I will take a look. Or you can fix it yourself and raise a pull request.

(Q) I am from Germany (or another country), do you support profanities in languages other than English?

(A) The current version of the profanity list only supports English profanities. If you already have a list in another language, you can load that list into the Profanity Detector. I would like to support profanities in multiple languages in the future, so if you know of any robust lists of these words in different languages, please let me know.
