Since writing this post, I have had many great feature suggestions for the Profanity Detector. I have implemented all the suggestions I have received and written another blog post about it. You can read the 2nd post here.
On several projects that I have worked on, we have had a requirement to detect profanity in users’ input. This includes general swear words, sexual acts, racial slurs, sexist slurs and so on. Over the years, I have built a pretty comprehensive list of these profanities to use in the detection process, by combining lists I found on the internet. These lists are allegedly used by many of the large social networks for their profanity detection, although I can’t verify that.
My profanity detector is on GitHub and released under the MIT license, so it is free for anyone to use and modify. The main list of profanities can be found in the ProfanityList.cs file. If you are easily offended and a bit sensitive to language, then I recommend you DO NOT open that file. It contains some pretty gross language, but to detect that language, you need to be able to define it.
The library targets .NET Standard 2.1, so it can be used in .NET Core 3.0 and later, including .NET 5 and newer (.NET Framework does not support .NET Standard 2.1). You can lower the .NET Standard version number if required for an older project; it will still work perfectly well.
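For reference, the target is a single setting in an SDK-style project file. The fragment below is a generic example, not the repository’s actual .csproj; switching `netstandard2.1` to `netstandard2.0` is the kind of downgrade described above, assuming the code doesn’t rely on any 2.1-only APIs.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- netstandard2.1 runs on .NET Core 3.0+ and .NET 5+, but not on .NET Framework -->
    <!-- netstandard2.0 can also be consumed from .NET Framework 4.6.1 and later -->
    <TargetFramework>netstandard2.1</TargetFramework>
  </PropertyGroup>
</Project>
```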
There are three methods in the ProfanityFilter class.
IsProfanity() will return true if the string passed in is considered profanity, and false otherwise.
CensorString() will return a version of the input string with any profanities censored out with the ‘*’ character.
DetectAllProfanities() will return a list of all profanities detected in a string.
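At its core, a list-based check like IsProfanity() comes down to looking a word up in a set. The sketch below is not the library’s code: `WordListFilter` is a hypothetical class, and it assumes a case-insensitive, exact match on a single word is the behaviour you want.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal filter illustrating word-list lookup; not the ProfanityFilter implementation.
public class WordListFilter
{
    // Case-insensitive set, so "Arsehole" and "arsehole" hit the same entry.
    private readonly HashSet<string> _words;

    public WordListFilter(IEnumerable<string> words)
    {
        _words = new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }

    // True if the trimmed word exactly (case-insensitively) matches a list entry;
    // null, empty, or whitespace-only input is never treated as profanity.
    public bool IsProfanity(string word)
    {
        if (string.IsNullOrWhiteSpace(word))
        {
            return false;
        }

        return _words.Contains(word.Trim());
    }
}
```

A `HashSet` gives constant-time lookups on average, which matters once the list runs to thousands of entries.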
Here are some example usages.
```csharp
var filter = new ProfanityFilter();

// Returns true for a bad word
Assert.IsTrue(filter.IsProfanity("arsehole"));

// Returns false for a word that is NOT naughty
Assert.IsFalse(filter.IsProfanity("fluffy"));
```
In this example, we pass CensorString a version of “Mary had a little lamb” with a profanity inserted in two places. It returns the input string with both profanities replaced by ‘*’ characters.
```csharp
var filter = new ProfanityFilter();
var censored = filter.CensorString("Mary had a little [email protected] lamb who was a little [email protected]");
var expected = "Mary had a little **** lamb who was a little ******.";

// Assert.AreEqual takes the expected value first and the actual value second
Assert.AreEqual(expected, censored);
```
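One way to produce asterisk masking like the expected output above is a single regular expression built from the word list. This is a hedged sketch, not how CensorString is implemented: `Censor.Apply` is hypothetical, matching is assumed to be whole-word and case-insensitive, and writing one ‘*’ per character of the matched word is an assumption about the desired output.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical censoring helper; illustrates the technique, not the library's code.
public static class Censor
{
    // Replaces every whole-word, case-insensitive match from the list with one '*' per character.
    public static string Apply(string input, IEnumerable<string> words)
    {
        // Escape entries so regex metacharacters are matched literally, and put longer
        // entries first so a longer term is preferred over a shorter one it starts with.
        var alternatives = words
            .OrderByDescending(w => w.Length)
            .Select(w => Regex.Escape(w));
        var pattern = @"\b(?:" + string.Join("|", alternatives) + @")\b";

        // The MatchEvaluator keeps each censored word the same length as the original.
        return Regex.Replace(
            input,
            pattern,
            match => new string('*', match.Length),
            RegexOptions.IgnoreCase);
    }
}
```

With the placeholder list `{ "darn", "heckin" }`, the sentence “Mary had a little darn lamb who was a little heckin.” comes back as “Mary had a little **** lamb who was a little ******.” The `\b` anchors stop innocent words that merely contain a listed term (such as “darnation”) from being censored, at the cost of missing deliberately joined spellings.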
In this final example, we pass the string “2 girls 1 cup is my favourite twatting video” into DetectAllProfanities. This returns a ReadOnlyCollection of the profanities it found; in this case, “twatting” and “2 girls 1 cup” (if you don’t know what this is, DON’T look it up).
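Multi-word entries like “2 girls 1 cup” mean detection can’t simply split the input on spaces. A hedged sketch, assuming whole-word, case-insensitive matching: the hypothetical `PhraseDetector.FindAll` tests each list entry, phrases included, against the text and returns the hits as a `ReadOnlyCollection<string>`. It is an illustration, not the library’s DetectAllProfanities.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Text.RegularExpressions;

// Hypothetical detector illustrating phrase-aware matching; not the library's implementation.
public static class PhraseDetector
{
    // Returns every list entry (single word or multi-word phrase) found in the text.
    public static ReadOnlyCollection<string> FindAll(string text, IEnumerable<string> entries)
    {
        var found = new List<string>();
        foreach (var entry in entries)
        {
            // \b on both sides stops "cup" from matching inside "cupboard";
            // Regex.Escape makes the entry's characters (including spaces) literal.
            var pattern = @"\b" + Regex.Escape(entry) + @"\b";
            if (Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase))
            {
                found.Add(entry);
            }
        }

        return found.AsReadOnly();
    }
}
```

Looping over every entry is simple but scales with the size of the list; a trie or an Aho–Corasick automaton is the usual faster alternative when the list is large.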
In a future upgrade, I want to classify each profanity into groups such as swear words, sexual acts, racial slurs and sexist remarks. Implementation-wise this is very easy to do; the difficulty is that I recognise fewer than a quarter of the terms in the list. To classify them, I would need to look them all up, and I am not sure I want to do that. For example, an Alabama Hot Pocket sounds like some kind of cake. It’s NOT! Maybe some debauched developers can help with this classification. We’ll see.
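To show why the tagging itself is the easy part, here is one hypothetical shape for it: a `ProfanityCategory` enum and a dictionary from term to category. Neither exists in the library; the category names come from the groups listed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical categories based on the groups named in the post.
public enum ProfanityCategory
{
    Unclassified,
    SwearWord,
    SexualAct,
    RacialSlur,
    SexistRemark
}

// Hypothetical lookup; terms that haven't been classified yet fall back to Unclassified.
public class CategorisedList
{
    private readonly Dictionary<string, ProfanityCategory> _categories =
        new Dictionary<string, ProfanityCategory>(StringComparer.OrdinalIgnoreCase);

    // Adds a term, or overwrites its category if it is already present.
    public void Add(string term, ProfanityCategory category) => _categories[term] = category;

    public ProfanityCategory CategoryOf(string term) =>
        _categories.TryGetValue(term, out var category) ? category : ProfanityCategory.Unclassified;
}
```

The `Unclassified` default lets the list be tagged gradually, so terms nobody wants to look up can stay untagged without breaking detection.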
If you need to detect bad language in user input, then hopefully this library will be of use to you. To be honest, there are probably more efficient ways to implement it, but this approach has worked perfectly well for me in the past. I hope you find it useful.