If you're set on using pluralization, you will have to use the PluralizationService (see this answer for more details).
And since you're using string.Format, I assume you're looping over your blacklist array.
So why not do it all in one neat method?
// Needs a reference to System.Data.Entity.Design for PluralizationService
using System.Data.Entity.Design.PluralizationServices;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static string GetBlacklistRegexString(string[] blacklist)
{
    // It seems that this service only supports English natively, to check later
    var ps = PluralizationService.CreateService(CultureInfo.GetCultureInfo("en"));
    // Using a StringBuilder for ease of use and performance,
    // even though it's not easy on the eye :p
    StringBuilder sb = new StringBuilder().Append(@"\b(");
    // We build one single regex listing every word and its plural,
    // so we're looping here
    foreach (var word in blacklist)
    {
        // Using a dot wasn't careful indeed... Feel free to replace
        // "\W" with anything that does it for you. It matches any single
        // non-word character (anything but letters, digits and '_').
        // Regex.Escape keeps characters like '.' or '+' in a word literal;
        // it also turns each space into "\ ", which is what we replace.
        var regexPlural = Regex.Escape(ps.Pluralize(word)).Replace(@"\ ", @"\W");
        var regexWord = Regex.Escape(word).Replace(@"\ ", @"\W");
        sb.Append(regexWord).Append('|').Append(regexPlural).Append('|');
    }
    sb.Remove(sb.Length - 1, 1); // removing the last '|'
    sb.Append(@")\b");
    return sb.ToString();
}
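To get a feel for what the "\W" substitution does and doesn't match, here is a minimal sketch. The SeparatorDemo class and its ToPattern helper are made up for illustration and only mirror the space-to-"\W" replacement for a single phrase, without the pluralization part:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // Hypothetical helper: wraps one phrase in \b(...)\b and turns each
    // space into \W, the same substitution the method above applies.
    public static string ToPattern(string phrase)
    {
        return @"\b(" + phrase.Replace(" ", @"\W") + @")\b";
    }

    public static void Main()
    {
        var regex = new Regex(ToPattern("join us"), RegexOptions.IgnoreCase);
        string[] candidates = { "join us", "join-us", "join.us", "joinus", "join  us" };
        foreach (var candidate in candidates)
        {
            // Prints each candidate and whether the pattern matches it
            Console.WriteLine("{0,-10} -> {1}", candidate, regex.IsMatch(candidate));
        }
    }
}
```

Since "\W" stands for exactly one non-word character, "join us", "join-us" and "join.us" match, while "joinus" and "join  us" (two spaces) do not. If you want to tolerate several separators in a row, "\W+" would be the thing to try.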
The usage is nothing surprising if you're already using regular expressions in .NET:
static void Main(string[] args)
{
    string[] blacklist = {"Goodbye","Welcome","join us"};
    string input = "Welcome, come join us at dummywebsite.com for fun and games, goodbye!";
    // I assume that you want it case insensitive
    Regex blacklistRegex = new Regex(GetBlacklistRegexString(blacklist), RegexOptions.IgnoreCase);
    foreach (Match match in blacklistRegex.Matches(input))
    {
        Console.WriteLine(match);
    }
    Console.ReadLine();
}
The console shows the expected output:
Welcome
join us
goodbye
Edit: there is still a problem (I'll work on it later): if "man" is in your keywords, it will match the "men" in "women"... Weirdly, I don't get this behaviour on regexhero.
Edit 2: duh, of course: if I don't group the words with parentheses, the word boundaries only apply to the first and last alternatives... Corrected.
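To make the grouping issue concrete, here is a small sketch (the GroupingDemo class and its Matches helper are hypothetical names, and the two patterns are hand-written rather than generated by the method above). It runs the ungrouped and the grouped pattern against "women":

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupingDemo
{
    // Hypothetical helper: case-insensitive "does this pattern match anywhere?"
    public static bool Matches(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        const string input = "women";

        // Ungrouped: read as (\bman) | (men\b), so the "men" that ends
        // "women" satisfies the second alternative.
        Console.WriteLine(Matches(@"\bman|men\b", input));   // True

        // Grouped: both word boundaries apply to every alternative, and
        // there is no boundary between "wo" and "men".
        Console.WriteLine(Matches(@"\b(man|men)\b", input)); // False
    }
}
```

The alternation operator has the lowest precedence in a regex, which is why the parentheses are needed for the "\b" on each side to cover the whole word list.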