Disclaimer: I haven't looked at the word list; it didn't load on my cellphone. I imagine, from the comments, that it's something like a CSV list of 4 million "words": the sort of "words" you might see if you had a million monkeys bashing on a million typewriters, probably because that's not far from how they were actually generated.
In the comments you seemed to indicate that the words in the input string are separated by spaces, otherwise they don't show up as emojis. As such, I'd load the 4 million exclusions into a HashSet, split the string into words, remove the words that are in the HashSet, then recombine the result:
using System.Collections.Generic;

private static readonly HashSet<string> excludes = new HashSet<string>(); // --> 4 million entries, load it once

string message = "Hello, how are you this fine day?\r\nMy name is SO."; // User input
var bits = message.Split(' ');
for (int i = 0; i < bits.Length; i++)
{
    if (excludes.Contains(bits[i]))
    {
        bits[i] = null; // string.Join treats null as an empty string
    }
}
var result = string.Join(" ", bits); // note: a removed word leaves its surrounding spaces behind
This just splits on space, so it knows it can recompose the string using space. If your input will have other separators (I can see you have a \r\n in there, which would defeat this, because "day?\r\nMy" comes out as a single token) then you probably want to look at splitting but keeping the delimiters, so that you can get your split tokens, replace some, then do a string.Concat instead of a Join. If you want to wring every last millisecond out of it, then you probably need to look at shifting the solution to Spans, but this simple start might give you enough to investigate whether it's an avenue worth pursuing.
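To make the "split but keep the delimiters" idea concrete, here's a minimal sketch. It leans on the fact that Regex.Split includes the text matched by a capturing group in its output. The two entries in excludes, the RemoveExcluded helper name and the assumption that a "word" is any run of non-whitespace characters are all mine, not something from your question:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the real 4-million-entry list.
var excludes = new HashSet<string> { "LULwut", "Kappa" };

// Assumption: a "word" is any run of non-whitespace characters.
// The capturing group makes Regex.Split return the whitespace runs
// as tokens too, so the original spacing survives the round trip.
string RemoveExcluded(string input)
{
    string[] tokens = Regex.Split(input, @"(\s+)");
    for (int i = 0; i < tokens.Length; i++)
    {
        if (excludes.Contains(tokens[i]))
        {
            tokens[i] = string.Empty; // drop the word, keep its delimiters
        }
    }
    return string.Concat(tokens);
}

string message = "Hello LULwut how are you?\r\nKappa is my name";
Console.WriteLine(RemoveExcluded(message));
```

Punctuation that touches a word stays part of the token here ("LULwut," would not match "LULwut"), so if that matters you'd need to widen the delimiter pattern.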
All in all, I think it's important to tokenize your input and then check entries in their entirety, rather than just performing naive repeated replacements. If your word list contains "hel", "lo", "orl" and "wd", then repeated replacement reduces "hello world" to just the separating space, even though it contains none of those as whole words. This is because, e.g., replacing "orl" in "world" with nothing creates "wd", which never existed in the input. It's also important to note that if the replacements were performed in a different order you'd get a different result ("hello" would still disappear, but "world" would become "wd" if the "orl" replacement was done last).
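If you want to see the problem for yourself, this small sketch runs the naive approach over that same four-entry list in two different orders. NaiveRemove is a hypothetical helper I made up for the demo, not something from your code:

```csharp
using System;

// Naive approach: remove each list entry wherever it occurs as a substring.
string NaiveRemove(string input, string[] words)
{
    foreach (string w in words)
    {
        input = input.Replace(w, "");
    }
    return input;
}

string[] wordList = { "hel", "lo", "orl", "wd" };
string[] reordered = { "hel", "lo", "wd", "orl" }; // same entries, "orl" applied last

// Brackets make the leftover whitespace visible.
Console.WriteLine($"[{NaiveRemove("hello world", wordList)}]");  // prints "[ ]"
Console.WriteLine($"[{NaiveRemove("hello world", reordered)}]"); // prints "[ wd]"
```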
It's worth noting that a HashSet<string> is case sensitive by default. I don't think you said whether your LULwuts are case sensitive or not. If they're not, make the HashSet case insensitive.
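A case-insensitive set is just a matter of passing a comparer to the constructor. A quick sketch, assuming ordinal (culture-independent) comparison is what you want; the single entry is only a placeholder:

```csharp
using System;
using System.Collections.Generic;

// Default comparer: exact, case-sensitive matches only.
var caseSensitive = new HashSet<string> { "LULwut" };

// StringComparer.OrdinalIgnoreCase makes lookups ignore case.
var caseInsensitive = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "LULwut" };

Console.WriteLine(caseSensitive.Contains("lulwut"));   // False
Console.WriteLine(caseInsensitive.Contains("lulwut")); // True
```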