
If the input string is

Cat fish bannedword bread bánnedword mouse bãnnedword

It should output

Cat fish bread mouse

What would be the best way to do this without hurting performance?

user58322
    http://blog.codinghorror.com/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea/ - Definitely comes to mind. This thread may help though http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net. – Kelly Robins Jun 14 '14 at 23:56
    There is no solution that has no cost. Instead, **set a performance goal based on actual user requirements** and then find a solution within that goal. – Eric Lippert Jun 15 '14 at 00:39



There are a number of ways to do this, but none of them (at least as far as I know) comes without some performance cost.

The most obvious way is to remove the accented characters first and then use a simple string.Replace(). For removing the accents, this or this Stack Overflow question should help you.
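A minimal sketch of this first approach, assuming .NET 5 or later (it uses top-level statements) and a single banned word. `RemoveDiacritics` and `FilterByReplace` are hypothetical helper names. The accent stripping uses the common "normalize to FormD, then drop the NonSpacingMark characters" technique. Be aware that it strips accents from the words you keep as well, and that a plain Replace() also cuts the banned word out of longer words (for example "bannedwords" would become "s").

```csharp
using System;
using System.Globalization;
using System.Text;

// Strips diacritics: FormD splits "á" into "a" + U+0301, and the
// combining mark (category NonSpacingMark) is then dropped.
static string RemoveDiacritics(string text)
{
    string decomposed = text.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder(decomposed.Length);
    foreach (char c in decomposed)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

// Approach 1: remove accents from the whole input, then a plain (ordinal) Replace().
// Note: accents are removed from the words that are kept too.
static string FilterByReplace(string input, string bannedWord)
{
    string plain = RemoveDiacritics(input);
    string removed = plain.Replace(bannedWord, "");
    // Collapse the extra spaces left where banned words used to be
    return string.Join(" ", removed.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
}

Console.WriteLine(FilterByReplace("Cat fish bannedword bread bánnedword mouse bãnnedword", "bannedword"));
// prints "Cat fish bread mouse"
```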

Another approach is to split the string into an array of strings (each string being a separate word) and then drop every word that equals 'bannedword', comparing with string.Compare() and an option that makes it ignore accents (CompareOptions.IgnoreNonSpace).

Something like:

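using System.Globalization;
using System.Text;

// Split on spaces and keep every word that does not match bannedWord
// when accents are ignored (CompareOptions.IgnoreNonSpace).
string[] splittedInput = input.Split(' ');
StringBuilder output = new StringBuilder();
foreach (string word in splittedInput)
{
  // string.Compare returns 0 when the two strings are considered equal
  if (string.Compare(word, bannedWord, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace) != 0)
  {
    // Put back the space that Split() removed between kept words
    if (output.Length > 0)
      output.Append(' ');
    output.Append(word);
  }
}

string s_output = output.ToString();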

// I haven't tested this in Visual Studio, so there might be mistakes. A LINQ query could also simplify it (and potentially enable pluralization).
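A LINQ version of the split-and-compare approach might look like the following. It is a sketch assuming .NET 5 or later (top-level statements); `FilterWords` is a hypothetical name. It uses CultureInfo.InvariantCulture rather than CurrentCulture so the result does not depend on the machine's culture settings; switch back to CurrentCulture if you want culture-specific comparison rules. Unlike the Replace() approach, the words that are kept retain their accents.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Keep only the words that differ from bannedWord when accents are ignored.
static string FilterWords(string input, string bannedWord)
{
    var kept = input
        .Split(' ')
        .Where(word => string.Compare(word, bannedWord,
            CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace) != 0);
    return string.Join(" ", kept);
}

Console.WriteLine(FilterWords("Cat fish bannedword bread bánnedword mouse bãnnedword", "bannedword"));
// prints "Cat fish bread mouse"
```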

Finally, it should be possible to come up with a clever regex solution (possibly the fastest option), but since I'm not a regex expert I can't help you much with that. This might point you in the right direction if you know at least a little about regexes.
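One possible regex route, offered as a sketch and not as a benchmarked fastest solution: decompose the input to Unicode FormD so each accent becomes a separate combining mark, build a pattern that allows any number of marks (`\p{Mn}*`) after each letter of the banned word, remove the matches together with the preceding whitespace, and recompose to FormC so the accents on the remaining words survive. `BuildBannedRegex` and `FilterWithRegex` are hypothetical names. The sketch assumes .NET 5 or later, that the banned word itself is written without accents, and that case must match exactly.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Pattern for "bannedword" becomes: \s*\bb\p{Mn}*a\p{Mn}*n\p{Mn}*...d\p{Mn}*\b
// The leading \s* also removes the space in front of each banned word.
static Regex BuildBannedRegex(string bannedWord)
{
    string letters = string.Concat(
        bannedWord.Select(c => Regex.Escape(c.ToString()) + @"\p{Mn}*"));
    return new Regex(@"\s*\b" + letters + @"\b", RegexOptions.Compiled);
}

static string FilterWithRegex(string input, Regex banned)
{
    // Decompose so "á" becomes "a" + U+0301, which \p{Mn}* can absorb
    string decomposed = input.Normalize(NormalizationForm.FormD);
    string removed = banned.Replace(decomposed, "");
    // Recompose so the accents on the remaining words are preserved
    return removed.Normalize(NormalizationForm.FormC).Trim();
}

var bannedRegex = BuildBannedRegex("bannedword");
Console.WriteLine(FilterWithRegex("Cat fish bannedword bread bánnedword mouse bãnnedword", bannedRegex));
// prints "Cat fish bread mouse"
```

Building the Regex once and reusing it avoids re-parsing the pattern on every call. Whether this beats the split-based version is something to measure against your own performance goal.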

Petrroll
  • Thank you for your comment. I did realize that this would not work without a certain performance cost. I'm no good with Regex, so sadly I don't know how to do it. Thank you again. – user58322 Jun 15 '14 at 00:20
  • Then you can use the first or second solution :). I honestly think the first one is the best (there are code snippets if you follow the links, so it shouldn't be hard to understand). – Petrroll Jun 15 '14 at 00:23
  • Yes, but accented characters shouldn't be removed from non-filtered words. – user58322 Jun 15 '14 at 01:37
  • Then the 2nd solution :). I'll update the answer to show how I meant it. – Petrroll Jun 15 '14 at 01:41