Input:
dsfdsf www. cnn .com dksfj kdsfjkdjfdf www.google.com dkfjkdjfk w w w . ya hoo .co mdfdd
Output:
dsfdsf dksfj kdsfjkdjfdf dkfjkdjfk mdfdd
How do I write a function that does this in C#?
Basically you would have to implement two steps:
1. Normalization: remove all whitespace and other noise characters from the input, then transcode diacritics, special characters, etc. into the basic Latin alphabet. This maps identical- or similar-looking glyphs to a single character (for example, Greek omicron and Latin o look identical). You need to retain a mapping from each character of the normalized input back to its position in the original input.
2. Matching: search the normalized input for blocked patterns, use the mapping to locate the same span in the original input, and remove it.
Of course, this approach is not fail-safe, and you may get false positives.
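The normalize-then-match idea described above might look roughly like the following. This is only a minimal sketch under several assumptions: the class name `NormalizingUrlFilter`, the tiny homoglyph table in `Fold`, and the short TLD list in the `Blocked` pattern are all made up for illustration, and collapsing leftover whitespace at the end is a cosmetic choice, not part of the technique.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class NormalizingUrlFilter
{
    // Hypothetical blocked pattern: "www." + a host label + one or more
    // suffixes from a short, illustrative TLD list.
    private static readonly Regex Blocked = new Regex(
        @"www\.[a-z0-9-]+(\.(com|org|net|co))+",
        RegexOptions.CultureInvariant);

    // Illustrative folding table: lowercases and maps two look-alike
    // glyphs to Latin letters. A real table would be much larger.
    private static char Fold(char c)
    {
        switch (c)
        {
            case '\u03BF': return 'o'; // Greek small omicron
            case '\u0430': return 'a'; // Cyrillic small a
            default: return char.ToLowerInvariant(c);
        }
    }

    public static string RemoveUrls(string input)
    {
        // Step 1: normalize. Drop whitespace, fold each remaining char,
        // and record which original index it came from.
        var normalized = new StringBuilder();
        var map = new List<int>();
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsWhiteSpace(input[i])) continue;
            normalized.Append(Fold(input[i]));
            map.Add(i);
        }

        // Step 2: match on the normalized text, then mark the
        // corresponding span of the original text for removal.
        var remove = new bool[input.Length];
        foreach (Match m in Blocked.Matches(normalized.ToString()))
        {
            int first = map[m.Index];
            int last = map[m.Index + m.Length - 1];
            for (int i = first; i <= last; i++) remove[i] = true;
        }

        var result = new StringBuilder();
        for (int i = 0; i < input.Length; i++)
        {
            if (!remove[i]) result.Append(input[i]);
        }

        // Cosmetic: collapse the whitespace left around removed spans.
        return Regex.Replace(result.ToString(), @"\s+", " ").Trim();
    }
}
```

With this sketch, `RemoveUrls("foo www. cnn .com bar")` returns `"foo bar"`. It also shows the false-positive problem: on the input from the question, `.co mdfdd` reads as `.comdfdd` once whitespace is stripped, so the filter swallows the `m` and the last word comes out as `dfdd` rather than `mdfdd`.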
A good answer describing how the simple filtering is doomed can be found here:
Start by learning about the regular expression (regex) facilities in C#, which live in the System.Text.RegularExpressions namespace; then you'll need a good regex that matches a URL. You would need to adapt it to handle URLs that contain spaces, though.
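As a rough illustration of a regex adapted for spaces, the sketch below allows optional whitespace between the pieces of a `www.` address. The class name `SpacedUrlRegex`, the short TLD list, and the limit of three extra space-separated host fragments are assumptions chosen for the example, not a general-purpose URL pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacedUrlRegex
{
    // "w w w" with optional spaces, a dot, a host made of up to four
    // space-separated fragments, a dot, and a TLD from an illustrative
    // list. The trailing \b keeps ".co mdfdd" from being read as ".com".
    private static readonly Regex SpacedUrl = new Regex(
        @"\bw\s*w\s*w\s*\.\s*[a-z0-9-]+(?:\s+[a-z0-9-]+){0,3}?\s*\.\s*(?:com|org|net|co)\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string RemoveUrls(string input)
    {
        // Remove every match, then collapse the whitespace left behind.
        string stripped = SpacedUrl.Replace(input, string.Empty);
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Called on the input from the question, `SpacedUrlRegex.RemoveUrls(...)` returns `dsfdsf dksfj kdsfjkdjfdf dkfjkdjfk mdfdd`. The pattern is easy to evade (other TLDs, no `www.` prefix, look-alike characters), and the fragment allowance means ordinary words between a stray `www.` and a later `.com` can be removed as well.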