You can find words with a regular expression, without having to specify every possible separator:
string input = "This, is? a test-word!\r\nanother line.";
var matches = Regex.Matches(input, @"\w+");
foreach (Match m in matches) {
    Console.WriteLine($"\"{m.Value}\" at {m.Index}, length {m.Length}");
}
prints:
"This" at 0, length 4
"is" at 6, length 2
"a" at 10, length 1
"test" at 12, length 4
"word" at 17, length 4
"another" at 24, length 7
"line" at 32, length 4
The expression \w+ specifies a sequence of one or more word characters. This includes letters, digits and the underscore. See Word Character: \w for a detailed description of \w.
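To see what counts as a word character in practice, here is a small sketch. The class name WordCharDemo, the helper FindWords and the sample string are made up for illustration and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Hypothetical helper: returns every \w+ match in the order found.
    public static List<string> FindWords(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\w+"))
        {
            words.Add(m.Value);
        }
        return words;
    }

    public static void Main()
    {
        // Underscores and digits stay inside a word; the hyphen splits "x-ray".
        Console.WriteLine(string.Join(" | ", FindWords("snake_case, item42 and x-ray")));
        // prints: snake_case | item42 | and | x | ray
    }
}
```

Note that identifiers such as snake_case come back as a single match, which may or may not be what you want when counting words.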
If you prefer to list the separators explicitly, you can replace all (possibly multiple) separators with single spaces like this:
char[] separators = new char[] { ' ', '.', ',', '!', '?', ':', '-', '\r', '\n' };
var words = input.Split(separators, StringSplitOptions.RemoveEmptyEntries);
string result = String.Join(" ", words);
Console.WriteLine(result);
prints:
This is a test word another line
The StringSplitOptions.RemoveEmptyEntries parameter ensures that sequences of multiple separators are treated like one separator.
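If you would rather not maintain the separator list, the same normalization can probably be done with the regex approach as well. The following is a minimal sketch, assuming that every non-word character (\W) should count as a separator. The class SeparatorDemo and the helper NormalizeSeparators are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // Hypothetical helper: collapses each run of non-word characters
    // into one space, then trims the space left by leading or trailing separators.
    public static string NormalizeSeparators(string input)
    {
        return Regex.Replace(input, @"\W+", " ").Trim();
    }

    public static void Main()
    {
        string input = "This, is? a test-word!\r\nanother line.";
        Console.WriteLine(NormalizeSeparators(input));
        // prints: This is a test word another line
    }
}
```

Unlike the Split version, this also treats characters that are missing from the separators array (for example ';' or '/') as separators. The Trim() call is needed because the final '.' would otherwise become a trailing space.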