136

How can I have a regular expression that tests for spaces or tabs, but not newlines?

I tried \s, but I found out that it tests for newlines too.

I use C# (.NET) and WPF, but it shouldn't matter.

Peter Mortensen
Jiew Meng
  • It may matter. .NET regular expression functions have the *multi-line option*. None of the answers addresses that (even if the default value of it may suffice). – Peter Mortensen Nov 13 '21 at 16:45

5 Answers

255

Use a character class: [ \t]

Lekensteyn
53

Try this character set:

[ \t]

This matches only a space or a tab.
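As a quick check of the class above, here is a minimal C# sketch written as a top-level program. The sample string and variable names are made up for illustration; it collapses runs of spaces and tabs while leaving the newline alone.

```csharp
using System;
using System.Text.RegularExpressions;

// One or more spaces or tabs; newlines are not part of the class.
var horizontal = new Regex(@"[ \t]+");

// Illustrative input: "a", space, tab, space, "b", newline, "c".
string sample = "a \t b\nc";

// Collapse each run of spaces/tabs into a single space.
string collapsed = horizontal.Replace(sample, " ");

Console.WriteLine(collapsed.Contains("\n"));   // True: the newline is kept
Console.WriteLine(horizontal.IsMatch("\r\n")); // False: no space or tab here
```

The verbatim string (@"...") passes \t through to the regex engine, which interprets it as a tab.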

Gumbo
21

As Eiríkr Útlendi noted, the accepted solution only considers two white space characters: the horizontal tab (U+0009), and a breaking space (U+0020). It does not consider other white space characters such as non-breaking spaces (which happen to be in the text I am trying to deal with).

A more complete white space character listing is included on Wikipedia and also referenced in the linked Perl answer. A simple C# solution that accounts for these other characters can be built using character class subtraction:

[\s-[\r\n]]

Or, including Eiríkr Útlendi's solution, you get

[\s\u3000-[\r\n]]
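A minimal sketch of the subtraction class in use, as a top-level C# program. The test inputs are illustrative; it compares the class against the plain [ \t] class on a non-breaking space (U+00A0).

```csharp
using System;
using System.Text.RegularExpressions;

// \s minus carriage return and line feed (.NET character class subtraction).
var nonNewlineWhitespace = new Regex(@"[\s-[\r\n]]");

// U+00A0 is a non-breaking space: covered by \s in .NET, but not by [ \t].
Console.WriteLine(nonNewlineWhitespace.IsMatch("\u00A0")); // True
Console.WriteLine(nonNewlineWhitespace.IsMatch("\t"));     // True
Console.WriteLine(nonNewlineWhitespace.IsMatch("\r\n"));   // False
Console.WriteLine(Regex.IsMatch("\u00A0", @"[ \t]"));      // False
```

Note that this class still matches form feed, vertical tab, U+0085, U+2028, and U+2029, since \s covers them in .NET. If those count as line breaks in your data, add them to the subtracted set.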
Peter Mortensen
erdomke
7

Note: For those dealing with CJK text (Chinese, Japanese, and Korean), the double-byte space (Unicode \u3000) is not included in \s for any implementation I've tried so far (Perl, .NET, PCRE, and Python). You'll need to either normalize your strings first (such as by replacing all \u3000 with \u0020), or you'll have to use a character set that includes this code point in addition to whatever other white space you're targeting, such as [ \t\u3000].

If you're using Perl or PCRE, you have the option of using the \h shorthand for horizontal whitespace, which appears to include the single-byte space, double-byte space, and tab, among others. See the Match whitespace but not newlines (Perl) question for more detail.

However, this \h shorthand has not been implemented for .NET and C#, as best I've been able to tell.
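The two workarounds described above (normalizing first, or listing the code point in the class) might look like this in C#. This is only a sketch: NormalizeSpaces is a hypothetical helper name and the sample text is made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: turn ideographic spaces (U+3000) into ASCII spaces.
static string NormalizeSpaces(string text) => text.Replace('\u3000', ' ');

// Illustrative input with one U+3000 and one newline.
string sample = "東京\u3000タワー\n次の行";

// Option 1: normalize first, then the simple class is enough.
string normalized = NormalizeSpaces(sample);
Console.WriteLine(Regex.IsMatch(normalized, @"[ \t]"));

// Option 2: leave the text alone and list U+3000 in the class explicitly.
var wide = new Regex(@"[ \t\u3000]");
Console.WriteLine(wide.Matches(sample).Count);
```

Naming the code point explicitly keeps the behavior the same regardless of how a given engine defines \s.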

Peter Mortensen
Eiríkr Útlendi
  • Good point. Java's `\h` (introduced in Java 8) does include `\u3000`, but `\s` does not, unless you set UNICODE_CHARACTER_CLASS mode (introduced in Java 7). – Alan Moore Apr 19 '16 at 21:46
0

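If you want to remove spaces, the code below worked for me in C#. Regex.Replace returns a new string, so assign the result. (The pattern "\\s" would also remove tabs and newlines.)

Line = Regex.Replace(Line, " ", "");

For tabs:

Line = Regex.Replace(Line, "\\t", "");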
Peter Mortensen
Sameer Bahad