I have a text file containing something like:

12,34 EUR 
 5,67 EUR
 ...

There is one whitespace character before 'EUR', and I want to ignore amounts of the form 0,XX EUR.

I tried:

grep '[1-9][0-9]*,[0-9]\{2\}\sEUR' => didn't match!

grep '[1-9][0-9]*,[0-9]\{2\} EUR' => worked!

grep '[1-9][0-9]*,[0-9]\{2\}\s*EUR' => worked!

grep '[1-9][0-9]*,[0-9]\{2\}\s[E]UR' => worked!

Can somebody please explain why \s on its own doesn't match, while \s* and \s[E] do?

OS: Ubuntu 10.04, grep v2.5

Hash
Milde

1 Answer

This looks like a difference in how \s is handled between grep 2.5 and newer versions (possibly a bug in old grep). I can confirm your result with grep 2.5.4, but all four of your commands work with grep 2.6.3 (Ubuntu 10.10).

Note:

GNU grep 2.5.4
echo "foo bar" | grep "\s"
   (doesn't match)

whereas

GNU grep 2.6.3
echo "foo bar" | grep "\s"
foo bar

A POSIX character class is probably less trouble, since \s is not documented:

Both GNU greps
echo "foo bar" | grep "[[:space:]]"
foo bar

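My advice is to avoid \s and use [[:space:]] (or [[:blank:]] for just spaces and tabs) instead. Note that \t inside a bracket expression, as in [ \t], is not treated as a tab by grep; it matches a literal backslash or the letter t.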
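
Applied to the pattern from the question, that advice might look like the sketch below. The sample data, including the extra 0,99 EUR line, is made up for illustration, and I have only reasoned about this for GNU grep in its default (basic regex) mode.

```shell
# Hypothetical sample input modelled on the question; the 0,99 line is
# added only to show that 0,XX amounts are skipped.
sample=$(printf '12,34 EUR\n 5,67 EUR\n 0,99 EUR')

# [[:space:]] is the documented POSIX whitespace class, used here in
# place of the undocumented \s.
matches=$(printf '%s\n' "$sample" | grep '[1-9][0-9]*,[0-9]\{2\}[[:space:]]EUR')

printf '%s\n' "$matches"
```

This should print the 12,34 EUR and 5,67 EUR lines and skip the 0,99 EUR one, because the amount has to start with a digit from 1 to 9.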

Chris Maes
Kamal
  • Or just use `[:space:]`, for example: `cat file | grep "[[:space:]]"` – Kiril Kirov Nov 20 '10 at 16:36
  • From another point of view, it seems to be a bug in the newer version of grep, according to this bug report: http://www.mail-archive.com/bug-grep@gnu.org/msg02686.html But why does the last statement match? – Milde Nov 20 '10 at 22:20
  • @Milde, note the followup post http://www.mail-archive.com/bug-grep@gnu.org/msg02689.html where that bug report was marked invalid and closed (so this is not considered to be a bug in newer grep). – Kamal Nov 21 '10 at 16:56
  • @Milde, none of the grep documentation I've examined (old or new) actually refers to `\s` at all. I'd say its behavior is "undefined". Use [:space:] instead, which works as documented in old and new grep. – Kamal Nov 21 '10 at 16:59
  • thanks, I will use [:space:] in the future to avoid the problem – Milde Nov 22 '10 at 13:34
  • @BaiyanHuang \t doesn't seem to work on whatever version I'm using right now. I'm using it with color highlighting, and `echo 'a\sb ct' | grep '[ \t]'` highlights the \ and the t. It is whatever version comes bundled with the Windows git install that's on this computer. – Loduwijk Oct 10 '19 at 21:38
  • With grep 2.20 I have to use the extended flag `-E`: `grep -E "\s"` and `grep -E "[[:space:]]"` – Michele Piccolini Feb 05 '20 at 09:25
  • Note that on linux it is recommended to use single-quotes around your regex to avoid the shell handling backslash characters (taking them to escape special shell characters) instead of passing them through to grep/other command as part of the regex/parameter. e.g. `echo '\\'` will output both backslashes, but `echo "\\"` will output a single backslash (because the shell substitutes it). This does get a bit hairy if you need to include a single-quote character in your regex though. (You may need a combination of single-quoted bits and double-quoted bits etc.) – Mr Weasel Apr 23 '21 at 03:08
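
The quoting point in the last comment can be checked with a small sketch like this one. The variable names are mine, and it uses printf '%s' rather than echo so that echo's own backslash handling (which varies between shells) does not get in the way.

```shell
# Single quotes: the shell passes both backslashes through untouched.
single=$(printf '%s' '\\')

# Double quotes: the shell collapses \\ into one backslash before the
# command ever sees it.
double=$(printf '%s' "\\")

printf 'single-quoted length: %s\n' "${#single}"
printf 'double-quoted length: %s\n' "${#double}"
```

The single-quoted string should come out two characters long and the double-quoted one a single character, which is why single quotes are the safer default for grep patterns.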