GNU grep 2.5.4 on bash 4.1.5(1) on Ubuntu 10.04

This matches:

$ echo "this is a     line" | grep 'a[[:space:]]\+line'
this is a     line

But this doesn't:

$ echo "this is a     line" | grep 'a\s\+line'

But this matches too:

$ echo "this is a     line" | grep 'a\s\+\bline'
this is a     line

I don't understand why #2 does not match (whereas #1 does) and why #3 also shows a match. What's the difference here?

Owen Blacker
Ankur Agarwal

2 Answers

By default, grep doesn't support the complete set of regular expression extensions, so try using -P to enable Perl regular expressions. With -P you don't need to escape the +, i.e.:

echo "this is a     line" | grep -P 'a\s+line' 
dogbane
  • I am more interested in knowing why #2 does not match but #3 does. An extra zero-width assertion (word boundary) makes such a difference? – Ankur Agarwal Aug 11 '11 at 04:42

Take a look at your grep manpage. Perl added a lot of regular expression extensions that weren't in the original specification. However, because they proved so useful, many programs adopted them.

Unfortunately, grep is sometimes stuck in the past, because its default behavior has to remain compatible with older versions of grep.

Some systems have egrep with some extensions. Others allow you to use grep -E to get them. Still others have a grep -P that allows you to use Perl extensions. I believe Linux systems' grep command can use the -P extension which is not available in most Unix systems unless someone has replaced the grep with the GNU version. Newer versions of Mac OS X also support the -P switch, but not older versions.
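
As a rough sketch of the difference between these modes, the same pattern from the question can be written three ways. This uses the POSIX class [[:space:]] rather than \s, since \s is itself an extension; the variable names are only for illustration, and the BRE form with \+ assumes a grep that accepts that GNU extension.

```shell
# The test line from the question (five spaces between "a" and "line").
input='this is a     line'

# BRE (plain grep): + must be escaped; \+ is a GNU extension, not POSIX.
bre=$(printf '%s\n' "$input" | grep 'a[[:space:]]\+line')

# ERE (grep -E): + is an operator and needs no backslash.
ere=$(printf '%s\n' "$input" | grep -E 'a[[:space:]]+line')

# Strictly POSIX BRE: an interval expression replaces \+.
posix=$(printf '%s\n' "$input" | grep 'a[[:space:]]\{1,\}line')

# Each variable holds the matched line, or is empty if nothing matched.
printf '%s\n' "$bre" "$ere" "$posix"
```

The interval form is the one most likely to work on a non-GNU grep, since it relies only on what POSIX specifies for basic regular expressions.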

David W.
  • I am more interested in knowing why #2 does not match but #3 does. An extra zero-width assertion (word boundary) makes such a difference? – Ankur Agarwal Aug 11 '11 at 04:42
  • Actually, both #2 and #3 work for me. I tried this on Cygwin and on the most recent release of Ubuntu Linux running in VirtualBox on my PC. I haven't tried it on my Mac at home. Neither works on our Solaris system, since it doesn't use the GNU version of `grep` and doesn't recognize anything besides very basic regex. Is there anything in that wide blank space besides spaces? A tab character, perhaps? – David W. Aug 11 '11 at 13:38
  • I tried again. But there is no tab character there. This is kind of puzzling and surprising. – Ankur Agarwal Aug 11 '11 at 17:22
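
One way to check the tab question raised in the comments above is to dump the bytes of the line. This is only a sketch: the `line` and `tabs` variables are illustrative, and the string is retyped here, so a real check should pipe the actual input instead.

```shell
# Retyped copy of the test line; substitute the real input when checking.
line='this is a     line'

# od -c prints each byte; a tab would show up as \t instead of a blank.
printf '%s\n' "$line" | od -c

# Keep only tab characters and count them; 0 means spaces only.
tabs=$(printf '%s' "$line" | tr -cd '\t' | wc -c | tr -d ' ')
echo "tab count: $tabs"
```

If the count is 0 and od shows only spaces, hidden characters are ruled out as the cause.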