
I have a file called test.txt with the following contents:

1 2 3

I have the following script that uses a regular expression to match at least one whitespace character between the numbers:

#!/bin/sh
if ! grep -q -e "1[ \t]+2[ \t]+3" test.txt; then
    echo "not found"
else
    echo "found"
fi

Executing the script prints "not found", but it should print "found". Why is that?

pacoverflow

2 Answers


Per the grep man page:

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

Try:

#!/bin/sh
if ! grep -q -e "1[ \t]\+2[ \t]\+3" test.txt; then
    echo "not found"
else
    echo "found"
fi
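
To see the basic vs. extended difference in isolation, here is a minimal sketch that feeds the same space-separated line to grep three ways. The `match` helper and the variable names are my own, purely for illustration, and the patterns use `[ ]` rather than `[ \t]` so that only the `+` handling is being tested. The basic-syntax variant uses the POSIX interval `\{1,\}`, since as far as I know `\+` is a GNU extension rather than part of POSIX.

```shell
#!/bin/sh
# Same content as test.txt in the question (numbers separated by spaces).
line='1 2 3'

# Hypothetical helper: passes its arguments to grep and reports the result.
match() {
    if printf '%s\n' "$line" | grep -q "$@"; then
        echo "found"
    else
        echo "not found"
    fi
}

# Basic syntax (the default): "+" is an ordinary character here.
bre_plain=$(match -e '1[ ]+2[ ]+3')

# Basic syntax with an interval expression meaning "one or more".
bre_interval=$(match -e '1[ ]\{1,\}2[ ]\{1,\}3')

# Extended syntax: "+" means "one or more".
ere_plus=$(match -E -e '1[ ]+2[ ]+3')

echo "basic, plain +:     $bre_plain"
echo "basic, \\{1,\\}:      $bre_interval"
echo "extended, plain +:  $ere_plus"
```

Only the first variant should report "not found", because there the pattern demands a literal `+` character after each space.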
Pezzer
  • Thanks. An alternative would be to just change the `-e` to `-E`. – pacoverflow Aug 09 '17 at 02:35
  • Hmm, it's still not quite working - the script prints out `not found` if I change the spaces in `test.txt` to tabs. – pacoverflow Aug 09 '17 at 02:51
  • My man says `-P` is "highly experimental", so I'll change the ` \t` to `[:space:]` instead. Thank you! – pacoverflow Aug 09 '17 at 03:05
  • @pacoverflow Sorry, I should have caught that! `\t` won't work with `grep` because grep uses a POSIX regex definition (which doesn't define `\t` as a tab character). You can easily work around this by just pasting a literal tab character into your pattern. Alternatively, depending on your environment, you can try using the `-P` flag to tell `grep` to use the Perl regex definition. I think there are other solutions as well. – Pezzer Aug 09 '17 at 03:07

Well, I tried to edit the other answer, which is incorrect as it currently stands. But the edit was rejected, so I'll have to post my own answer, given that comments are "second class citizens on the Stack Exchange network, not designed to hold information for all eternity [and] may get cleaned up at any time."

As mentioned in the other answer, grep interprets patterns as basic regular expressions by default, in which + has no special meaning. The -e option only marks the argument that follows as the pattern; it does not change the syntax. Therefore the -E option should be used to select extended regular expressions, which support the + metacharacter.

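In addition, grep only supports POSIX regular expressions, which do not recognize \t as a tab character. Inside a bracket expression the backslash is literal, so [ \t] matches a space, a backslash, or the letter t. The easiest way to fix this, while still maintaining readability and without using any experimental grep options (such as -P), is to replace [ \t] with [[:space:]]. (If only spaces and tabs should match, [[:blank:]] is the narrower class.)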

Therefore the fixed script looks like:

#!/bin/sh
if ! grep -q -E "1[[:space:]]+2[[:space:]]+3" test.txt; then
    echo "not found"
else
    echo "found"
fi
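
As a quick check of the tab case, the sketch below builds a tab-separated line and tries four patterns against it. The `match` helper and the variable names are mine, for illustration only. The literal-tab variant assumes the shell's `printf` understands `\t` in its format string, which POSIX requires.

```shell
#!/bin/sh
# Build a real tab character and a tab-separated version of the input.
tab=$(printf '\t')
line="1${tab}2${tab}3"

# Hypothetical helper: passes its arguments to grep and reports the result.
match() {
    if printf '%s\n' "$line" | grep -q "$@"; then
        echo "found"
    else
        echo "not found"
    fi
}

# "\t" inside brackets is a backslash and a "t", not a tab.
backslash_t=$(match -E -e '1[ \t]+2[ \t]+3')

# POSIX character classes that include the tab character.
space_class=$(match -E -e '1[[:space:]]+2[[:space:]]+3')
blank_class=$(match -E -e '1[[:blank:]]+2[[:blank:]]+3')

# A literal tab placed in the bracket expression via the shell variable.
literal_tab=$(match -E -e "1[ ${tab}]+2[ ${tab}]+3")

echo "[ \\t]:       $backslash_t"
echo "[[:space:]]: $space_class"
echo "[[:blank:]]: $blank_class"
echo "literal tab: $literal_tab"
```

With a POSIX-conforming grep, I would expect only the `[ \t]` variant to report "not found" on this input.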
pacoverflow
  • @CharlesDuffy It has been unaccepted. BTW, I looked at the 2 questions you cited while closing the question as a duplicate, and I do not think they apply. The [first question](https://stackoverflow.com/questions/7805676/difference-between-grep-and-perl-regex) is about perl regex, which I am not using. The [second question](https://stackoverflow.com/questions/4233159/grep-regex-whitespace-behavior) is about `\s` which I am not using and the answer even mentions using `\t` which is incorrect for my situation. – pacoverflow Aug 09 '17 at 21:32
  • Gotcha. I read `\t` as something taken from PCRE engines, but you make a strong enough case -- reopened. – Charles Duffy Aug 10 '17 at 00:38