Scope of grep with regular expressions

Question

I am trying to use a regular expression with the grep command on Linux:

(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))

When I try it at https://www.regextester.com with the contents of a file, I get the required result, i.e., the required fields are matched. But when I use it as

grep '(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))' file1

all I get is empty output!

What's the problem here?

score 3 · Answer 1 · answered Jun 13 '12 at 11:41

I don't think grep understands character classes like \w and \s. Try using either grep -E or egrep. (grep -E is equivalent to egrep, egrep is just shorter to type.)

So your command would be:

egrep '(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))' file1

Tim Pote

That's cool, but how do I do a multiline search? Since grep works line by line, is there any solution for matching across lines? – Kiran Vemuri Jun 13 '12 at 12:01

@KiranVemuri That's a different question than the one you posed here. That topic is covered by [this SO question](http://stackoverflow.com/questions/152708/how-can-i-search-for-a-multiline-pattern-in-a-file-use-pcregrep). – Tim Pote Jun 13 '12 at 12:59
By default, egrep doesn't understand \s or \w either. However, you can use the --perl-regexp flag if PCRE has been compiled in. – Todd A. Jacobs Jun 13 '12 at 14:51
@CodeGnome RTM: http://www.gnu.org/software/grep/manual/html_node/The-Backslash-Character-and-Special-Expressions.html#The-Backslash-Character-and-Special-Expressions – Tim Pote Jun 13 '12 at 15:36
Although, to be fair, it does say that it should work for `grep` as well. I'm pretty sure in older versions that was an `egrep` extension. – Tim Pote Jun 13 '12 at 15:40

score 2 · Accepted Answer · answered Jun 13 '12 at 18:37

pcregrep -M  '(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))'

did the trick :)

Kiran Vemuri

kdhp · Answer 3 · 2017-09-26T22:23:38.210

grep(1) uses POSIX Basic Regular Expressions by default, and POSIX Extended Regular Expressions when used with the -E option.

In POSIX regular expressions, escaping a non-special character has undefined behaviour (e.g. \s), and there is no syntax for non-greedy matching (e.g. +?). Furthermore, in BREs the + and | operators are not available, and parentheses must be escaped to perform grouping.
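To make the BRE/ERE difference concrete, here is a small sketch. The three sample lines are made up for illustration, and only POSIX options (-c, -E) are used:

```shell
# Hypothetical sample input: three short lines.
sample=$(printf '%s\n' 'aaa' 'a+' 'b')

# BRE: '+' is an ordinary character, so this only matches the literal text "a+".
bre_plain=$(printf '%s\n' "$sample" | grep 'a+')

# BRE: the interval \{1,\} is the portable way to say "one or more".
bre_interval=$(printf '%s\n' "$sample" | grep -c 'a\{1,\}')

# ERE: '+' is a repetition operator, '|' is alternation, and ( ) group unescaped.
ere_alt=$(printf '%s\n' "$sample" | grep -E -c '^(a+|b)$')

printf 'BRE a+ matched line     : %s\n' "$bre_plain"
printf 'BRE a\\{1,\\} line count   : %s\n' "$bre_interval"
printf 'ERE ^(a+|b)$ line count : %s\n' "$ere_alt"
```

With this input, the plain BRE finds only the line containing a literal plus sign, while the interval and the ERE each select two lines.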

The POSIX character classes [[:space:]] and [[:alnum:]_] are portable alternatives to \s and \w respectively.
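A quick way to check these classes, as a sketch on made-up input (a tab-indented line, a word containing an underscore, and a line of punctuation):

```shell
# Hypothetical sample lines; $tab holds a literal tab character.
tab=$(printf '\t')
lines=$(printf '%s\n' "${tab}indented" 'foo_bar' '---')

# [[:space:]] matches spaces, tabs and other whitespace (portable stand-in for \s).
space_hits=$(printf '%s\n' "$lines" | grep -c '^[[:space:]]')

# [[:alnum:]_] matches letters, digits and underscore (portable stand-in for \w).
word_hits=$(printf '%s\n' "$lines" | grep -c '^[[:alnum:]_]\{1,\}$')

printf 'lines starting with whitespace : %s\n' "$space_hits"
printf 'lines made only of word chars  : %s\n' "$word_hits"
```

Each count should come out as 1 here: only the tab-indented line starts with whitespace, and only foo_bar consists entirely of word characters.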

Excluding the next matching character from a repetition can be used to emulate non-greedy matching, e.g. [^*]+?\w*: is equivalent to [^*[:alnum:]_:]+[[:alnum:]_]*:.
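The effect of excluding the terminator is easiest to see on a simpler pattern. This sketch uses a made-up input line and the -o option, which is not in POSIX but is available in GNU and BSD grep, to print only the matched text:

```shell
# Hypothetical input with two colons.
line='key: a: b'

# Greedy: .* runs to the last colon on the line.
greedy=$(printf '%s\n' "$line" | grep -o '^.*:')

# Emulated non-greedy: [^:]* cannot cross a colon, so the match stops at the first one.
lazy=$(printf '%s\n' "$line" | grep -o '^[^:]*:')

printf 'greedy match : %s\n' "$greedy"
printf 'lazy match   : %s\n' "$lazy"
```

Note that this only matters when the matched text itself is used (as with -o). For plain line selection, grep reports the same lines whether a repetition is greedy or not.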

The given regular expression can be represented as multiple BREs:

grep -e '^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]\{1,\}[[:alnum:]_]*:[^*]\{1,\}[[:digit:]]$' \
    -e '[^*]\{1,\}\.com\.au$' file1

or an ERE:

grep -E '^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]+[[:alnum:]_]*:[^*]+[[:digit:]]$|[^*]+\.com\.au$' \
    file1
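As a sanity check, the ERE can be run against a few sample lines. The input below is invented for illustration (the real contents of file1 are not shown in the question), and it is piped in on stdin rather than read from a file:

```shell
# Hypothetical stand-in for file1: two checklist items, a domain line, and plain text.
input=$(printf '%s\n' \
    ' * [ ] Task: finish report 42' \
    ' * [ ] Task without number' \
    'contact www.example.com.au' \
    'plain text')

# The ERE from the answer, unchanged.
pattern='^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]+[[:alnum:]_]*:[^*]+[[:digit:]]$|[^*]+\.com\.au$'

# Lines selected by the pattern, and how many there are.
matches=$(printf '%s\n' "$input" | grep -E "$pattern")
count=$(printf '%s\n' "$input" | grep -E -c "$pattern")

printf '%s\n' "$matches"
printf 'matched lines: %s\n' "$count"
```

With this input, the first alternative should pick up the checklist item that has a colon and ends in a digit, and the second alternative the line ending in .com.au; the other two lines are skipped.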

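Note that the GNU implementation of grep(1) accepts the short character classes \s and \w as non-portable extensions. It has no non-greedy repetition in BRE or ERE mode, though: there, +? is read as a + followed by a ?, and true non-greedy matching needs the Perl-compatible -P option, where it is available.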
