7

I am trying to use the following regular expression with the grep command on Linux:

(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))

When I try it out at https://www.regextester.com against the contents of a file, I get the required result, i.e., the required fields are matched. But when I use it as

grep '(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))' file1

all I get is empty output!

What's the problem here?

Mateusz Piotrowski
Kiran Vemuri

3 Answers

3

I don't think grep understands character classes like \w and \s. Try using either grep -E or egrep. (grep -E is equivalent to egrep, egrep is just shorter to type.)

So your command would be:

egrep '(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))' file1
Tim Pote
  • thats cool but how do i do a multiline search? assuming grep works line by line.. i want a multi line search.. so is there any solution? – Kiran Vemuri Jun 13 '12 at 12:01
  • @KiranVemuri That's a different question than the one you posed here. That topic is covered by [this SO question](http://stackoverflow.com/questions/152708/how-can-i-search-for-a-multiline-pattern-in-a-file-use-pcregrep) – Tim Pote Jun 13 '12 at 12:59
  • By default, egrep doesn't understand \s or \w either. However, you can use the --perl-regexp flag if PCRE has been compiled in. – Todd A. Jacobs Jun 13 '12 at 14:51
  • @CodeGnome RTM: http://www.gnu.org/software/grep/manual/html_node/The-Backslash-Character-and-Special-Expressions.html#The-Backslash-Character-and-Special-Expressions – Tim Pote Jun 13 '12 at 15:36
  • Although, to be fair, it does say that it should work for `grep` as well. I'm pretty sure in older versions that was an `egrep` extension. – Tim Pote Jun 13 '12 at 15:40
2
pcregrep -M  '(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|([^\*]+[.]com[.]au$))'

did the trick :)

Kiran Vemuri
0

grep(1) uses POSIX Basic Regular Expressions by default, and POSIX Extended Regular Expressions when used with the -E option.

In POSIX Regular Expressions, escaping a non-special character (e.g. \s) has undefined behaviour, and there is no syntax for non-greedy matching (e.g. +?). Furthermore, in BREs the + and | operators are not available, and parentheses must be escaped to perform grouping.
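
One way to see the BRE/ERE difference is to feed both modes the same pattern. This is only a minimal sketch; the input lines and variable names are made up for illustration and are not from the question.

```shell
# Two made-up input lines: one contains a literal '+', the other repeats 'a'.
input=$(printf '%s\n' 'a+b' 'aaab')

# BRE (plain grep): '+' is an ordinary character, so only the literal "a+b" matches.
bre_match=$(printf '%s\n' "$input" | grep 'a+b')

# ERE (grep -E): '+' means "one or more", so "aaab" matches and "a+b" does not.
ere_match=$(printf '%s\n' "$input" | grep -E 'a+b')

# Grouping: escaped parentheses and braces in a BRE, bare ones in an ERE.
bre_group=$(printf '%s\n' 'abab' 'ab' | grep '\(ab\)\{2\}')
ere_group=$(printf '%s\n' 'abab' 'ab' | grep -E '(ab){2}')

printf 'BRE: %s\nERE: %s\n' "$bre_match" "$ere_match"
```

The same pattern text selects different lines depending on the mode, which is why a regex copied from an online tester can silently match nothing under plain grep.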

The POSIX character classes [[:space:]] and [[:alnum:]_] are portable alternatives to \s and \w respectively.
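
As a small sketch of those portable classes in a plain BRE (the sample lines and the function name posix_class_demo are hypothetical):

```shell
# Match optional leading whitespace, a literal '*', optional whitespace,
# then one or more "word" characters up to the end of the line.
posix_class_demo() {
  printf '%s\n' '  * task_1' 'task 2' |
    grep '^[[:space:]]*\*[[:space:]]*[[:alnum:]_]\{1,\}$'
}

posix_class_demo   # prints "  * task_1"
```

Only the first line is printed; the second has no asterisk, so it fails at \*.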

Non-greedy matching can be emulated by excluding the next matching character from the repetition, e.g. [^*]+?\w*: is equivalent to [^*[:alnum:]_:]+[[:alnum:]_]*:.
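
The exclusion trick can be sketched with sed, which uses the same POSIX BRE syntax as grep and is used here only to make the matched portion visible. The sample line is made up for illustration.

```shell
line='key: value: 42'

# Greedy: '.*:' runs to the LAST colon on the line.
greedy=$(printf '%s\n' "$line" | sed 's/^\(.*:\).*/\1/')

# Emulated non-greedy: '[^:]*:' cannot cross a colon, so it stops at the FIRST one.
lazy_like=$(printf '%s\n' "$line" | sed 's/^\([^:]*:\).*/\1/')

printf 'greedy:    %s\nlazy-like: %s\n' "$greedy" "$lazy_like"
# greedy:    key: value:
# lazy-like: key:
```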

The given regular expression can be represented as multiple BREs:

grep -e '^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]\{1,\}[[:alnum:]_]*:[^*]\{1,\}[[:digit:]]$' \
    -e '[^*]\{1,\}\.com\.au$' file1

or an ERE:

grep -E '^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]+[[:alnum:]_]*:[^*]+[[:digit:]]$|[^*]+\.com\.au$' \
    file1
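
To check the ERE without the original file, here is a sketch run against made-up input. The asker's real file1 is not shown in the thread, so the sample lines and the names sample_lines and ere are assumptions for illustration.

```shell
# Hypothetical stand-in for file1.
sample_lines() {
  printf '%s\n' \
    '* [ ] port: 8080' \
    '* [x] done: 1' \
    'www.example.com.au' \
    'plain text'
}

# The ERE from above, kept in a variable for readability.
ere='^[[:space:]]*\*[[:space:]]*\[ \][^*[:alnum:]_:]+[[:alnum:]_]*:[^*]+[[:digit:]]$|[^*]+\.com\.au$'

sample_lines | grep -E "$ere"
# * [ ] port: 8080
# www.example.com.au
```

The "[x]" line is rejected because the pattern requires an empty checkbox "[ ]", and "plain text" matches neither alternative.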

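Note that the GNU implementation of grep(1) accepts the short character classes \s and \w as non-portable extensions. As far as I know, non-greedy repetition (+?) and \d are only available in its Perl-compatible mode (-P), and only if PCRE support has been compiled in.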

kdhp