how to grep specific element name

Question

I have tried to grep words that start with 'a' followed by at least 1 c, but no more than 2 c's.

So what I did was:

grep '^ac{1,2}' place/file/input.txt > place/file/output.txt

Isn't that supposed to match words like accuse, acost, accurate, acacia? But when I run an assertion on the output, it fails with "False is not True".

Please let me know what I am neglecting here.

James Brown · Answer 1 · 2017-10-03T04:31:40.553

2

First some test material:

$ cat file
a       # miss
ac      # miss without this comment
acc     # miss without this comment
accc    # miss
accd    # hit

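In grep's default basic regular expression (BRE) syntax, {} are only special when escaped with a backslash (the same goes for (), and in GNU grep for ?, + and |):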

$ grep 'ac\{1,2\}[^c]' file
accd

or

$ grep 'acc\?[^c]' file
accd

(... or use the extended patterns as explained in the other answer).
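To see the effect of the escaping in isolation, here is a minimal sketch. It assumes a throwaway file (created with mktemp) holding the same test words as above but without the trailing comments; the variable names are only for illustration.

```shell
# Hypothetical sample file: the test words from above, one per line, no comments.
tmp=$(mktemp)
printf '%s\n' a ac acc accc accd > "$tmp"

# BRE (grep's default): backslash-escaped braces form the interval expression.
bre_hits=$(grep 'ac\{1,2\}[^c]' "$tmp")

# Unescaped braces in a BRE are literal characters, so grep looks for the
# literal text "ac{1,2}" and finds nothing (|| true absorbs grep's exit status 1).
literal_hits=$(grep 'ac{1,2}[^c]' "$tmp" || true)

echo "escaped:   $bre_hits"
echo "unescaped: $literal_hits"
rm -f "$tmp"
```

With these sample words, only accd is printed for the escaped form, and the unescaped form prints nothing.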

edited Oct 03 '17 at 04:31

answered Oct 03 '17 at 03:15

James Brown

score 0 · Accepted Answer · edited Oct 03 '17 at 11:41

0

By default, grep interprets patterns as basic regular expressions (BRE), which don't accept the "extended" regular expression syntax (thanks to tripleee for pointing out how wrong my first writing of that was), so the unescaped {1,2} is treated as literal characters rather than as a repetition count. You can enable the extended patterns with egrep or -E:

grep -E '^ac{1,2}' place/file/input.txt > place/file/output.txt

-E
Match using extended regular expressions. Treat each pattern specified as an ERE, as described in the Base Definitions volume of IEEE Std 1003.1-2001, Section 9.4, Extended Regular Expressions. If any entire ERE pattern matches some part of an input line excluding the terminating <newline>, the line shall be matched. A null ERE shall match every line.

from the POSIX docs
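As a quick check of the difference, the sketch below counts matches for the same pattern with and without -E. The word list is taken from the question plus one non-matching word (apple) that I added as an assumption; the temporary file and variable names are illustrative only.

```shell
# Hypothetical input: the words from the question plus one that should not match.
tmp=$(mktemp)
printf '%s\n' accuse acost accurate acacia apple > "$tmp"

# Default (BRE): the braces are literal, so no line matches.
# grep -c still prints 0, and || true absorbs the non-zero exit status.
bre_count=$(grep -c '^ac{1,2}' "$tmp" || true)

# With -E (ERE): {1,2} is an interval, so every word starting with "ac" matches.
ere_count=$(grep -E -c '^ac{1,2}' "$tmp")

echo "BRE matches: $bre_count"
echo "ERE matches: $ere_count"
rm -f "$tmp"
```

This is probably why the output file in the question came out empty: without -E, nothing in the input contains the literal text "ac{1,2}".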

Note, though, that three c's also count as one or two c's followed by something else, so you'd want to make sure the next character is not a c:

grep -E '^ac{1,2}[^c]' place/file/input.txt > place/file/output.txt

Additionally, as James Brown points out, you can escape many of the characters to make grep process the regex as desired without the -E.
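One caveat worth sketching: [^c] has to consume a character, so a line consisting of just "ac" or "acc" is not matched by the pattern above. If such lines should count, one option (my assumption about the intent, not something stated in the question) is to also accept end-of-line. The sample words and variable names below are hypothetical.

```shell
# Hypothetical input: bare "ac"/"acc" lines alongside longer words.
tmp=$(mktemp)
printf '%s\n' ac acc accc accuse acost > "$tmp"

# [^c] requires one more character, so "ac" and "acc" on their own are missed.
strict=$(grep -E '^ac{1,2}[^c]' "$tmp" | paste -s -d ' ' -)

# Accepting either a non-c character or end-of-line also matches "ac" and "acc",
# while "accc" is still rejected.
relaxed=$(grep -E '^ac{1,2}([^c]|$)' "$tmp" | paste -s -d ' ' -)

echo "strict:  $strict"
echo "relaxed: $relaxed"
rm -f "$tmp"
```

Here strict yields "accuse acost" and relaxed yields "ac acc accuse acost"; which of the two is right depends on whether the input can contain such short lines.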

edited Oct 03 '17 at 11:41

tripleee

answered Oct 03 '17 at 01:22

Eric Renouf

This is incorrect. `grep` gets its very name from its ability to print regular expression matches. The difference is one between traditional regular expressions vs. extended regular expressions, which later have been formally defined and somewhat extended by POSIX into BRE and ERE. There are still more regex dialects, most notably PCRE which is popular in many modern programming languages (and supported by `grep -P` on some platforms). – tripleee Oct 03 '17 at 06:24
Maybe see also https://stackoverflow.com/questions/11856054/bash-easy-way-to-pass-a-raw-string-to-grep/11857890#11857890 for broader historical perspective. – tripleee Oct 03 '17 at 06:26
@tripleee obviously you're right, thanks for pointing out my sloppiness (which may only be marginally less sloppy now). – Eric Renouf Oct 03 '17 at 11:30
I tweaked the wording some more. *All* of these dialects are standardized by POSIX so it's not like ERE is somehow "more standard" than BRE, they are just different. – tripleee Oct 03 '17 at 11:42
@tripleee fair enough, my use of "standard" was that if you take a course that teaches regex you'll learn the ERE syntax, and that's likely a source of confusion for people first encountering the way the regex are implemented in tools like grep – Eric Renouf Oct 03 '17 at 11:44
Most beginners seem to expect PCRE these days, ERE was probably the "most standard" at one point but these days, the dialect in PHP, Python, Ruby, etc and of course Perl is what most people learn. – tripleee Oct 03 '17 at 11:47
