grep regexp to match space and/or TAB and '[:space:]' class

Question

On CentOS 8, this grep expression does not return any matching lines:

% dmidecode -t memory | grep -E '^[ \t]+Size: [0-9]+'

However this one does return matched lines correctly (on the same distro):

% dmidecode -t memory | grep -E '^[[:space:]]+Size: [0-9]+'

What is the reason for this behaviour? As you can see, grep is invoked in extended regexp mode both times.

In `grep '^[ \t]+Size: [0-9]+'`, the pattern is parsed as POSIX BRE. Pass the `-E` flag to make it a POSIX ERE if you want `+` to be parsed as a quantifier. — Wiktor Stribiżew, Dec 17 '20 at 19:26
When posting the question, I left out '-E' in the first command. It should be there, but grep still does not return the matched lines. — Mark, Dec 17 '20 at 19:41
Then you need `grep -E '^[[:blank:]]+Size: [0-9]+'` or `grep -E '^[[:blank:]]+Size:[[:blank:]]+[0-9]+'`. Note that `[ \t]` as a regex pattern is actually equivalent to the `[:blank:]` POSIX character class, not `[:space:]` (the latter includes vertical whitespace, too). — Wiktor Stribiżew, Dec 17 '20 at 20:10
In general, your problem is most likely related to how CentOS treats single-quoted strings when passing them to `grep` rather than to the regex pattern, which works otherwise. — Wiktor Stribiżew, Dec 17 '20 at 20:16
@WiktorStribiżew No, it's the regular expression. The first one just doesn't work the way OP thinks it does. — Shawn, Dec 17 '20 at 23:33
Does this answer your question? [Which white space in grep is the best standard?](https://stackoverflow.com/questions/39633176/which-white-space-in-grep-is-the-best-standard) — Ryszard Czech, Dec 18 '20 at 00:25
Does this answer your question? [grep a tab in UNIX](https://stackoverflow.com/questions/1825552/grep-a-tab-in-unix) — Tsyvarev, Dec 20 '20 at 10:30
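
The distinction drawn in the comments between `[:blank:]` and `[:space:]` can be checked directly. The sketch below is only an illustration: it uses a form feed (`\f`) as a sample vertical-whitespace character, and the variable names `space_hits` and `blank_hits` are made up for this example.

```shell
# A form feed (\f) is vertical whitespace: it belongs to [:space:]
# but not to [:blank:], which covers only space and tab.
# "|| true" keeps the script going when grep finds nothing (exit status 1).
space_hits=$(printf 'a\fb\n' | grep -c 'a[[:space:]]b' || true)
blank_hits=$(printf 'a\fb\n' | grep -c 'a[[:blank:]]b' || true)

echo "[:space:] matched lines: $space_hits"
echo "[:blank:] matched lines: $blank_hits"
```

For the indentation at the start of a line of text, either class works, since that indentation consists of spaces and tabs.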

Shawn · Accepted Answer · 2020-12-17T23:42:01.510

The issue here is the \t character sequence. In a grep regular expression it does not match a tab character; it matches the character t, whether the dialect is basic or extended. Inside a bracket expression such as [ \t], the backslash is itself an ordinary character under POSIX rules, so the set matches a space, a backslash, or t. grep does not treat \t as a special escape sequence the way some other tools do (including GNU grep when using the PCRE dialect).

Witness:

# printf /does/ treat \t and \n special in a format
$ printf "a\tb\n" | grep "a[ \t]b" # No match
$ printf  "atb\n" | grep "a[ \t]b" # Match
atb
$ printf "a\tb\n" | grep "a[[:space:]]b" # Match
a     b
$ printf "a\tb\n" | grep "a[[:blank:]]b" # Match
a     b
$ printf "a\tb\n" | grep "a\sb" # Match, \s is a GNU grep extension
a     b
$ printf "a\tb\n" | grep -P "a\sb" # Match, GNU grep using PCRE
a     b
$ printf "a\tb\n" | grep -P "a[ \t]b" # Match, GNU grep using PCRE.
a     b
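
If neither a character class nor `-P` is wanted, one option is to put a real tab character into the pattern. The following is a minimal sketch, assuming a POSIX shell; the variable names `tab`, `tab_hits` and `t_hits` are chosen here for illustration only.

```shell
# Let printf emit a real tab character and keep it in a variable.
# Command substitution strips only trailing newlines, so the tab survives.
tab=$(printf '\t')

# The bracket expression now holds a literal space and a literal tab.
# "|| true" keeps the script going when grep finds nothing (exit status 1).
tab_hits=$(printf 'a\tb\n' | grep -c "a[ ${tab}]b" || true)
t_hits=$(printf 'atb\n' | grep -c "a[ ${tab}]b" || true)

echo "tab-separated line matched: $tab_hits"
echo "literal 't' line matched:   $t_hits"
```

Applied to the question, the same idea would read `grep -E "^[ ${tab}]+Size: [0-9]+"`. In bash, the ANSI-C quoting form `$'\t'` produces the same tab character without the helper variable.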

score -1 · Answer 2 · answered Dec 17 '20 at 19:58

Use [[:blank:]] which matches space char and tab char. You can omit -E too:

grep '^[[:blank:]]+ Size: [0-9]+'

Bohemian

Your solution [does not work](https://ideone.com/lCiWEy). OP's solution works with `-E` option. In POSIX BRE, `+` matches a plus symbol. – Wiktor Stribiżew Dec 17 '20 at 20:08
Needs to be `^[[:blank:]]\{1,\} Size: [0-9]\{1,\}` for the same effect in a BRE. – Shawn Dec 17 '20 at 23:07
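
The difference the two comments describe can be reproduced with a single sample line. This is a rough sketch: the input line is invented to resemble the indented Size line from the question, the variable names are hypothetical, and the patterns leave out the extra space before Size that the answer's pattern carries.

```shell
# Invented sample line: one leading tab, then a Size field.
line=$(printf '\tSize: 8192 MB')

# BRE (no -E): '+' is an ordinary character, so nothing matches.
bre_plus=$(printf '%s\n' "$line" | grep -c '^[[:blank:]]+Size: [0-9]+' || true)

# BRE with the interval \{1,\}: means "one or more", so the line matches.
bre_interval=$(printf '%s\n' "$line" | grep -c '^[[:blank:]]\{1,\}Size: [0-9]\{1,\}' || true)

# ERE (-E): '+' is a quantifier, so the line matches.
ere_plus=$(printf '%s\n' "$line" | grep -Ec '^[[:blank:]]+Size: [0-9]+' || true)

echo "BRE with +:       $bre_plus"
echo "BRE with \\{1,\\}: $bre_interval"
echo "ERE with +:       $ere_plus"
```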
