According to the BRE/ERE Bracket Expression section of the POSIX regex specification:
- [...] The right-bracket ( ']' ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial circumflex ( '^' ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as "[.].]" ) or is the ending right-bracket for a collating symbol, equivalence class, or character class. The special characters '.', '*', '[', and '\' (period, asterisk, left-bracket, and backslash, respectively) shall lose their special meaning within a bracket expression.
and
- [...] If a bracket expression specifies both '-' and ']', the ']' shall be placed first (after the '^', if any) and the '-' last within the bracket expression.
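As a minimal sketch of the two quoted rules together, the command below uses a bracket expression that contains both ']' and '-'. The sample input string is made up for illustration, LC_ALL=C is set only to keep the a-z range predictable, and -o is the same (non-POSIX but widely available) option used in the command further down.

```shell
# ']' is placed first and '-' last, so both are taken literally.
result=$(printf '%s\n' 'a-b]c[d' | LC_ALL=C grep -Eo '[]a-z-]+')
printf '%s\n' "$result"
```

The '[' in the input is not part of the bracket expression, so it splits the input into two matches.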
Therefore, your regex should be:
echo "fdsl[]" | grep -Eo "[][ a-z]+"
Note the -E flag, which selects ERE, which supports the + quantifier. The + quantifier is not supported in BRE (the default mode).
The solution in Mike Holt's answer, "[][a-z ]\+" with an escaped +, works because it is run on GNU grep, which extends the grammar so that \+ means "repeat once or more". It's actually undefined behavior according to the POSIX standard (which means that an implementation can give it meaningful behavior and document it, throw a syntax error, or do anything else).
If you are fine with the assumption that your code will only ever run in a GNU environment, then it's totally fine to use Mike Holt's answer. Using sed as an example: you are stuck with BRE when you use POSIX sed (no flag to switch over to ERE), and it's cumbersome to write even a simple regular expression in POSIX BRE, where the only defined quantifiers are * and the interval form \{m,n\}.
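To illustrate how cumbersome this gets, here is a sketch of "one or more" written without \+ in BRE: the bracket expression is repeated once and then followed by *. The <&> replacement is only there to make the matched text visible and is not part of the original question.

```shell
# "One or more" in portable BRE: one occurrence, then zero or more.
out_star=$(printf '%s\n' 'fdsl[]' | LC_ALL=C sed 's/[][ a-z][][ a-z]*/<&>/')
printf '%s\n' "$out_star"
```

This relies on * alone, so it should behave the same on GNU and non-GNU sed.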
Original regex
Note that grep consumes the input file line by line, then checks whether the line matches the regex. Therefore, even if you use the -P flag with your original regex, \n is always redundant, as the regex can't match across lines.
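A small sketch of the line-by-line behaviour, using made-up two-line input: the pattern is anchored with ^ and $ and contains no newline, yet it matches each line, because each line is tested on its own with its newline removed.

```shell
# Count the lines that match as a whole; the class contains no newline.
count=$(printf 'ab\ncd\n' | LC_ALL=C grep -c '^[a-d]*$')
printf '%s\n' "$count"
```

Both lines are counted, which is consistent with the newline never being part of the text the regex sees.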
While it is possible to match a horizontal tab without the -P flag, I think it is more natural to use the -P flag for this task.
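For comparison, one way to match a tab without -P is to put a literal tab character into the bracket expression. The sketch below does that through a shell variable; the name tab and the shortened input are assumptions made for illustration.

```shell
# Store a literal tab; command substitution only strips trailing newlines.
tab=$(printf '\t')
# ']' first, then '[', the literal tab, a space, and a-z.
tab_match=$(printf 'fds\tl[]\n' | LC_ALL=C grep -Eo "[][${tab} a-z]+")
printf '%s\n' "$tab_match"
```

This works with plain ERE, but the extra variable is the reason \t under -P reads more naturally.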
Given this input:
$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\!@#$%^&*()_+-=~\`89"
fds l[]kSAJD<>?,./:";'{}|[]\!@#$%^&*()_+-=~`89
The original regex in the question works with one small modification (unescape the + at the end):
$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\!@#$%^&*()_+-=~\`89" | grep -Po "[ \[\]\t\na-zA-Z\/:\.0-9_~\"'+,;*\=()$\!@#&?-]+"
fds l[]kSAJD
?,./:";'
[]
!@#$
&*()_+-=~
89
We can, however, remove \n (since it is redundant, as explained above) and a few other unnecessary escapes:
$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\!@#$%^&*()_+-=~\`89" | grep -Po "[ \[\]\ta-zA-Z/:.0-9_~\"'+,;*=()$\!@#&?-]+"
fds l[]kSAJD
?,./:";'
[]
!@#$
&*()_+-=~
89