POSIX defines two flavors of regular expressions:
BREs (Basic Regular Expressions) - the older flavor with fewer features and the need to \-escape certain metacharacters, notably \( , \) and \{ , \}; there is no support for the duplication symbols \+ (emulate with \{1,\}) and \? (emulate with \{0,1\}), and no support for \| (alternation; cannot be emulated).
EREs (Extended Regular Expressions) - the more modern flavor, which, however, lacks regex-internal back-references (which is not the same as capture groups); there is also no support for word-boundary assertions (e.g., \<) and no support for non-capturing groups.
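To make the difference concrete, here is a minimal sketch that runs the same two patterns through grep in both flavors. The sample strings and variable names are made up for illustration; only the \{1,\} and \{0,1\} emulations come from the description above.

```shell
# Hypothetical sample input, one candidate string per line.
sample=$(printf '%s\n' 'ac' 'abc' 'abbc')

# BRE (grep's default): \{1,\} stands in for "one or more".
bre_plus=$(printf '%s\n' "$sample" | grep 'ab\{1,\}c')
# ERE (grep -E): + expresses the same thing directly.
ere_plus=$(printf '%s\n' "$sample" | grep -E 'ab+c')

# BRE: \{0,1\} stands in for "zero or one".
bre_opt=$(printf '%s\n' "$sample" | grep 'ab\{0,1\}c')
# ERE: ? expresses the same thing directly.
ere_opt=$(printf '%s\n' "$sample" | grep -E 'ab?c')

printf 'one or more:\n%s\n' "$bre_plus"
printf 'zero or one:\n%s\n' "$bre_opt"
```

Both flavors should select the same lines in each pair; the BRE form is merely more verbose.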
POSIX also mandates which utilities support which flavor: some support only BREs, some support only EREs, and some can use either; notably:
grep uses BREs by default, but can enable EREs with -E
sed, sadly, only supports BREs
- Both GNU and BSD sed, however, do support EREs with the -E switch as a nonstandard extension (the better-known alias with GNU sed is -r, but -E is supported too).
awk only supports EREs
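As a rough illustration of what this means in practice, the sketch below pulls the number out of a made-up input line, once with sed using only BRE syntax and once with awk using ERE syntax. The input and the variable names are assumptions for demonstration purposes.

```shell
line='version=42'   # hypothetical input

# sed speaks BRE: groups are \( \), and "one or more" is spelled \{1,\}.
sed_out=$(printf '%s\n' "$line" | sed 's/^\([a-z]\{1,\}\)=\([0-9]\{1,\}\)$/\2/')

# awk speaks ERE: + works unescaped; match() sets RSTART and RLENGTH.
awk_out=$(printf '%s\n' "$line" | awk 'match($0, /[0-9]+$/) { print substr($0, RSTART, RLENGTH) }')

printf 'sed: %s\nawk: %s\n' "$sed_out" "$awk_out"
```

Neither command relies on a nonstandard option, so this should behave the same on GNU and BSD systems.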
Additionally, the regex libraries on both Linux and BSD/OSX implement extensions to the POSIX ERE syntax - sadly, these extensions are in part incompatible (such as the syntax for word-boundary assertions).
As for your specific regex:
It uses the syntax for non-capturing groups, (?:...), which POSIX EREs do not support; however, the distinction between capturing and non-capturing groups is pointless in the context of grep, because grep offers no replacement feature.
If we remove this aspect, we get:
[c,f]=("([a-z A-Z 0-9]|-|_|\/)+\.(js|html)")
This is now a valid POSIX ERE (which can be simplified - see Benjamin W's helpful answer).
However, since it is an Extended RE, using sed is not an option if you want to remain strictly POSIX-compliant.
Because both GNU and BSD/OSX sed happen to implement -E to support EREs, you can get away with sed if these platforms are the only ones you need to support - see anubhava's answer.
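For illustration only, a sed -E command along these lines might look as follows. This is a sketch, not necessarily what anubhava's answer uses: the sample input is invented, the character alternation is condensed into a single bracket expression, and # serves as the s delimiter so that the / and | inside the regex need no escaping.

```shell
# Hypothetical sample input; only the first two lines should match.
sample=$(printf '%s\n' 'var c="foo.js";' 'var f="bar.html";' 'var x="baz.css";')

# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded. \1 is the quoted file name captured by ( ).
sed_out=$(printf '%s\n' "$sample" |
  sed -nE 's#.*[cf]=("[a-zA-Z0-9 _/-]+\.(js|html)").*#\1#p')

printf '%s\n' "$sed_out"
```

Note that -E makes this non-portable in the strict POSIX sense described above, even though it works with both GNU and BSD sed.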
Similarly, both GNU and BSD/OSX grep happen to implement the nonstandard -o option (unlike what you state in your question), so, again, if these platforms are the only ones you need to support, you can use:
$ grep -Eo '[c,f]=("([a-z A-Z 0-9]|-|_|\/)+\.(js|html)")' file | cut -c 3-
"foo.js"
"bar.html"
(Note that only GNU grep supports -P to enable PCREs, which would simplify the solution to the following (note the \K, which drops everything matched so far):
$ grep -Po '[c,f]=\K("([a-z A-Z 0-9]|-|_|\/)+\.(js|html)")' file
)
If you really wanted a strictly POSIX-compliant solution, you could use awk:
$ awk -F\" '/[c,f]=("([a-z A-Z 0-9]|-|_|\/)+\.(js|html)")/ { print "\"" $2 "\"" }' file
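To see that the grep -Eo pipeline and the awk command produce the same result, here is a small self-contained run against made-up input. The sample lines and variable names are assumptions; the grep pattern is a condensed equivalent of the regex above (one bracket expression instead of the alternation), while the awk command is used as given.

```shell
# Hypothetical sample input; only the first two lines should match.
sample=$(printf '%s\n' 'var c="foo.js";' 'var f="bar.html";' 'var x="baz.css";')

# Nonstandard -o prints only the matched part; cut -c 3- then drops
# the leading two characters (for example c=).
grep_out=$(printf '%s\n' "$sample" |
  grep -Eo '[c,f]="[a-zA-Z0-9 _/-]+\.(js|html)"' | cut -c 3-)

# POSIX-only alternative: split fields on " and re-add the quotes around $2.
awk_out=$(printf '%s\n' "$sample" |
  awk -F\" '/[c,f]=("([a-z A-Z 0-9]|-|_|\/)+\.(js|html)")/ { print "\"" $2 "\"" }')

printf 'grep: %s\n' "$grep_out"
printf 'awk:  %s\n' "$awk_out"
```

The awk variant assumes the first double-quoted string on a matching line is the one of interest, since it simply prints the second "-separated field.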