I think gawk and mawk 1/2 are also okay with the hideous but fool-proof octal method, like
-v regex1="new[[:blank:]]+File[\\050]" # note the double quotes
once the first \\ layer gets stripped off, the regex being tested against is equivalent to
/new[[:blank:]]+File[\050]/
which is as safe as it gets. The reason it matters is that something like
/new[[:blank:]]+File[\(]/
is something mawk/mawk2 are totally cool with, but gawk will give an annoying warning message. Octals (or [\x28]) get rid of that cross-awk weirdness and allow the same custom string regex to be deployed across all 3
(haven't tested against less popular variants like BWK original or NAWK etc).
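A minimal sketch of the octal method in use, assuming the awk on your PATH is gawk or mawk with POSIX character class support. The sample line and the `result` variable are made up for illustration.

```shell
# Hypothetical sample line containing the text we want to detect.
line='x = new   File("a.txt")'

# Double quotes: the shell reduces \\050 to \050 before awk ever sees it,
# and \050 is the octal code for "(".
result=$(printf '%s\n' "$line" |
    awk -v regex1="new[[:blank:]]+File[\\050]" '$0 ~ regex1 { print "match" }')

echo "$result"
```

Swapping the awk binary name (gawk, mawk) in that one line is a quick way to check that the same string regex behaves the same everywhere.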
PS: since I'm on the subject of octal caveats, mawk/mawk2 and gawk in binary mode are cool with octals inside square brackets for all bytes, meaning
"[\\302-\\364][\\200-\\277]+" # this happens to be a *very* rough proxy for UTF-8
is valid for all 3. If you really want to be the hex guy, that same regex becomes
"[\\xC2-\\xF4][\\x80-\\xBF]+"
However, gawk in unicode mode will scream about locale whenever you attempt to put square brackets around any non-ASCII byte. To circumvent that, you'll have to just list the bytes out with a bunch of or's, like:
(\302|\303|\304.....|\364)(\200|\201......|\277)+
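Typing that alternation out by hand is tedious, so here is a sketch that builds it in a loop. `byte_alt` is a hypothetical helper name, the byte ranges are given in decimal (194-244 is \302-\364, 128-191 is \200-\277), and the run below is pinned to LC_ALL=C with gawk or mawk assumed. Whether your gawk accepts the generated regex quietly in a UTF-8 locale is the claim from the text above, not something this snippet proves.

```shell
alt_count=$(printf 'caf\303\251 na\303\257ve\n' | LC_ALL=C awk '
# byte_alt() (hypothetical helper) spells out every byte from lo to hi
# as a backslash-octal alternative, e.g. (\302|\303|\304).
function byte_alt(lo, hi,    i, s, sep) {
    s = ""; sep = ""
    for (i = lo; i <= hi; i++) {
        s = s sep sprintf("\\%03o", i)
        sep = "|"
    }
    return "(" s ")"
}
# Lead bytes 0302-0364 followed by one or more continuation bytes 0200-0277.
BEGIN { re = byte_alt(194, 244) byte_alt(128, 191) "+" }
{ print gsub(re, "&") }')

echo "$alt_count"
```

The regex is assembled once in BEGIN and reused for every line, so the cost of the long alternation string is paid only at startup.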
This way you can get gawk in unicode mode to handle any arbitrary byte, including binary input data (whatever circumstances caused that to happen), and perform full base64 or URI plus encoding/decoding from within awk (plus anything else you want, like SHA256 or LZMA etc). So far I've even managed to get gawk in unicode mode to base64-encode an MP4 file input without gawk spitting out the "illegal multi byte" error message.
I've also managed to get gawk and mawk in binary mode to become mostly UTF-8 aware and safe.
The "mostly" caveat is that I haven't implemented the minute details, like doing normalization form conversions directly from within awk instead of dumping out to python3 and getting the results back via getline, or keeping linguistic modifier marks with their intended character if I do a UC-safe-substring string reversal.