1

I want to use regular expressions in an awk script to match symbols like ^, and perform certain operations when they match within an if statement.

However, I have tried increasing the number of backslashes in MARK="\^" or using MARK="^", but I still can't get a match.

Are there any other possible causes to consider?

MARK="^"

awk ~~~
if (mark ~ "['$MARK}']+") {
            print $1;
}

I tried to escape it with a backslash and expected it to match.

Cyrus
T.M
    You say "symbols like ^" - different characters need different escaping if you want them to be treated literally (most should be put inside a bracket expression but, generally, `^` and ```\``` should be preceded by a ```\```), but in your code you're trying to escape a character to make it literal in a regexp comparison (`mark ~ ...`) which is usually a bad idea vs doing a string comparison (`mark == ...` or `index(mark,...)`) so I suspect that's not what your real code does. If you post a more realistic [mcve] of your real problem then we can help you. – Ed Morton Mar 28 '23 at 10:24

4 Answers

1

You don't need an "if" here, a pattern is enough. Also, [^] might not work (it doesn't in gawk), you need to escape the caret:

mark=^
echo $'a\nb\nq^x\nd' | awk '(/\'"$mark"'/){print}'
choroba
  • 1
    Awk does have an `if` statement, though the OP's code does not look like a valid Awk script. – tripleee Mar 28 '23 at 08:09
  • Well, `if` can only be used inside a block, a pattern is enough here. – choroba Mar 28 '23 at 08:26
  • Why wouldn't `[^]` work? It should always match a literal caret anywhere in the line. BTW, a caret on the start of the line could be matched by `^^` . For instance, `echo ^x | awk '/^^/ { print }'` prints _^x_. – user1934428 Mar 28 '23 at 10:01
  • 4
    @user1934428 I might be wrong but AFAIK `[^]` is undefined behavior per POSIX since `^` as the first char in a bracket expression is defined to mean the negation of the subsequent list inside that bracket expression, but it's not defined as meaning a literal `^` if there is no subsequent list, even if logically that'd make sense. – Ed Morton Mar 28 '23 at 10:14
  • 2
    As a rule of thumb, don't let shell variables expand to become part of the body of the awk script (`/\'"$mark"'/`) or it can lead to complexity and/or cryptic errors. See [how-do-i-use-shell-variables-in-an-awk-script](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script), e.g. try that with `mark='"'`. – Ed Morton Mar 28 '23 at 10:27
  • @EdMorton-SOstopbullying : You are right that [the POSIX specs](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_05) seem to have forgotten about this case. If the `^` meant inversion here too, the meaning would be _every character except those in the following empty list_, i.e. _every character is permitted_, and hence `[^]` would be equivalent to `.`. Indeed, _gawk_ seems to take this interpretation, since `echo x | gawk '/[^^]/ { print }'` matches ...... – user1934428 Mar 28 '23 at 10:33
  • @user1934428 `[^^]` is defined, it means `every character except ^`. In gawk `[^]` would result in an "unterminated regexp" error (I just tried it with gawk 5.2.1) since it'd treat `]` as the first char in a negated char list inside an expected bracket expression, not the end of the bracket expression. – Ed Morton Mar 28 '23 at 10:43
  • Ah right, my typo. If I do a `gawk '/[^]/ { print }'`, I get an _unterminated regexp_, because the first caret is taken as a negation, and the `]`, being the first element in the list, is taken literally. Hence the bracket expression is not closed. BTW, I can read **this** behaviour from the POSIX specs, and therefore I would say that the behaviour is defined. choroba is right in that `[^]` can't work. – user1934428 Mar 28 '23 at 10:54
  • We could also read other behavior from the POSIX specs though, as you previously described, which is why I'd say it's undefined. I wouldn't be surprised if different awks did different things with it. – Ed Morton Mar 28 '23 at 11:38
1

Have you tried matching it using its ASCII code?

\x5E

  • Unescaped `\x` doesn't work: `echo 'abc123xyz' | gawk '/\x5E/'` prints `abc123xyz`. `gawk '/\\x5E/'` fixes that, but that's so verbose you might as well just use `gawk '/\^/'` – RARE Kpop Manifesto Mar 30 '23 at 17:36
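
One way to use the character code without depending on how `\x` is handled inside a regexp is to build the character with `sprintf("%c", 94)` (94 is the decimal form of hex 5E) and look for it as a plain string. This is only a sketch of an alternative reading of the idea, not what the answer literally proposes; the two sample lines are made up for illustration:

```shell
hits=$(printf 'abc123xyz\nq^x\n' | awk '
BEGIN { caret = sprintf("%c", 94) }   # 94 decimal == 0x5E == "^"
index($0, caret)                      # literal substring test, no regexp
')
printf '%s\n' "$hits"
```

Because `index()` does a string search, the caret never passes through the regexp parser, so no escaping is involved.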
1

Given:

cat file
Line 1
Line^2
Line 3

You can use a pattern with ^ escaped, like so:

awk '/\^/' file 
Line^2

To use a dynamic regex with the specific meta character of ^ you need to do some escaping gymnastics.

This works:

mark='\\^'
awk -v m="$mark" '$0~m' file
# Line^2

Or this:

mark='^'
awk -v m="$mark" 'BEGIN{pat="\\" m}
$0~pat' file
# same

This is covered in the GNU awk manual.

Simpler still is to just use index:

mark='^'
awk -v m="$mark" 'index($0,m)' file

Which will find the literal ^ without any escaping.
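
To see why the first variant needs two backslashes: awk processes escape sequences in a `-v` assignment, so `\\^` reaches the script as `\^`. A minimal sketch, assuming a helper named `sample` (hypothetical) that prints the same three lines as `file` so the example runs without any file on disk:

```shell
# Same three lines as the sample file above.
sample() { printf 'Line 1\nLine^2\nLine 3\n'; }

# -v processes backslash escapes, so '\\^' arrives inside awk as \^
seen=$(awk -v m='\\^' 'BEGIN { print m }')
printf '%s\n' "$seen"

# Dynamic regexp: m holds \^ and therefore matches a literal caret.
regex_hit=$(sample | awk -v m='\\^' '$0 ~ m')
printf '%s\n' "$regex_hit"

# index() treats m as a plain string, so a bare ^ is enough.
index_hit=$(sample | awk -v m='^' 'index($0, m)')
printf '%s\n' "$index_hit"
```

What a single backslash (`m='\^'`) does is not something to rely on, since handling of unrecognized escape sequences can differ between awk implementations.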

dawg
1

Without more info on what you're trying to do: you should not escape metachars to make a regexp metachar behave like a literal char in a regexp comparison, i.e. instead of this:

if (mark ~ "['$MARK}']+") {
            print $1;
}

you should just do a literal string comparison instead:

if ( index(mark,"'"$MARK"'") ) {
            print $1;
}

but you also should not let shell variables expand to become part of the body of an awk script, see How do I use shell variables in an awk script?, so you should be doing this or similar:

awk -v m="$MARK" '{
    .... whatever populates "mark" ....
    if ( index(mark,m) ) {
                print $1;
    }
}'
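
A runnable sketch of that last form. How `mark` gets populated is elided above, so taking it from the second field is purely an assumption for illustration, as are the `input` helper and its sample lines:

```shell
MARK='^'

# Hypothetical input: field 1 is an ID, field 2 is the text to check.
input() { printf 'id1 a^b\nid2 abc\nid3 ^\n'; }

result=$(input | awk -v m="$MARK" '{
    mark = $2                 # assumption: "mark" comes from field 2
    if ( index(mark, m) ) {   # literal substring test, no regexp involved
        print $1
    }
}')
printf '%s\n' "$result"
```

Note that `-v` still interprets backslash escapes in the assigned value, so this sketch assumes `MARK` contains no backslashes.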
Ed Morton