Correct Syntax For Escaping Double Quotes in Regex Pattern Match?

Question

I'm trying to get the 2nd substring enclosed in double quotes from each of the variables string and string2.

I think the problem is the way I'm trying to escape the double quotes.

What is the correct syntax for this:

#!/bin/bash

# Example strings.

string='"name": "Bash scripting cheatsheet",'
string2='"url": "https://devhints.io/bash"'

# I'm trying to get the 2nd substring between " "

# desired matches:
# string_name_match='Bash scripting cheatsheet'
# string2_url_match='https://devhints.io/bash'

# Attempts: using a pattern var with double quotes escaped.

pattern='\".*\"'  # Is the " char escaped correctly?
echo "$string" | awk "/$pattern/{print $2}" # Is the $pattern var used correctly?
echo "$string2" | awk "/$pattern/{print $2}" 

# 2nd pattern match using the name/url to parse:

name_pattern='^\"name:\"[:space:].*[^\",]'
url_pattern='^\"url\"[:space:]\"^url:.*[^"]'
echo "$string" | awk "/$name_pattern/{print $0}"
echo "$string2" | awk "/$url_pattern/{print $0}"

Personally, I use and recommend `["]` in favor of `\"` -- though that is indeed just a matter of preference. Bigger issue here, though, is using string substitution to generate code (instead of passing data out-of-band from code, as the answers using `awk -v` advise) — Charles Duffy, Jan 28 '22 at 19:57
The answer you accepted has nothing to do with matching a regexp so it'll produce the output you posted from the input you posted but it won't check that the input line matches your regexp(s). You can get the same output from `cut -d'"' -f4` - is that what you were really looking for? To use the value of a shell variable in awk, btw, see [how-do-i-use-shell-variables-in-an-awk-script](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script). — Ed Morton, Jan 28 '22 at 23:54
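
For context on the two comments above, here is a minimal sketch of why the question's double-quoted awk program misbehaves, next to the `awk -v` and `cut` alternatives they mention. Inside double quotes the shell expands `$2` before awk ever runs (to the script's second positional parameter, usually empty), so awk receives `{print }` and prints the whole line. The variable name `ptn` is only an illustrative choice.

```shell
#!/usr/bin/env bash
string='"name": "Bash scripting cheatsheet",'
pattern='".*"'   # inside single quotes, " needs no escaping

# Double-quoted program: the shell substitutes $pattern AND $2 before awk starts.
# With no positional parameters set, $2 expands to nothing, so awk runs
# /".*"/{print } and prints the entire line.
set --           # make sure $2 is empty for this demonstration
whole_line=$(awk "/$pattern/{print $2}" <<< "$string")

# Single-quoted program: awk sees $4 itself; the pattern arrives via -v.
field4=$(awk -F'"' -v ptn="$pattern" '$0 ~ ptn {print $4}' <<< "$string")

# The cut equivalent mentioned above (no regex check at all).
cut_field4=$(cut -d'"' -f4 <<< "$string")

printf '%s\n' "$whole_line" "$field4" "$cut_field4"
```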

score 4 · Accepted Answer · answered Jan 28 '22 at 19:28


Here is how you do it in awk:

awk -F '"' -v n=2 '{print $(n*2)}' <<< "$string"
Bash scripting cheatsheet

awk -F '"' -v n=2 '{print $(n*2)}' <<< "$string2"
https://devhints.io/bash
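
To see why `$(n*2)` works: splitting on `"` puts the text outside the quotes in odd-numbered fields and the quoted substrings in even-numbered ones, so the n-th quoted substring is field 2n. A small sketch wrapping the command in a function; the name `nth_quoted` is hypothetical and not part of the answer.

```shell
#!/usr/bin/env bash
# nth_quoted N STRING: print the N-th double-quoted substring of STRING.
# (Hypothetical helper name; the awk call is the one from the answer.)
nth_quoted() {
  awk -F '"' -v n="$1" '{print $(n*2)}' <<< "$2"
}

string='"name": "Bash scripting cheatsheet",'
string2='"url": "https://devhints.io/bash"'

first=$(nth_quoted 1 "$string")     # field 2
second=$(nth_quoted 2 "$string")    # field 4
url=$(nth_quoted 2 "$string2")      # field 4

printf '%s\n' "$first" "$second" "$url"
```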

anubhava


Thanks. Is there a difference between <<< and echo | to pass the string to the awk statement? Is one preferred over the other? – Emily Jan 28 '22 at 19:33
`<<<` is bash's here-string and it is preferred over pipe as it doesn't create a sub shell. – anubhava Jan 28 '22 at 19:35
Thanks. Looks like I have some refactoring to do, as I've used the pipe command a bunch unknowingly. – Emily Jan 28 '22 at 19:39
Piping from `echo` is more portable to shells other than bash, but it can also cause trouble because some versions of `echo` will try to interpret any escape (backslash) sequences in the string. As long as your script is running under bash, `<<<` is more efficient and consistent. – Gordon Davisson Jan 28 '22 at 19:53
@Emily You can avoid the portability and reliability issues with echo by switching from `echo "$string"` to `printf '%s\n' "$string"`, which is guaranteed to work the same way on all POSIX-compliant shells -- though using a herestring is still more efficient when you know the shell is bash. – Charles Duffy Jan 28 '22 at 20:00
Newbie q: What is the "%s\n" part of the printf command doing? – Emily Jan 29 '22 at 17:14
`"%s\n"` is the format string of the `printf` builtin. – anubhava Jan 29 '22 at 17:32
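
The three ways of feeding the string to awk discussed in these comments, side by side, as a rough sketch. In `printf '%s\n' "$string"`, `%s` is replaced by the argument as-is and `\n` adds the trailing newline, so backslashes in the data are never interpreted. The sample value containing backslashes is made up for illustration.

```shell
#!/usr/bin/env bash
string='"path": "C:\new\table",'   # made-up value containing backslashes

# 1. Here-string (bash, ksh, zsh): no pipeline needed.
from_herestring=$(awk -F'"' '{print $4}' <<< "$string")

# 2. printf: portable to any POSIX shell; %s prints the argument as-is.
from_printf=$(printf '%s\n' "$string" | awk -F'"' '{print $4}')

# 3. echo: fine in bash with default settings, but some shells' echo
#    would turn \n and \t into a newline and a tab.
from_echo=$(echo "$string" | awk -F'"' '{print $4}')

printf '%s\n' "$from_herestring" "$from_printf" "$from_echo"
```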

markp-fuso · Answer 2 · 2022-01-28T20:08:22.850

To address the immediate issue of passing a regex to awk: because of the various problems with escape sequences, it's usually easier to pass the pattern in a variable than to hard-code it, and to test the entire line ($0) against that variable (~ pattern_variable), e.g.:

string='"name": "Bash scripting cheatsheet",'
string2='"url": "https://devhints.io/bash"'
pattern='"([^"]*)".*"([^"]*)"'

$ awk -v ptn="${pattern}" '$0 ~ ptn {print $2}' <<< "${string}"
"Bash

$ awk -v ptn="${pattern}" '$0 ~ ptn {print $2}' <<< "${string2}"
"https://devhints.io/bash"

OK, so awk is now working with the regex, but we're not getting quite what we wanted because awk splits fields on white space by default. We can tell awk to use the double quote as the field delimiter instead; the value we want, between the 2nd pair of double quotes, is then field 4:

$ awk -v ptn="${pattern}" -F'"' '$0 ~ ptn {print $4}' <<< "${string}"
Bash scripting cheatsheet

$ awk -v ptn="${pattern}" -F'"' '$0 ~ ptn {print $4}' <<< "${string2}"
https://devhints.io/bash

'course, this requires spawning a subprocess each time we want to parse a string.

There are a few (better) ways to parse a string in bash without the overhead of spawning subprocess calls ...

One idea using some basic bash regex matching:

string='"name": "Bash scripting cheatsheet",'
string2='"url": "https://devhints.io/bash"'
pattern='"([^"]*)".*"([^"]*)"'

If bash finds a match it will populate the BASH_REMATCH[] array with info about the match(es), with each capture group (the part of the pattern inside a set of parens) making up a separate entry in the array.

Consider:

$ [[ "${string}" =~ ${pattern} ]] && string_name_match="${BASH_REMATCH[2]}"
$ typeset -p BASH_REMATCH string_name_match
declare -ar BASH_REMATCH=([0]="\"name\": \"Bash scripting cheatsheet\"" [1]="name" [2]="Bash scripting cheatsheet")
declare -- string_name_match="Bash scripting cheatsheet"

$ echo "${string_name_match}"
Bash scripting cheatsheet



$ [[ "${string2}" =~ ${pattern} ]] && string2_url_match="${BASH_REMATCH[2]}"
$ typeset -p BASH_REMATCH string2_url_match
declare -ar BASH_REMATCH=([0]="\"url\": \"https://devhints.io/bash\"" [1]="url" [2]="https://devhints.io/bash")
declare -- string2_url_match="https://devhints.io/bash"

$ echo "${string2_url_match}"
https://devhints.io/bash
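
A sketch of the same idea wrapped in a reusable function, assuming bash 3.2 or later (where the pattern variable is left unquoted on the right-hand side of `=~`). The function name `second_quoted` and the variable `match` are hypothetical; the function sets a variable rather than printing, so calling it does not need a command substitution.

```shell
#!/usr/bin/env bash
pattern='"([^"]*)".*"([^"]*)"'

# second_quoted STRING: on a match, store the second capture group in the
# global variable "match" and return 0; otherwise return non-zero.
# (Hypothetical helper; setting a variable avoids a command substitution.)
second_quoted() {
  [[ $1 =~ $pattern ]] && match=${BASH_REMATCH[2]}
}

string='"name": "Bash scripting cheatsheet",'
string2='"url": "https://devhints.io/bash"'

second_quoted "$string"  && string_name_match=$match
second_quoted "$string2" && string2_url_match=$match

# A string without two quoted substrings does not match.
if second_quoted 'no quotes here'; then unmatched=no; else unmatched=yes; fi

printf '%s\n' "$string_name_match" "$string2_url_match" "$unmatched"
```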

score 1 · Answer 3 · answered Jan 28 '22 at 21:37

With your shown samples, please try the following grep command, written and tested with GNU grep.

echo "$string" | grep -oP '.*?"[^"]*".*?"\K[^"]*'
Bash scripting cheatsheet

echo "$string2" | grep -oP '.*?"[^"]*".*?"\K[^"]*'
https://devhints.io/bash

Explanation: echo prints the value of the string and sends it to grep on standard input. GNU grep's -P option enables Perl-compatible regexes and -o prints only the matched part; the regex .*?"[^"]*".*?"\K[^"]* (explained below) produces the required output.

Explanation of the regex (.*?"[^"]*".*?"\K[^"]*):

.*?"    ##Lazy match (a PCRE feature, enabled by grep -P) up to the very first occurrence of ".
[^"]*"  ##Then match everything up to and including the next occurrence of ".
.*?"    ##Lazy match up to the next occurrence of ", which is the 3rd occurrence of ".
\K      ##\K (also a PCRE feature) forgets, i.e. does not print, whatever was matched before it.
[^"]*   ##Match everything before the 4th occurrence of ", which is the required output.
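
A small sketch of what `\K` changes, assuming a GNU grep built with PCRE support (`-P`). The variable names are illustrative only.

```shell
#!/usr/bin/env bash
string='"name": "Bash scripting cheatsheet",'

# Without \K, -o prints everything the regex matched.
without_k=$(grep -oP '.*?"[^"]*".*?"[^"]*' <<< "$string")

# With \K, the text matched before \K is dropped from the reported match.
with_k=$(grep -oP '.*?"[^"]*".*?"\K[^"]*' <<< "$string")

printf '%s\n' "$without_k" "$with_k"
```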

dawg · Answer 4 · 2022-01-29T17:22:42.750

You can use a Bash regex:

$ [[ $string =~ ^([^\"]*\"){4} ]] && echo "${BASH_REMATCH[1]%\"}"
Bash scripting cheatsheet

$ [[ $string2 =~ ^([^\"]*\"){4} ]] && echo "${BASH_REMATCH[1]%\"}"
https://devhints.io/bash

Or the same method with sed:

sed -E 's/^([^"]*\"){4}/\1/; s/".*//' <<<"$string"
Bash scripting cheatsheet

sed -E 's/^([^"]*\"){4}/\1/; s/".*//' <<<"$string2"
https://devhints.io/bash

(Escaping the " is not required with sed, though.)
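
Both versions rely on a repeated capture group keeping only its last repetition: after `([^"]*"){4}` matches, group 1 holds just the fourth `text"` chunk. A sketch that makes this visible; storing the regex in a single-quoted variable (here `re`, an illustrative name) also sidesteps the escaping question.

```shell
#!/usr/bin/env bash
string='"name": "Bash scripting cheatsheet",'
re='^([^"]*"){4}'      # single-quoted, so " needs no backslash

if [[ $string =~ $re ]]; then
  whole=${BASH_REMATCH[0]}    # everything up to and including the 4th "
  last=${BASH_REMATCH[1]}     # only the 4th repetition of the group
  value=${last%\"}            # strip the trailing "
fi

printf '%s\n' "$whole" "$last" "$value"
```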

score 0 · Answer 5 · answered Jan 29 '22 at 11:57

Here is another simple solution:

This uses gawk, the default awk on many Linux distributions. Its FPAT variable is a regexp that describes the contents of each field, rather than the separator between fields.

echo '"url": "https://devhints.io/bash"' |awk -vFPAT='[^\"]*' '{print $4}'
https://devhints.io/bash

Dudi Boy