
bash --version

GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin19)

Description:

I want to extract the value of the WorkingDir key from this JSON-like string:

config='{
  "TerraformCommand": "terragrunt-info",
  "WorkingDir": "usr/terraform-modules/terraform-aws-codebuild/examples/.terragrunt-cache/xh9X2WgTwVjjRHiPBjKUl0Lr86w/SGN8gG45haGoXT7IhOh9_iuKkbc"
}'

In this case, the expected output would be:

"usr/terraform-modules/terraform-aws-codebuild/examples/.terragrunt-cache/xh9X2WgTwVjjRHiPBjKUl0Lr86w/SGN8gG45haGoXT7IhOh9_iuKkbc"

Attempts:

So far I've tried several methods, all using this positive lookbehind/lookahead pattern: '(?<=("WorkingDir":\s")).+(?=")'

1.

echo `expr "$config" : '(?<=("WorkingDir":\s")).+(?=")'`

Output: 0

2.

pat='(?<=("WorkingDir":\s")).+(?=")'
[[ $config =~ $pat ]]
echo "${BASH_REMATCH[0]}"
echo "${BASH_REMATCH[1]}"

Output: (empty)

3.

echo $config | grep -o '(?<=("WorkingDir":\s")).+(?=")'

Output: (empty)

Marshallm
  • `expr` is not part of bash. Bash has no control of what features it offers. – Charles Duffy Jan 23 '21 at 04:56
  • Same for `grep`, for that matter. At least with grep, though, _some_ operating systems ship a version with a `-P` option to enable PCRE extensions. – Charles Duffy Jan 23 '21 at 04:56
  • (and even though `=~` is implemented as part of bash itself, it just calls the local operating system's libc regex calls, so which features it has beyond those mandated by the POSIX ERE standard is, once again, completely at the mercy of your local OS vendor: Apple if you're on MacOS, GNU if you're on a conventional Linux distro, etc). – Charles Duffy Jan 23 '21 at 04:57
  • `expr` doesn't understand that regex. GNU `grep` might but you need to invoke it with `-P` flag. – oguz ismail Jan 23 '21 at 04:57
  • @oguzismail, ...assuming the OP is on a GNU platform with optional libpcre support compiled in. That it wouldn't be is not just a theoretical possibility -- GNU grep without libpcre is something I've seen in the wild fairly often. – Charles Duffy Jan 23 '21 at 04:58
  • @Marshallm, ...honestly, my usual approach is to restructure your code to not _need_ the lookbehind. That's rarely very hard. – Charles Duffy Jan 23 '21 at 05:01
  • @Marshallm, ...also, it looks like your data is YAML or JSON; parsing it with a real syntax-aware parser is going to be a lot more reliable than any PCRE-based hackery, even if you _did_ have full lookahead/lookbehind/etc. – Charles Duffy Jan 23 '21 at 05:02
  • tried `echo $config | grep -o -P '(?<=("WorkingDir":\s")).+(?=")'` but just outputted grep usage – Marshallm Jan 23 '21 at 05:02
  • @Marshallm, right, you're on MacOS. Macs don't have GNU grep. – Charles Duffy Jan 23 '21 at 05:03
  • ...for a more academic discussion on why the historically defined regex approach (that could be compiled to a finite state machine) is better than the new "modern" ad-hoc regex syntaxes anyhow, see https://swtch.com/~rsc/regexp/regexp1.html. Short form: The 1970s way guaranteed _very_ fast execution; the "modern" approach has nasty performance corner cases and can be excruciatingly slow. So again: Better to not rely on those fancy new "modern" features. – Charles Duffy Jan 23 '21 at 05:04
  • BTW, `echo $config` is buggy; always `echo "$config"`, with the quotes. See [I just assigned a variable, but `echo $variable` shows something else!](https://stackoverflow.com/questions/29378566/i-just-assigned-a-variable-but-echo-variable-shows-something-else) -- if your variable contains a `*` surrounded by whitespace, you don't want it to be replaced with a list of files in your current directory. – Charles Duffy Jan 23 '21 at 05:05
  • @Marshallm, ...anyhow, if you want a guaranteed-to-be-present PCRE implementation on MacOS, I'd be using Python for your regex matching. – Charles Duffy Jan 23 '21 at 05:07
  • @Charles Duffy man you're quick, thanks for the advice. Just thinking the same in regards to using python with the re package – Marshallm Jan 23 '21 at 05:08
  • `zsh` optionally supports pcre matching, though I don't know if the version that comes with OS X is configured to have it. – Shawn Jan 23 '21 at 06:43
  • @Marshallm It is really opinion-based whether to use a POSIX DFA or a non-POSIX NFA. Please ignore these discussions. Always use what does the job in the way that works best for you. `echo $config | grep -oP '"WorkingDir":\s*"\K[^"]+'` will work only if you have GNU grep, but you are on Mac, so you'd need to install GNU grep first (or pcregrep). `sed` can be used here, too. Bash matching with a capturing group is also nice. However, JSON should be parsed with dedicated tools, like `jq`. Regex is the means of last resort here. – Wiktor Stribiżew Jan 23 '21 at 11:24
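The `sed` and `jq` suggestions in the comments above can be sketched as follows. This is only an illustration, not code posted by any commenter. It assumes `config` holds the complete JSON object with its inner double quotes intact (hence the single-quoted assignment) and that the key and its value sit on one line. The `sed` expression uses only basic regular expressions, so it should not need GNU extensions; the `jq` call runs only if `jq` happens to be installed.

```shell
#!/usr/bin/env bash
# Sample input; single quotes keep the inner double quotes literal.
config='{
  "TerraformCommand": "terragrunt-info",
  "WorkingDir": "usr/terraform-modules/terraform-aws-codebuild/examples/.terragrunt-cache/xh9X2WgTwVjjRHiPBjKUl0Lr86w/SGN8gG45haGoXT7IhOh9_iuKkbc"
}'

# sed: replace the whole matching line with the captured value and print
# only lines where the substitution succeeded (-n plus the p flag).
WorkingDir=$(printf '%s\n' "$config" |
    sed -n 's/.*"WorkingDir":[[:space:]]*"\([^"]*\)".*/\1/p')
echo "WorkingDir (sed): $WorkingDir"

# jq: a real JSON parser, used here only when it is available on PATH.
if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$config" | jq -r '.WorkingDir'
fi
```

The `sed` version is still a text match and would break on escaped quotes inside the value; the `jq` version parses the JSON properly, which is why the comments recommend it.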

1 Answer


As was pointed out in the comments, expr only supports "basic" (aka "obsolete") regular expressions, and the regex engine behind bash's =~ operator doesn't support lookahead or lookbehind (nor, on some platforms, the \s shorthand for whitespace). But you don't need lookaround: just match the whole key/value pair and use a capture group to pick out the part you want. Store the pattern in a variable to avoid possible parsing inconsistencies:

# Match the key, optional whitespace, then capture everything up to the closing quote.
WorkingDirPattern='"WorkingDir":[[:space:]]*"([^"]+)"'
if [[ "$config" =~ $WorkingDirPattern ]]; then
    WorkingDir="${BASH_REMATCH[1]}"    # get the contents of the first capture group
    echo "WorkingDir is $WorkingDir"
else
    echo "No WorkingDir found" >&2
fi
Gordon Davisson
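For anyone who wants to run the capture-group approach end to end, here is a minimal self-contained sketch. It is not the answer author's code: the helper name `json_string_value` is hypothetical, and it assumes the key contains no regex metacharacters and the value contains no escaped quotes. Note that the sample is assigned with single quotes; with the `"""` form shown in the question, bash treats each inner `"` as a quote delimiter and strips it, so a pattern containing `"WorkingDir"` would never match.

```shell
#!/usr/bin/env bash
# Sample input; single quotes keep the inner double quotes literal.
config='{
  "TerraformCommand": "terragrunt-info",
  "WorkingDir": "usr/terraform-modules/terraform-aws-codebuild/examples/.terragrunt-cache/xh9X2WgTwVjjRHiPBjKUl0Lr86w/SGN8gG45haGoXT7IhOh9_iuKkbc"
}'

# json_string_value TEXT KEY  (hypothetical helper, not a bash builtin)
# Prints the string value that follows "KEY": in TEXT.
# Returns 1 and prints nothing if the key is not found.
json_string_value() {
    local text=$1 key=$2
    # POSIX ERE only: no lookaround and no \s, so it also works with =~ in bash 3.2.
    local pattern="\"$key\":[[:space:]]*\"([^\"]*)\""
    if [[ $text =~ $pattern ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"   # first capture group
    else
        return 1
    fi
}

if WorkingDir=$(json_string_value "$config" WorkingDir); then
    echo "WorkingDir is $WorkingDir"
else
    echo "No WorkingDir found" >&2
fi
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of `=~` is what makes bash treat it as a regex rather than a literal string.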