64

Given the following shell script, would someone be so kind as to explain the grep -Po regex please?

#!/bin/bash
# Issue the request for a bearer token, json is returned
raw_json=`curl -s -X POST -d "username=name&password=secret&client_id=security-admin-console" http://localhost:8081/auth/realms/master/tokens/grants/access`
# Strip away all but the "access_token" field's value using a Perl-compatible regular expression (grep -P)
bearerToken=`echo $raw_json | grep -Po '"'"access_token"'"\s*:\s*"\K([^"]*)'`
echo "The bearer token is:"
echo $bearerToken

So specifically, I'm interested in understanding the parts of the regex

grep -Po '"'"access_token"'"\s*:\s*"\K([^"]*)'

and how it works. Why so many quotes? What is the \K for? I have some experience with grep regexes, but this one confuses me.

This is the actual output of the curl command. The shell script (grep) works as desired, returning just the value of the "access_token" field.

{"access_token":"eyJhbGciOiJSandNoThisIsntRealndmbS1yZWFsbSI6eyJyb2xlcyI6WyJtYW5hZ2UtY2xpZW50cyIsInZpZXctcmVhbG0iLCJtYW5hZ2UtZXZlbnRzIiwidmlldy1ldmVudHMiLCJ2aWV3LWFwcGxpY2F0aW9ucyIsInZpZXctdXNlcnMiLCJ2aWV3LWNsaWVudHMiLCJtYW5hZ2UtdXNlcnMiLCJtYW5hZ2UtYXBwbGljYXRpb25zIiwibWFuYWdlLXJlYWxtIl19LCJtYXN0ZXItcmVhbG0iOnsicm9sZXMiOlsibWFuYWdlLWV2ZW50cyIsIm1hbmFnZS1jbGllbnRzIiwidmlldy1yZWFsbSIsInZpZXctZXZlbnRzIiwidmlldy1hcHBsaWNhdGlvbnMiLCJ2aWV3LXVzZXJzIiwidmlldy1jbGllbnRzIiwibWFuYWdlLXJlYWxtIiwibWFuYWdlLXVzZXJzIiwibWFuYWdlLWFwcGxpY2F0aW9ucyJdfX19.fQmQKn-xatvflHPAaxCfrrVow3ynpw0sREho7__jZo2d0g1SwZV7Lf4C26CcweNLlb3wmKHHo63HRz35qRxJ7BXyiZwHgXokvDJj13yuOb6Sirg9z02n6fwGy8Iog30pUvffnDaVnUWHfVL-h_R4-OZNf-_YUK5RcL2DHt0zUXI","expires_in":60,"refresh_expires_in":1800,"refresh_token":"eyJhbGciOiJSUzI1NiJ9.eyJqdGkiOiJlNWFmYTZiOC04ZjM5LTQ5MjUtOWZiMC00MmY3MTM4YzUzMGIiLCJleHAiOjE0NDY4Mjk3OTksIm5iZiI6MCwAreYouKiddingIwouldnotputSOmethigRealHereNpb25fc3RhdGUiOiI2MmVmYzA1Yy0xYmY1LTRmNTUtYjc0OS01ZTBlZmY5NDE1NWIiLCJyZWFsbV9hY2Nlc3MiOnsicm9sZXMiOlsiYWRtaW4iLCJjcmVhdGUtcmVhbG0iXX0sInJlc291cmNlX2FjY2VzcyI6eyJ3Zm0tcmVhbG0iOnsicm9sZXMiOlsibWFuYWdlLWV2ZW50cyIsInZpZXctcmVhbG0iLCJtYW5hZ2UtY2xpZW50cyIsInZpZXctYXBwbGljYXRpb25zIiwidmlldy1ldmVudHMiLCJ2aWV3LXVzZXJzIiwidmlldy1jbGllbnRzIiwibWFuYWdlLXJlYWxtIiwibWFuYWdlLWFwcGxpY2F0aW9ucyIsIm1hbmFnZS11c2VycyJdfSwibWFzdGVyLXJlYWxtIjp7InJvbGVzIjpbInZpZXctcmVhbG0iLCJtYW5hZ2UtY2xpZW50cyIsIm1hbmFnZS1ldmVudHMiLCJ2aWV3LWFwcGxpY2F0aW9ucyIsInZpZXctZXZlbnRzIiwidmlldy11c2VycyIsInZpZXctY2xpZW50cyIsIm1hbmFnZS1hcHBsaWNhdGlvbnMiLCJtYW5hZ2UtdXNlcnMiLCJtYW5hZ2UtcmVhbG0iXX19fQ.WeiJOC1jQ52aKgnW8UN2Lv9rJ_yKZiOhijOYKLN2EEOkYF8rvRZsSKbTPFKTIUvjnwy2A7V_N-GhhJH4C-T7F5__QPNofSXbCNyvATj52jGLxk9V0Afvk-Z5QAWi55PJRTC0qteeMRcO2Frw-0KtKYe9o3UcGICJubxhZHsXBLA","token_type":"bearer","id_token":"eyJhbGciOiJSUzI1NiJ9.eyJuYW1lIjoiIiwianRpIjoiMGIyMGI0ODctOTI4OS00YTFhLTgyNmMtM2NiOTg0MDJkMzVkIiwiZXhwIjoxNDQ2ODI4MDU5LCJuYmYiOjAsImlhdCI6MTQ0NjgyNzk5OIwouldhaveToBeNutsUiLCJwcmVmZXJyZWRfdXNlcm5hbWUiOiJhZG1
pbiIsImVtYWlsX3ZlcmlmaWVkIjpmYWxzZX0.DmG8Lm4niL1djzNrLsZ2CrsB1ZzUPnR2Nm7IZnrwrmkXsrPxjl6pyXKCWSj6pbk2sgVI8NNFqrGIJmEJ7gkTZWm328VGGpJsmMuJBki0KbqBRKORGQSgkas_34rwzhcTE3Iki8h_YVs2vvNIx_eZSOvIzyEcP3IGHuBoxcR6W3E","not-before-policy":0,"session-state":"62efc05c-1bf5-4f55-b749-5e0eff94155b"}

In case anyone finds this post, this is what I ended up using:

if hash jq 2>/dev/null; then
  # Use the jq command to safely parse json
  bearerToken=$(echo "$raw_json" | jq -r '.access_token')
else
  # Strip away all but the "access_token" field's value using a Perl-compatible regular expression
  bearerToken=$(echo "$raw_json" | grep -Po '"'"access_token"'"\s*:\s*"\K([^"]*)')
fi
qwr
D-Klotz
  • 7
    Note that `grep` is not the best (or even a good) tool for working with JSON. Get something like [`jq`](https://stedolan.github.io/jq/) instead, which already knows how to parse JSON. `bearerToken=$(echo "$raw_json" | jq -r '.access_token')` is far better. – chepner Nov 06 '15 at 19:17
  • @chepner Thanks. I'm within a vagrant/puppet/centos environment. Perhaps I can yum install jq. – D-Klotz Nov 06 '15 at 19:21
  • @chepner `sudo yum install jq` to the rescue. Thanks! – D-Klotz Nov 06 '15 at 19:27

2 Answers

117

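Since not all regex flavors support lookbehind (and those that do often restrict it), Perl introduced \K. In general, when you have:

a\Kb

the engine must still match "a" followed by "b", but \K tells it to drop everything matched so far from the reported result, as if the match had started at the position of \K. Only "b" is returned.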

In your example, the reported match should begin right after the "access_token":" text, so \K is placed at that point in the pattern.

This example will better demonstrate the \K usage:

~$ echo 'hello world' | grep -oP 'hello \K(world)'
world
~$ echo 'hello world' | grep -oP 'hello (world)'
hello world

In addition, \K works where a variable-length lookbehind would be rejected:

$ echo foooooo bar | grep -oP "(?<=foo+) \Kbar"
grep: lookbehind assertion is not fixed length

$ echo foooooo bar | grep -oP "foo+ \Kbar"
bar
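
Applied to the question's pattern, the effect of \K can be seen on a shortened sample. This is only a sketch: sample_json and its token value are made up for illustration, and it assumes a GNU grep built with -P support. The simplified quoting is used here.

```shell
# Hypothetical, shortened stand-in for the real curl output
sample_json='{"access_token":"abc.def.ghi","expires_in":60,"refresh_token":"zzz"}'

# With \K: the key, colon and opening quote must match but are not reported
with_k=$(printf '%s\n' "$sample_json" | grep -Po '"access_token"\s*:\s*"\K[^"]*')

# Without \K: the whole matched text is reported, key included
without_k=$(printf '%s\n' "$sample_json" | grep -Po '"access_token"\s*:\s*"[^"]*')

printf 'with \\K:    %s\n' "$with_k"
printf 'without \\K: %s\n' "$without_k"
```

The [^"]* part stops at the closing quote of the value, which is why neither result includes it.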
Maroun
  • Thanks. I'm also curious about the spam of double quotes and single quotes. It does work, but I'm not sure how – D-Klotz Nov 06 '15 at 19:20
  • 4
    The regular expression is a bit overquoted. It starts with `'"'`, which is just a single double-quotation mark in a single-quoted string. Next comes a double-quoted string containing `access_token`; the two strings are simply concatenated together. Finally comes a single-quoted string that contains a few double-quotation marks. The shell concatenates the contents of three strings together; for example, `'foo'"bar"'baz'` represents the same thing as `"foobarbaz"`. The entire thing could be more simply written `'"access_token"\s*:\s*"\K([^"]*)'`. – chepner Nov 06 '15 at 19:32
  • 2
    This seems like the same thing as `\zs` in vim – xdhmoore Jul 17 '19 at 21:24
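
The quoting point from the comments can be checked directly in the shell. A small sketch (the variable names are only for illustration): adjacent quoted strings are concatenated into one word, so the over-quoted pattern and the simplified one are the same string.

```shell
# Pattern as written in the question: three quoted pieces glued together
pattern_original='"'"access_token"'"\s*:\s*"\K([^"]*)'

# Simplified form: one single-quoted string
pattern_simple='"access_token"\s*:\s*"\K([^"]*)'

if [ "$pattern_original" = "$pattern_simple" ]; then
  echo "identical"
else
  echo "different"
fi

# Show exactly what grep receives as its pattern argument
printf '%s\n' "$pattern_original"
```

Inside single quotes the shell leaves backslashes and double quotes alone, so the regex reaches grep unchanged.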
2

My solution was: sed -n 's/cut off this part \(display this part only\) cut off this part/\1/gp'

References:

  1. https://www.cyberciti.biz/faq/unix-linux-sed-print-only-matching-lines-command/
  2. info sed (texinfo package)
  3. man 1 sed
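
As a rough illustration of how the sed approach above might be applied to the question's JSON (raw_json and the token value here are made-up sample data, not the real response), assuming the key appears once on the line:

```shell
# Hypothetical, shortened stand-in for the real curl output
raw_json='{"access_token":"abc.def.ghi","expires_in":60,"token_type":"bearer"}'

# Replace the whole line with the captured value; -n plus p prints only on a match
token=$(printf '%s\n' "$raw_json" |
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')

printf '%s\n' "$token"
```

The POSIX class [[:space:]] is used instead of \s so the expression does not depend on GNU extensions.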