2

Consider a file a.txt containing the following JSON document:

{ "body": { "session_info": { "session_id": "BAzcWu2nHVXrXrx096PMZOaFslgWrjx1", "email": "admin@site.com" }, "status": { "msg": "success" } } 

I'm writing a bash script in which I need to extract the session_id value. I started with the following regex, with no success (nothing is returned):

#!/bin/bash
regex="session_id\": \"[A-Z0-9a-z]{32}.*"
echo "REGEX=$regex"
echo "----"
content=$(cat a.txt)
echo $content
echo "----"
[[ $content =~ $regex ]]
sessionid="${BASH_REMATCH[1]}"
echo ${sessionid}

What is wrong with this?

SCO

3 Answers

7

Easier to do it using grep -oP:

grep -oP '"session_id": "\K[A-Z0-9a-z]{32}' file.json
BAzcWu2nHVXrXrx096PMZOaFslgWrjx1

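However, for parsing JSON it is better to use a command-line JSON parsing tool such as jq.

In the grep command:

  • -o prints only the matched part of the line
  • -P enables PCRE (Perl-compatible) regex syntax
  • \K discards everything matched so far, so only the text after it is reported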

The likely reason your regex is not working is that you are not grouping anything, yet you read "${BASH_REMATCH[1]}", which refers to the first captured group.
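
To illustrate the point about grouping, here is a minimal sketch. The sample string and the variable names are made up for the demonstration. With no parentheses in the regex, bash puts the whole match in index 0 of BASH_REMATCH and leaves index 1 empty, which is why the original script prints nothing.

```shell
#!/bin/bash
# Illustrative sample only, not the full document from the question.
sample='"session_id": "BAzcWu2nHVXrXrx096PMZOaFslgWrjx1", "email": "admin@site.com"'

# No parentheses in the regex, so there is no capture group.
no_group='session_id": "[A-Z0-9a-z]{32}'
[[ $sample =~ $no_group ]]

whole_match="${BASH_REMATCH[0]}"   # the entire matched text
first_group="${BASH_REMATCH[1]}"   # empty, because nothing was grouped

echo "whole match: $whole_match"
echo "first group: '$first_group'"
```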

anubhava
  • Upvoted since your solution works. However it'd be even more valuable if you'd explain why question's solution doesn't work, and if you'd explain your solution ;). jq seems a great tool, will try it ! Thank you ! – SCO Feb 05 '15 at 08:55
  • I added some explanation in question. – anubhava Feb 05 '15 at 09:01
  • If you are on macOS, the `-P` argument does not work. You need to use perl directly or install grep with the perl extensions. See the answer [here](http://stackoverflow.com/questions/16658333/grep-p-no-longer-works-how-can-i-rewrite-my-searches) for more details. – leogdion Jan 24 '17 at 21:42
3

Just to show you how you could do this using jq:

$ content='{ "body": { "session_info": { "session_id": "BAzcWu2nHVXrXrx096PMZOaFslgWrjx1", "email": "admin@site.com" }, "status": { "msg": "success" } } }'
$ jq '.body.session_info.session_id' <<< "$content"
"BAzcWu2nHVXrXrx096PMZOaFslgWrjx1"

Simply filter down through the keys to get to the value you want.

You can use jq -r to remove the quotes from the output:

$ jq -r '.body.session_info.session_id' <<< "$content"
BAzcWu2nHVXrXrx096PMZOaFslgWrjx1

I added the missing } to the end of $content, as it wasn't valid JSON to begin with. An added advantage of jq is that it tells you when the input is invalid.
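
Inside a script you would probably want the value in a variable and a way to notice failures. The following is a rough sketch under a few assumptions: jq is installed, the document has the closing brace added, and the variable name session_id is just illustrative. It combines -r with jq's -e option, which makes the exit status non-zero when the result is null or false (for example when the key is missing).

```shell
#!/bin/bash
# Sample document (closing brace added), standing in for the contents of a.txt.
content='{ "body": { "session_info": { "session_id": "BAzcWu2nHVXrXrx096PMZOaFslgWrjx1", "email": "admin@site.com" }, "status": { "msg": "success" } } }'

# -r prints the raw string without quotes.
# -e sets a non-zero exit status if the result is null or false.
if session_id=$(jq -er '.body.session_info.session_id' <<< "$content"); then
    echo "session_id=$session_id"
else
    echo "could not extract session_id" >&2
fi
```

To read from the file instead of a variable, pass the file name as the last argument to jq, e.g. jq -er '.body.session_info.session_id' a.txt.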

Tom Fenech
2

You will get exactly what you want by using the following regex instead of the one you use now:

session_id\": \"([A-Z0-9a-z]{32})

${BASH_REMATCH[1]} is the first parenthesized match. You get nothing because you did not use any parentheses in your original regex.
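
For completeness, here is how the question's script might look with that regex. This is only a sketch: the content is inlined so the example is self-contained, and the if/else around the match is an addition that was not in the original script.

```shell
#!/bin/bash
# Stands in for the contents of a.txt (the missing closing brace does not
# matter here, since the regex does not parse the JSON structure).
content='{ "body": { "session_info": { "session_id": "BAzcWu2nHVXrXrx096PMZOaFslgWrjx1", "email": "admin@site.com" }, "status": { "msg": "success" } }'

# The parentheses create capture group 1, which BASH_REMATCH[1] refers to.
regex='session_id": "([A-Z0-9a-z]{32})'

if [[ $content =~ $regex ]]; then
    sessionid="${BASH_REMATCH[1]}"
    echo "$sessionid"
else
    echo "session_id not found" >&2
fi
```

In the real script, content=$(<a.txt) would replace the inlined string. Keeping the regex in a variable and leaving it unquoted on the right-hand side of =~ matters, because quoting it there makes bash treat it as a literal string.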

Hammer