
I'm trying to parse a JSON object within a shell script into an array.

e.g.: [Amanda, 25, http://mywebsite.com]

The JSON looks like:

{
  "name"       : "Amanda", 
  "age"        : "25",
  "websiteurl" : "http://mywebsite.com"
}

I do not want to use any libraries; ideally I would use a regular expression or grep. I have done:

grep name myfile.json

This gives me "name" : "Amanda". I could do this in a loop for each line in the file and add the result to an array, but I only need the right-hand side, not the entire line.

codeforester
unconditionalcoder
  • Use `jq` for this. – sjsam Jul 14 '16 at 02:04
  • Have a look at [this](http://unix.stackexchange.com/questions/177843/parse-one-field-from-an-json-array-into-bash-array) question and show us some effort on your part to solve this. – sjsam Jul 14 '16 at 02:12
  • This `cat myfile.json | grep name | cut -d ':' -f2` might help. – Rae Burawes Jul 14 '16 at 04:20
  • @sjsam: The accepted answer to the linked question demonstrates `jq` use well, but uses a misguided approach to reading its output into a shell array (at least as of this writing - comment posted). – mklement0 Jul 14 '16 at 05:29
  • I'm assuming instead of `[Amanda, 25, http://mywebsite.com]` you meant `( "Amanda" 25 "http://mywebsite.com")`; the latter is what bash's array syntax actually looks like. (Or, as given with `declare -p array`, this could also be printed as follows: `declare -a array='([0]="Amanda" [1]="25" [2]="http://mywebsite.com")'`) – Charles Duffy Jul 14 '16 at 13:19
  • ...if what you want isn't a bash array but some other language's idea of an array, the question should make that explicit. – Charles Duffy Jul 14 '16 at 13:20
  • @sjsam, ...btw, thank you for the pointer -- I made an effort to add a non-buggy (well, less-buggy; still can't handle embedded newlines in content, but all our answers here have the same problem) answer to the question you linked. – Charles Duffy Oct 05 '16 at 03:04

4 Answers


If you really cannot use a proper JSON parser such as jq[1], try an awk-based solution:

Bash 4.x:

readarray -t values < <(awk -F\" 'NF>=3 {print $4}' myfile.json)

Bash 3.x:

IFS=$'\n' read -d '' -ra values < <(awk -F\" 'NF>=3 {print $4}' myfile.json)

This stores all property values in the Bash array ${values[@]}, which you can inspect with declare -p values.
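To try the awk command end to end, here is a self-contained sketch (assuming Bash 4+ for readarray) that writes the question's sample JSON to a temporary file created with mktemp and runs both the Bash 4.x and the Bash 3.x form. The names json_file and values3 are illustrative, not part of the original answer:

```shell
#!/usr/bin/env bash
# Sketch only: the temp file stands in for myfile.json from the question.
json_file=$(mktemp)
cat > "$json_file" <<'EOF'
{
  "name"       : "Amanda",
  "age"        : "25",
  "websiteurl" : "http://mywebsite.com"
}
EOF

# Bash 4.x: split each line on double quotes; on "key" : "value" lines
# the value is field 4.
readarray -t values < <(awk -F\" 'NF>=3 {print $4}' "$json_file")

# Bash 3.x equivalent: read all input at once (-d ''), splitting on newlines.
# read exits with status 1 at end of input, which is expected here.
IFS=$'\n' read -d '' -ra values3 < <(awk -F\" 'NF>=3 {print $4}' "$json_file")

rm -f "$json_file"

declare -p values values3      # inspect both arrays
printf '%s\n' "${values[@]}"   # one element per line
```

Both arrays should end up with the same three elements; quoting "${values[@]}" keeps each element intact when printing.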

These solutions have limitations:

  • each property must be on its own line,
  • all values must be double-quoted,
  • embedded escaped double quotes are not supported.

All these limitations reinforce the recommendation to use a proper JSON parser.
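To see the first and last limitations in action, the same awk command can be run on two hypothetical inputs: a minified object with all properties on one line, and a value containing escaped double quotes:

```shell
#!/usr/bin/env bash
# Hypothetical inputs illustrating the limitations listed above.
minified='{"name":"Amanda","age":"25","websiteurl":"http://mywebsite.com"}'
escaped='  "name" : "Amanda \"Mandy\" Jones",'

# All properties on one line: awk sees a single record, so only field 4
# (the first value) is printed.
from_minified=$(awk -F\" 'NF>=3 {print $4}' <<< "$minified")

# Escaped quotes are treated as field separators, so the value is cut short.
from_escaped=$(awk -F\" 'NF>=3 {print $4}' <<< "$escaped")

printf '[%s]\n' "$from_minified" "$from_escaped"
```

This prints [Amanda] and [Amanda \]: the other two values of the minified object are lost, and the escaped value is truncated at its first embedded quote.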


Note: The following alternative solutions use the Bash 4.x+ readarray -t values command, but they also work with the Bash 3.x alternative, IFS=$'\n' read -d '' -ra values.

grep + cut combination: A single grep command won't do (unless you use GNU grep - see below), but adding cut helps:

readarray -t values < <(grep '"' myfile.json | cut -d '"' -f4)

GNU grep: Using -P to support PCREs, which support \K to drop everything matched so far (a more flexible alternative to a look-behind assertion) as well as look-ahead assertions ((?=...)):

readarray -t values < <(grep -Po ':\s*"\K.+(?="\s*,?\s*$)' myfile.json)
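A quick way to see what \K and the look-ahead contribute is to run the pattern with and without \K on a single hypothetical line. This sketch assumes GNU grep built with PCRE support; -P may not be available in other grep implementations:

```shell
#!/usr/bin/env bash
# Assumes GNU grep with PCRE support (-P); the input line is hypothetical.
line='  "name" : "Amanda", '

# Without \K: the reported match starts at the colon.
without_k=$(grep -Po ':\s*".+(?="\s*,?\s*$)' <<< "$line")

# With \K: everything matched before \K is dropped from the reported match,
# and the look-ahead (?=...) requires the closing quote without consuming it.
with_k=$(grep -Po ':\s*"\K.+(?="\s*,?\s*$)' <<< "$line")

printf '[%s]\n' "$without_k" "$with_k"
```

The first match is [: "Amanda] and the second is [Amanda], which is why the answer's command yields just the value.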

Finally, here's a pure Bash (3.x+) solution:

What makes this a viable alternative in terms of performance is that no external utilities are called in each loop iteration; however, for larger input files, a solution based on external utilities will be much faster.

#!/usr/bin/env bash

declare -a values # declare the array

# Read each line and use regex parsing (with Bash's `=~` operator)
# to extract the value.
while read -r line; do
  # Extract the value from between the double quotes
  # and add it to the array.
  [[ $line =~ :[[:blank:]]+\"(.*)\" ]] && values+=( "${BASH_REMATCH[1]}" )
done < myfile.json

declare -p values # print the array
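One detail worth knowing about the loop above: its regex requires at least one blank after the colon ([[:blank:]]+), so compact input such as "name":"Amanda" is skipped. The sketch below checks this on a hypothetical line and compares it with a relaxed variant using * instead of +; the relaxed variant is an assumption, not part of the original answer:

```shell
#!/usr/bin/env bash
line='"name":"Amanda",'   # hypothetical compact input, no blank after the colon

# Regex from the loop above: requires one or more blanks after the colon.
if [[ $line =~ :[[:blank:]]+\"(.*)\" ]]; then
  strict=${BASH_REMATCH[1]}
else
  strict='(no match)'
fi

# Relaxed variant (assumption): zero or more blanks after the colon.
if [[ $line =~ :[[:blank:]]*\"(.*)\" ]]; then
  relaxed=${BASH_REMATCH[1]}
fi

printf 'strict: %s\nrelaxed: %s\n' "$strict" "$relaxed"
```

The strict regex finds no match, while the relaxed one captures Amanda.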

[1] Here's what a robust jq-based solution would look like (Bash 4.x):
readarray -t values < <(jq -r '.[]' myfile.json)

mklement0

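jq is good enough to solve this problem. Assuming the records are stored in an array under a top-level files key (e.g. {"files": [{"name": "Amanda", "age": "25", "websiteurl": "http://mywebsite.com"}]}), you can use:

paste -s <(jq -r '.files[].name' YourJsonFile) <(jq -r '.files[].age' YourJsonFile) <(jq -r '.files[].websiteurl' YourJsonFile)

With paste -s, each jq output becomes one tab-separated row (all names, then all ages, then all URLs), so you get a table from which you can grep any row or print any column with awk.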

Code42
  • OP literally said no libraries, there are a million other questions with JQ as the answer already. – Alex Oct 02 '22 at 20:58

You can use a sed one-liner to achieve this:

array=( $(sed -n '/{/,/}/{s/[^:]*:[[:blank:]]*\([^,]*\).*/\1/p;}' json) )

Result:

$ echo ${array[@]}
"Amanda" "25" "http://mywebsite.com"

If you don't want the quotation marks, the following sed command removes them:

array=( $(sed -n '/{/,/}/{s/[^:]*:[^"]*"\([^"]*\).*/\1/p;}' json) )

Result:

$ echo ${array[@]}
Amanda 25 http://mywebsite.com

It also works with multiple entries, for example:

$ cat json
{
  "name"       : "Amanda" 
  "age"        : "25"
  "websiteurl" : "http://mywebsite.com"
}

{
   "name"       : "samantha"
   "age"        : "31"
   "websiteurl" : "http://anotherwebsite.org"
}

$ echo ${array[@]}
Amanda 25 http://mywebsite.com samantha 31 http://anotherwebsite.org

UPDATE:

As pointed out by mklement0 in the comments, values containing embedded whitespace cause a problem, e.g., "name" : "Amanda lastname": Amanda and lastname end up in separate array elements. To avoid this, use readarray (Bash 4.x+), e.g.:

readarray -t array < <(sed -n '/{/,/}/{s/[^:]*:[^"]*"\([^"]*\).*/\1/p;}' json)

This also takes care of the accidental globbing issue mentioned in the comments.
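The difference between the two ways of filling the array can be reproduced with a hypothetical file that contains a value with a space and a value of *. The sketch works in a fresh temporary directory, so the only thing * can glob-expand to is the json file itself; the nickname property and the extract helper function are illustrative additions:

```shell
#!/usr/bin/env bash
# Work in an empty temp directory so globbing results are predictable.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

cat > json <<'EOF'
{
  "name"       : "Amanda lastname",
  "age"        : "25",
  "nickname"   : "*"
}
EOF

# Hypothetical helper wrapping the quote-stripping sed command from this answer.
extract() { sed -n '/{/,/}/{s/[^:]*:[^"]*"\([^"]*\).*/\1/p;}' json; }

# Unquoted command substitution: word splitting breaks "Amanda lastname" apart,
# and the lone * is glob-expanded against the current directory.
split_array=( $(extract) )

# readarray stores each output line as one element, with no splitting or globbing.
readarray -t safe_array < <(extract)

cd - > /dev/null || exit 1
rm -rf "$tmpdir"

printf '[%s]\n' "${split_array[@]}"   # prints [Amanda], [lastname], [25], [json], one per line
printf '[%s]\n' "${safe_array[@]}"    # prints [Amanda lastname], [25], [*], one per line
```

Printing with printf '%s\n' "${array[@]}" (quoted) rather than echo ${array[@]} also makes the element boundaries visible, as suggested in the comments.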

  • Please don't parse command output into an array with `array=( $(...) )` (even though it happens to work with the sample input): it doesn't work as intended with embedded whitespace and can result in accidental globbing. – mklement0 Jul 14 '16 at 05:33
  • @mklement0 Can you give an example of what the contents of the sample file would have to look like for accidental globbing to occur? –  Jul 14 '16 at 05:39
  • To see what your approach does to embedded whitespace, examine the array that results from `array=( $(echo ' a b ') )`; to see the effects of accidental globbing, try `array=( $(echo 'a * is born') )`. – mklement0 Jul 14 '16 at 05:42
  • For simplicity, try `"*"` as the JSON property value; focusing on the JSON is a distraction, though, as my `echo` commands are sufficient to demonstrate the problem: the output from the command substitution, _whatever the specific command happens to be_, is invariably subject to word splitting and globbing. The larger point is: reading items into an array this way is an _antipattern_ that is best avoided altogether. (You could work around the issues with `IFS=` and `set -f`, but at that point it's simpler to use `readarray`.) – mklement0 Jul 14 '16 at 06:06
  • @mklement0 I am not sure why the globbing did not match anything previously. Probably because I manipulated `IFS` during testing at some point. However, after restarting the shell the globbing did actually happen. I will update my answer to address this issue. Thanks. –  Jul 14 '16 at 06:24
  • Please consider editing your correction to actually flow with the answer rather than being an addendum at the end; otherwise, someone trying to follow this answer is more likely to use the buggy code than not. – Charles Duffy Jul 14 '16 at 13:15
  • (`echo ${array[@]}` is also bad form -- even if `array=( "Hello" "Test * Example" "World" )`, it won't print that as three separate elements despite the contents being correctly stored that way. Consider `printf '%s\n' "${array[@]}"`, *with the quotes*). – Charles Duffy Jul 14 '16 at 13:17

Pure Bash 3.x+ without dependencies such as jq, python, or grep (the parser script itself is fetched from GitHub with curl):

source <(curl -s -L -o- https://github.com/lirik90/bashJsonParser/raw/master/jsonParser.sh)
read -d '' JSON << EOF
{
  "name"       : "Amanda", 
  "age"        : "25",
  "websiteurl" : "http://mywebsite.com"
}
EOF

JSON=$(minifyJson "$JSON")
name=$(parseJson "$JSON" name)
age=$(parseJson "$JSON" age)
url=$(parseJson "$JSON" websiteurl)
echo "Result: [$name,$age,$url]"

Output:

Result: [Amanda,25,http://mywebsite.com]


lirik90