1247

I'm trying to parse JSON returned from a curl request, like so:

curl 'http://twitter.com/users/username.json' |
    sed -e 's/[{}]/''/g' | 
    awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'

The above splits the JSON into fields, for example:

% ...
"geo_enabled":false
"friends_count":245
"profile_text_color":"000000"
"status":"in_reply_to_screen_name":null
"source":"web"
"truncated":false
"text":"My status"
"favorited":false
% ...

How do I print a specific field (denoted by the -v k=text)?

wejoey
  • 216
  • 1
  • 3
  • 14
auser
  • 13,808
  • 4
  • 21
  • 15
  • 6
    Erm that is not good json parsing btw... what about the escape characters in strings...etc IS there a python answer to this on SO (a perl answer even...)? – martinr Dec 23 '09 at 22:00
  • 1
    The Python answer to this is to simply use a Python JSON library that will actually parse the JSON. sed and AWK provide regular expressions, but those are not a good solution to the problem of correctly parsing JSON. – steveha Dec 29 '09 at 01:04
  • 74
    Any time someone says "problem X can easily be solved with other language Y," that's code for ["my toolbox has only a rock for driving nails... why bother with anything else?"](http://www.codinghorror.com/blog/2012/06/the-php-singularity.html) – BryanH Feb 04 '13 at 16:16
  • 27
    @BryanH: except sometimes language Y *can* be more equipped to solve particular problem X regardless of how many languages the person who suggested Y knows. – jfs May 30 '13 at 14:44
  • 21
    Kinda late, but here it goes. `grep -Po '"'"version"'"\s*:\s*"\K([^"]*)' package.json`. This solves the task easily & only with *grep* and works perfectly for simple JSONs. For complex JSONs you should use a proper parser. – Diosney Nov 17 '14 at 22:14
  • @diosney would you be willing to explain how that python regex works, or point me to a page where I might learn? My google secret decoder ring has failed me so far. Thanks. – D-Klotz Nov 06 '15 at 17:51
  • Would `cut` suit your needs? – jpaugh Nov 18 '15 at 23:42
  • 1
    Is there a way jq could be added to bash tool on windows like Git Bash ? – Vinay Dec 13 '16 at 15:22
  • @diosney, ...if one has GNU grep compiled with libpcre support. Some folks are on BSD platforms, or busybox platforms, or places where GNU grep was compiled without optional features enabled. – Charles Duffy Jul 12 '18 at 19:17
  • 1
    A good list of tools is provided [at this SO question](https://stackoverflow.com/a/49011455/1485527) about 'XSLT equivalent for JSON'. – jschnasse Sep 10 '18 at 11:10
  • https://shapeshed.com/jq-json/ – Damien Jan 29 '19 at 05:16
  • 1
    For that matter to keep it in BASH only, AND ASSUMING no spaces, commas curly braces or escaped character in keys or values: ```curl blah | tr -d '{}"' | tr , \\n | while read key ; do [ "$key"=="text:"] && echo $value``` (yeah, that may have typos, but the approach is sound from the standpoint of not wanting to be dependent on anything.) – Brian Carcich Jun 06 '19 at 21:14
  • d'Oh, my comment's answer is dependent on tr! – Brian Carcich Jun 06 '19 at 21:22
  • 1
    What XPath is to XML, ??? is to JSON. JSON is around long enough and there should be some command line json path tool. If there is no such thing...there is a tool to convert json to xml - then you can use XPath... – BitTickler Jun 22 '20 at 23:34

46 Answers

1683

There are a number of tools specifically designed for the purpose of manipulating JSON from the command line, and they will be a lot easier and more reliable to use than doing it with Awk. One of them is jq:

curl -s 'https://api.github.com/users/lambda' | jq -r '.name'

You can also do this with tools that are likely already installed on your system, like Python using the json module, and so avoid any extra dependencies, while still having the benefit of a proper JSON parser. The following examples assume you want to use UTF-8, which the original JSON should be encoded in and which is what most modern terminals use as well:

Python 3:

curl -s 'https://api.github.com/users/lambda' | \
    python3 -c "import sys, json; print(json.load(sys.stdin)['name'])"

Python 2:

export PYTHONIOENCODING=utf8
curl -s 'https://api.github.com/users/lambda' | \
    python2 -c "import sys, json; print json.load(sys.stdin)['name']"

Frequently Asked Questions

Why not a pure shell solution?

The standard POSIX/Single Unix Specification shell is a very limited language which doesn't contain facilities for representing sequences (list or arrays) or associative arrays (also known as hash tables, maps, dicts, or objects in some other languages). This makes representing the result of parsing JSON somewhat tricky in portable shell scripts. There are somewhat hacky ways to do it, but many of them can break if keys or values contain certain special characters.

Bash 4 and later, zsh, and ksh have support for arrays and associative arrays, but these shells are not universally available (macOS stopped updating Bash at Bash 3, due to a change from GPLv2 to GPLv3, while many Linux systems don't have zsh installed out of the box). It's possible that you could write a script that would work in either Bash 4 or zsh, one of which is available on most macOS, Linux, and BSD systems these days, but it would be tough to write a shebang line that worked for such a polyglot script.

Finally, writing a full-fledged JSON parser in shell would be a significant enough dependency that you might as well just use an existing dependency like jq or Python instead. It's not going to be a one-liner, or even a small five-line snippet, to do a good implementation.

Why not use awk, sed, or grep?

It is possible to use these tools to do some quick extraction from JSON with a known shape and formatted in a known way, such as one key per line. There are several examples of suggestions for this in other answers.

However, these tools are designed for line based or record based formats; they are not designed for recursive parsing of matched delimiters with possible escape characters.

So these quick and dirty solutions using awk/sed/grep are likely to be fragile, and break if some aspect of the input format changes, such as collapsing whitespace, or adding additional levels of nesting to the JSON objects, or an escaped quote within a string. A solution that is robust enough to handle all JSON input without breaking will also be fairly large and complex, and so not too much different than adding another dependency on jq or Python.
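
For example (with made-up input), a string value containing an escaped quote defeats a naive pattern but not a real parser:

echo '{"name":"a \"quoted\" name"}' | grep -o '"name":"[^"]*"'
# "name":"a \"    <- truncated at the escaped quote
echo '{"name":"a \"quoted\" name"}' | jq -r '.name'
# a "quoted" name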

I have had to deal with large amounts of customer data being deleted due to poor input parsing in a shell script before, so I never recommend quick and dirty methods that may be fragile in this way. If you're doing some one-off processing, see the other answers for suggestions, but I still highly recommend just using an existing tested JSON parser.

Historical notes

This answer originally recommended jsawk, which should still work, but is a little more cumbersome to use than jq, and depends on a standalone JavaScript interpreter being installed which is less common than a Python interpreter, so the above answers are probably preferable:

curl -s 'https://api.github.com/users/lambda' | jsawk -a 'return this.name'

This answer also originally used the Twitter API from the question, but that API no longer works, making it hard to copy the examples to test out, and the new Twitter API requires API keys, so I've switched to using the GitHub API which can be used easily without API keys. The first answer for the original question would be:

curl 'http://twitter.com/users/username.json' | jq -r '.text'
Chris Noe
  • 36,411
  • 22
  • 71
  • 92
Brian Campbell
  • 322,767
  • 57
  • 360
  • 340
  • 7
    @thrau +1. jq it is available in the repository and super easy to use so it's much better than jsawk. I tested both for a few minutes, jq won this battle – Szymon Sadło Jun 17 '16 at 09:51
  • 2
    Note that in Python 2, if you are piping the output to another command then the `print` statement will *always* encode to ASCII because you are using Python in a pipe. Insert `PYTHONIOENCODING=` into the command to set a different output encoding, suitable for your terminal. In Python 3, the default is UTF-8 in this case (using the `print()` *function*). – Martijn Pieters Sep 09 '16 at 11:28
  • 7
    Install jq on OSX with **brew install jq** – Andy Fraley Apr 20 '18 at 14:56
  • 5
    `curl -s` is equivalent to `curl --silent`, whereas `jq -r` means `jq --raw-output` i.e. without string quotes. – Serge Stroobandt Oct 26 '18 at 21:52
  • python -c "import requests;r=requests.get('https://api.github.com/users/lambda');print r.json()['name'];" . The simpliest! – NotTooTechy May 15 '20 at 14:28
  • @The simpliest! I have just tried your suggestion: $ python -c "import requests;r=requests.get('api.github.com/users/lambda');print r.json()['name'];" and received this error message: Traceback (most recent call last): File "", line 1, in ImportError: No module named requests – Bernie Reiter Jul 07 '20 at 15:34
  • Why do no commenters consider that no other tools are available? I know curl is there, but I can't install a tool for a single script a user runs. I would consider anything suggesting another language is not a valid answer. I feel the same for using a tool not in a standard linux/macos install. I came looking for a pure shell solution (which seems to be what the original poster wanted), and python is not a solution. Same with jq. – Mark Lilback Sep 29 '20 at 20:22
  • @MarkLilback I've updated my answer with a FAQ about why it's best to used one of these dependencies instead of a pure shell or sed/awk/grep based solution. – Brian Campbell Oct 01 '20 at 18:30
  • I am new to bash and command line. Can you explain why did you use ``| \`` in the line : `curl -s 'https://api.github.com/users/lambda' | \ python3 -c "import sys, json; print(json.load(sys.stdin)['name'])"` –  Oct 08 '20 at 08:59
  • To keep the consistency between the 3 options would be nice to recommend to use `jq -er '.name'`. This way the command will return an error if the key is not found (just like the python options). – Vinicius Nov 09 '21 at 12:02
  • I wasn't able to get python based solutions work with hierarchies although they seem to be able to get values of root elements. `jq` works with hierarchies flawlessly – ka3ak Nov 26 '21 at 15:14
  • In "curl -s 'https://api.github.com/users/lambda' | jq -r '.name'", what is the '.name' part used for? When I run the command I get a "NULL" response. Not sure what is causing that. – Rekless Oct 22 '22 at 02:30
  • jq barfs with keys containing commas, and I haven't found a way to get around that. Python / grep based solutions work fine – Jon Oct 26 '22 at 01:04
  • @Jon - For keys containing characters outside [A-Za-z0-9_], you can simply quote the field name, e.g. `."x,y zzy"' or `.["x,y zzy"]`. – peak Mar 14 '23 at 07:54
  • Another advantage of jq is that there's a highly-compatible Go implementation (gojq), which incidentally also supports indefinite-precision integer arithmetic. That means it's trivially portable to any platform that supports Go. For Rustaceans, jaq will likely be of interest. – peak Mar 14 '23 at 08:00
328

To quickly extract the values for a particular key, I personally like to use "grep -o", which only returns the regex's match. For example, to get the "text" field from tweets, something like:

grep -Po '"text":.*?[^\\]",' tweets.json

This regex is more robust than you might think; for example, it deals fine with strings having embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it of course.)
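
For instance, a quick sanity check against made-up input containing both an embedded comma and escaped quotes:

echo '{"text":"a \"quoted\" reply, with a comma","id":1}' | grep -Po '"text":.*?[^\\]",'
# "text":"a \"quoted\" reply, with a comma",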

And to further clean (albeit keeping the string's original escaping) you can use something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)

To all the haters who insist you should use a real JSON parser -- yes, that is essential for correctness, but

  1. To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
  2. grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion, and in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)

To write maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point #1.

One last, wackier, solution: I wrote a script that uses Python json and extracts the keys you want, into tab-separated columns; then I pipe through a wrapper around awk that allows named access to columns. In here: the json2tsv and tsvawk scripts. So for this example it would be:

json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'

This approach doesn't address #2, is more inefficient than a single Python script, and it's a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.

Jens
  • 69,818
  • 15
  • 125
  • 179
Brendan OConnor
  • 9,624
  • 3
  • 27
  • 25
  • What if the parameter is the last one in the object? Then there is no trailing comma at the end, just the right brace. And +1 for sure. – Yola Nov 12 '11 at 19:11
  • Hi Yola, right, it depends on the input. You have to look at it first. – Brendan OConnor Nov 12 '11 at 22:33
  • 14
    You forgot about integer values. `grep -Po '"text":(\d*?,|.*?[^\\]",)'` – Robert Dec 04 '13 at 01:52
  • 3
    Robert: Right, my regex was written only for string values for that field. Integers could be added as you say. If you want all types, you have to do more and more: booleans, null. And arrays and objects require more work; only depth-limited is possible, under standard regexes. – Brendan OConnor Dec 05 '13 at 02:02
  • 11
    1. [`jq .name`](http://stackoverflow.com/a/16838234/4279) works on the command-line and it doesn't require "opening an editor to write a script". 2. It doesn't matter how fast your regex can produce *wrong* results – jfs Aug 24 '14 at 20:50
  • 10
    and if you only want the values you can just throw awk at it. `| grep -Po '"text":.*?[^\\]",'|awk -F':' '{print $2}'` – JeffCharter Sep 06 '15 at 19:37
  • 55
    It seems that on OSX the `-P` option is missing. I tested on OSX 10.11.5 and `grep --version` was `grep (BSD grep) 2.5.1-FreeBSD`. I got it working with the "extended regex" option on OSX. The command from above would be `grep -Eo '"text":.*?[^\\]",' tweets.json`. – Jens Jun 08 '16 at 13:14
  • 1
    Yeah, `jq` is much better. When I wrote the original post it didn't exist yet or wasn't widespread. – Brendan OConnor Jun 09 '16 at 14:18
  • it does not deal with the value appearing multiple times in the JSON as you can't specify the path that leads to the one you want inside the tree. – Marcus Wolschon Jun 30 '16 at 06:18
  • it worked with me but how can i receive results in a file for this command grep -Po '"text":.*?[^\\]",' tweets.json – user1 Nov 16 '16 at 06:37
  • 1
    @JeffCharter You don't need `awk`. Instead use zero-length assertion like `grep -oP '(?<="id": )[0-9]+'` to get only the integer value – mgutt Aug 02 '19 at 19:20
  • 3
    @JeffCharter or `grep -oP '(?<="text": ").*?[^\\](?=",)'` for string values. – mgutt Aug 02 '19 at 19:27
  • **Important**: I had to use `grep --color=never` on Debian 10 – ᴍᴇʜᴏᴠ Jun 29 '20 at 12:20
  • On OSX, to get just the value itself I ended up with this: `grep -Eo '"text":.*?[^\\]",' tweets.json | sed -e 's/[\"\,\: ]*//g' | sed -e 's/text//g')` – Shanerk Aug 11 '20 at 17:37
  • This solution works OK also in git-bash for Windows (**bash -version GNU bash, version 5.2.15(1)-release (x86_64-pc-msys)** with additional Linux tools from the package, evidently); no need to install other tools – Mache May 25 '23 at 14:45
194

On the basis that some of the recommendations here (especially in the comments) suggested the use of Python, I was disappointed not to find an example.

So, here's a one-liner to get a single value from some JSON data. It assumes that you are piping the data in (from somewhere) and so should be useful in a scripting context.

echo '{"hostname":"test","domainname":"example.com"}' | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["hostname"]'
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
paulkmoore
  • 3,253
  • 2
  • 21
  • 20
  • I enhanced this answer below to use a bash function: curl 'some_api' | getJsonVal 'key' – Joe Heyming Apr 11 '14 at 22:06
  • `pythonpy` (https://github.com/russell91/pythonpy is almost always a better alternative to `python -c`, although it does have to be installed with pip. just pipe the json to `py --ji -x 'x[0]["hostname"]'`. If you didn't want to use the built in json_input support, you could still get those import automatically as `py 'json.loads(sys.stdin)[0]["hostname"]'` – RussellStewart Sep 14 '14 at 20:18
  • 3
    Thanks! For more quick&dirty JSON parsing I've wrapped it into a bash function: `jsonq() { python -c "import sys,json; obj=json.load(sys.stdin); print($1)"; }` so that I could write: `curl ...... | jsonq 'json.dumps([key["token"] for key in obj], indent=2)'` & more of similar scary stuff... Btw, `obj[0]` seems unnecessary, it looks like just `obj` works OK in default cases (?). – akavel Mar 23 '15 at 13:05
  • Thanks. I've made this respect JSON a bit better than print: `jsonq() { python -c "import sys,json; obj=json.load(sys.stdin); sys.stdout.write(json.dumps($1))"; }` – Adam K Dean Mar 01 '16 at 17:58
  • 4
    `obj[0]` causes an error when parsing `{ "port":5555 }`. Works fine after removing `[0]`. – CyberEd Aug 16 '16 at 20:49
  • 1
    I get ` File "", line 1 import json,sys;obj=json.load(sys.stdin);print obj["hostname"] ^ SyntaxError: invalid syntax ` when running the example – ka3ak Jul 27 '21 at 11:44
  • 3
    @ka3ak try `print(obj["hostname"])` instead of `print obj["hostname"]` in the end – chill appreciator Nov 03 '21 at 15:22
153

Following martinr's and Boecko's lead:

curl -s 'http://twitter.com/users/username.json' | python -mjson.tool

That will give you an extremely grep-friendly output. Very convenient:

curl -s 'http://twitter.com/users/username.json' | python -mjson.tool | grep my_key
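
If you need just the value rather than the whole matching line, one quick-and-dirty extension (a sketch that assumes a string-valued key named my_key, and quotes the key name as suggested in the comments below):

curl -s 'http://twitter.com/users/username.json' | python -mjson.tool | grep '"my_key"' | cut -d'"' -f4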
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
jnrg
  • 1,871
  • 1
  • 11
  • 7
  • 44
    How would you extract a specific key, as OP is asking? – juan Mar 28 '13 at 13:58
  • 2
    Best answer so far imho, no need to install anything else on most distros and you can `| grep field`. Thanks! – Andrea Richiardi May 12 '13 at 04:31
  • 9
    All this does is format the JSON, if I'm not mistaken. It does not allow the caller to select a particular field from the output, as would an xpath solution, or something based on "JSON Pointer". – Cheeso Jun 04 '14 at 00:42
  • or you could use pythonpy: `curl 'http://twitter.com/users/username.json' | py --json_input -x 'x.name'` – RussellStewart Sep 14 '14 at 20:14
  • 8
    I just end up with a key value pair, but not the value in and of itself. – christopher Dec 28 '17 at 10:15
  • The `grep` without anchors will pick up any *line* which contains the field name. Guess what happens if the field is called something very common and short like `id`. (You can fix that with `grep "\"id\":"` but that still gives you the entire line, not just the value, or just the name of the field if its value is another complex structure.) – tripleee Jun 18 '18 at 12:57
  • 1
    Guys it's 2018, ignore my comment , use jq for sure ;-) – jnrg Aug 01 '18 at 23:27
  • 3
    `jq` is not typically installed while python is. Also, once your in Python you might as well go the whole way and parse it with `import json...` – CpILL Sep 03 '18 at 13:10
151

You could just download the jq binary for your platform and run it (chmod +x jq):

$ curl 'https://twitter.com/users/username.json' | ./jq -r '.name'

It extracts "name" attribute from the json object.

jq homepage says it is like sed for JSON data.

jfs
  • 399,953
  • 195
  • 994
  • 1,670
  • 2
    Agreed. I can't compare with jsawk from the accepted answer, as I haven't used that, but for local experimentation (where installing a tool is acceptable) I highly recommend jq. Here's a slightly more extensive example, which takes each element of an array and synthesizes a new JSON object with selected data: `curl -s https://api.example.com/jobs | jq '.jobs[] | {id, o: .owner.username, dateCreated, s: .status.state}'` – jbyler Apr 21 '14 at 22:04
  • 3
    Love this. Very light weight, and since it's in plain old C, it can be compiled just about anywhere. – Ben Jacobs Oct 21 '14 at 16:28
  • 1
    The most practical one: it does not need third party libraries (while jsawk does) and is easy to install (OSX: brew install jq) – lauhub Dec 19 '14 at 09:10
  • 3
    This is the most practical and easily implemented answer for my use-case. For Ubuntu (14.04) system a simple apt-get install jq added the tool to my system. I am piping JSON output from AWS CLI responses into jq and it works great to extract values to certain keys nested in the response. – Brandon K May 27 '15 at 15:01
  • 1
    This is *a lot* faster than jsawk, which is what I've recently had a problem with because of the expensive invocations to spidermonkey. – thrau Feb 28 '16 at 19:18
  • 1
    It is my personal favourite for JSON parsing. It allows for complex parsing, filtering and calculation. And it's available on the Debian repositories, and also for Cygwin. – Stéphane Ch. Jun 03 '16 at 13:44
  • http://xmodulo.com/how-to-parse-json-string-via-command-line-on-linux.html – enthusiasticgeek Jul 28 '16 at 16:36
143

Using Node.js

If the system has Node.js installed, it's possible to use the -p print and -e evaluate script flags with JSON.parse to pull out any value that is needed.

A simple example using the JSON string { "foo": "bar" } and pulling out the value of "foo":

node -pe 'JSON.parse(process.argv[1]).foo' '{ "foo": "bar" }'

Output:

bar

Because we have access to cat and other utilities, we can use this for files:

node -pe 'JSON.parse(process.argv[1]).foo' "$(cat foobar.json)"

Output:

bar

Or any other format, such as a URL that contains JSON:

node -pe 'JSON.parse(process.argv[1]).name' "$(curl -s https://api.github.com/users/trevorsenior)"

Output:

Trevor Senior
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Jay
  • 18,959
  • 11
  • 53
  • 72
  • 1
    thanks! but in my case it's working only with -e flag `node -p -e 'JSON.parse(process.argv[1]).foo' '{ "foo": "bar" }'` – Rnd_d Nov 25 '13 at 23:02
  • 41
    Pipes! `curl -s https://api.github.com/users/trevorsenior | node -pe "JSON.parse(require('fs').readFileSync('/dev/stdin').toString()).name"` – nicerobot May 07 '14 at 19:19
  • 4
    this is my favourite solution; use a language (javascript) to parse a data-structure that is natural to it (JSON). seems the most *correct*. also - node is probably already available on the system, and you won't have to mangle with jq's binaries (which looks like another *correct* choice). – Eliran Malka Mar 01 '17 at 15:05
  • This is the bash script function: # jsonv get the json object value for a specific attribute # first parameter is the json document # second parameter is the attribute which value should be returned get_json_attribute_value() { node -pe 'JSON.parse(process.argv[1])[process.argv[2]]' "$1" "$2" } – Youness Jul 04 '17 at 23:35
  • 11
    The following works with Node.js 10: `cat package.json | node -pe 'JSON.parse(fs.readFileSync(0)).version'` – Ilya Boyandin Oct 15 '18 at 10:21
121

Use Python's JSON support instead of using AWK!

Something like this:

curl -s http://twitter.com/users/username.json | \
    python -c "import json,sys;obj=json.load(sys.stdin);print(obj['name']);"

macOS v12.3 (Monterey) removed /usr/bin/python, so we must use /usr/bin/python3 for macOS v12.3 and later.

curl -s http://twitter.com/users/username.json | \
    python3 -c "import json,sys;obj=json.load(sys.stdin);print(obj['name']);"
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
martinr
  • 3,794
  • 2
  • 17
  • 15
  • 8
    Pardon me for trying to come up with a good response...: I shall try harder. Partisanship requires more than writing an awk script to shake it off! – martinr Dec 23 '09 at 22:45
  • 11
    Why do you use the obj variable in that oneliner solution?. It's useless and is not stored anyway at all? You write less using `json.load(sys.stdin)['"key']"` as example like: `curl -sL httpbin.org/ip | python -c "import json,sys; print json.load(sys.stdin)['origin']"`. – m3nda Feb 15 '16 at 07:23
  • `/usr/bin/python` doesn't exist on macOS `12.3`, so this needs to use python3 now. – Heath Borders Apr 12 '22 at 16:40
79

You've asked how to shoot yourself in the foot and I'm here to provide the ammo:

curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v RS=',"' -F: '/^text/ {print $2}'

You could use tr -d '{}' instead of sed. But leaving them out completely seems to have the desired effect as well.

If you want to strip off the outer quotes, pipe the result of the above through sed 's/\(^"\|"$\)//g'
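
Assembled into a single pipeline, with the quote stripping included (same disclaimers apply):

curl -s 'http://twitter.com/users/username.json' | sed -e 's/[{}]//g' | awk -v RS=',"' -F: '/^text/ {print $2}' | sed 's/\(^"\|"$\)//g'
# My status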

I think others have sounded sufficient alarm. I'll be standing by with a cell phone to call an ambulance. Fire when ready.

Dennis Williamson
  • 346,391
  • 90
  • 374
  • 439
  • 13
    This way madness lies, read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Dennis Williamson Dec 24 '09 at 00:12
  • 3
    I've read all of the answers and this one works perfectly for me without any extra dependencies. +1 – eth0 Jan 26 '15 at 22:39
  • 1
    That's what I was looking for. The only correction - provided sed command for removing quotes did not work for me, I have used sed 's/"//g' instead – AlexHalkin Aug 12 '15 at 12:43
50

Using Bash with Python

Create a Bash function in your .bashrc file:

function getJsonVal () {
    python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}

Then

curl 'http://twitter.com/users/username.json' | getJsonVal "['text']"

Output:

My status

Here is the same function, but with error checking.

function getJsonVal() {
   if [ \( $# -ne 1 \) -o \( -t 0 \) ]; then
       cat <<EOF
Usage: getJsonVal 'key' < /tmp/
 -- or --
 cat /tmp/input | getJsonVal 'key'
EOF
       return;
   fi;
   python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))";
}

Where $# -ne 1 makes sure there is exactly one argument, and -t 0 makes sure you are redirecting from a pipe.

The nice thing about this implementation is that you can access nested JSON values and get JSON content in return! =)

Example:

echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' |  getJsonVal "['foo']['a'][1]"

Output:

2

If you want to be really fancy, you could pretty print the data:

function getJsonVal () {
    python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1, sort_keys=True, indent=4))";
}

echo '{"foo": {"bar": "baz", "a": [1,2,3]}}' |  getJsonVal "['foo']"
{
    "a": [
        1,
        2,
        3
    ],
    "bar": "baz"
}
Joe Heyming
  • 777
  • 6
  • 11
  • One-liner without the bash function: ``curl http://foo | python -c 'import json,sys;obj=json.load(sys.stdin);print obj["environment"][0]["name"]'`` – Cheeso Jun 04 '14 at 00:53
  • 1
    `sys.stdout.write()` if you want it to work with both python 2 and 3. – Per Johansson Jun 27 '14 at 09:17
  • I'm thinking that it should change to system.stdout.write(obj$1). That way you can say: getJsonVal "['environment']['name']", like @Cheeso 's example – Joe Heyming Jul 07 '14 at 20:14
  • 1
    @Narek In that case, it would look like this: function `getJsonVal() { py -x "json.dumps(json.loads(x)$1, sort_keys=True, indent=4)"; }` – Joe Heyming Sep 22 '16 at 22:57
  • 1
    Re *".bash_rc file"*: Isn't it *".bashrc file"* (without the underscore)? – Peter Mortensen May 01 '22 at 23:09
49

Update (2020)

My biggest issue with external tools (e.g., Python) was that you have to deal with package managers and dependencies to install them.

However, now that we have jq as a standalone, static tool that's easy to install cross-platform via GitHub Releases and Webi (webinstall.dev/jq), I'd recommend that:

Mac, Linux:

curl -sS https://webi.sh/jq | bash

Windows 10:

curl.exe -A MS https://webi.ms/jq | powershell

Cheat Sheet: https://webinstall.dev/jq

Original (2011)

TickTick is a JSON parser written in bash (less than 250 lines of code).

Here's the author's snippet from his article, Imagine a world where Bash supports JSON:

#!/bin/bash
. ticktick.sh

``
  people = {
    "Writers": [
      "Rod Serling",
      "Charles Beaumont",
      "Richard Matheson"
    ],
    "Cast": {
      "Rod Serling": { "Episodes": 156 },
      "Martin Landau": { "Episodes": 2 },
      "William Shatner": { "Episodes": 2 }
    }
  }
``

function printDirectors() {
  echo "  The ``people.Directors.length()`` Directors are:"

  for director in ``people.Directors.items()``; do
    printf "    - %s\n" ${!director}
  done
}

`` people.Directors = [ "John Brahm", "Douglas Heyes" ] ``
printDirectors

newDirector="Lamont Johnson"
`` people.Directors.push($newDirector) ``
printDirectors

echo "Shifted: "``people.Directors.shift()``
printDirectors

echo "Popped: "``people.Directors.pop()``
printDirectors
coolaj86
  • 74,004
  • 20
  • 105
  • 125
  • Is there any way to print this people variable into a json string again ? That would be extremely useful – Thomas Fournet Sep 26 '19 at 13:34
  • 1
    Thanks for install link, that got me. It's super simple. Unpacking obj from array: – Hvitis Jan 31 '21 at 00:42
  • 1
    The link is broken. It now takes you to a malicious site that attempts to run a coin miner in your browser – spuder Nov 04 '21 at 22:05
  • @spuder: What link? There are several. – Peter Mortensen May 01 '22 at 22:45
  • 1
    I just checked the links. Everything looks good to me. My guess is that a bot added junk links and a mod came back and fixed it later. – coolaj86 May 02 '22 at 18:58
  • @w4ldi It's not the 90s anymore. AES-GCM in HTTPS provides both encryption (privacy) and authentication (tamper-proof). Likewise, nowadays shell scripts use functions, so you can't get a cut-off 'rm -rf /' - the function would never load. See https://webinstall.dev/faq/#dont-run-with-shell-pipes. – coolaj86 Feb 03 '23 at 00:33
  • This tool is great. I use it to find a property in spring cloud config: ```VALUE=$(curl --silent http://127.0.0.1:8080/application/profile \ | jq '.propertySources[] | select(.name == "myPropertySource") | .source.MY_PROPERTY_KEY')``` – Ubeogesh Aug 10 '23 at 14:44
34

This is using standard Unix tools available on most distributions. It also works well with backslashes (\) and quotes (").

Warning: This doesn't come close to the power of jq and will only work with very simple JSON objects. It's an attempt to answer the original question, and for situations where you can't install additional tools.

function parse_json()
{
    echo $1 | \
    sed -e 's/[{}]/''/g' | \
    sed -e 's/", "/'\",\"'/g' | \
    sed -e 's/" ,"/'\",\"'/g' | \
    sed -e 's/" , "/'\",\"'/g' | \
    sed -e 's/","/'\"---SEPERATOR---\"'/g' | \
    awk -F=':' -v RS='---SEPERATOR---' "\$1~/\"$2\"/ {print}" | \
    sed -e "s/\"$2\"://" | \
    tr -d "\n\t" | \
    sed -e 's/\\"/"/g' | \
    sed -e 's/\\\\/\\/g' | \
    sed -e 's/^[ \t]*//g' | \
    sed -e 's/^"//'  -e 's/"$//'
}


parse_json '{"username":"john, doe","email":"john@doe.com"}' username
parse_json '{"username":"john doe","email":"john@doe.com"}' email

--- outputs ---

john, doe
john@doe.com
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
maikel
  • 1,135
  • 1
  • 12
  • 9
  • 2
    This is awesome. But if the JSON string contains more than one email key, the parser will output john@doe.com""john@doe.com – rtc11 Apr 06 '16 at 12:03
  • Doesn't work if there's a dash in the email like jean-pierre@email.com – alexmngn Mar 15 '19 at 14:53
  • 2
    Technically speaking, `sed` and `awk` are not part of the `bash` scripting language — they're external tools. – Gwyneth Llewelyn Jan 12 '21 at 15:42
  • @rtc11 You're right. It's unfortunately not a full blown JSON parser. I've added a warning to the answer. Thanks! – maikel Jan 14 '21 at 19:34
  • @alexmngn Interesting. On my Mac and in Alpine Linux it also works with dashes. Maybe the `sed` and/or `awk` versions differ. – maikel Jan 14 '21 at 19:35
  • 1
    @GwynethLlewelyn You're absolutely right. I corrected the description. Thank you! – maikel Jan 14 '21 at 19:35
  • `echo $1 |` is itself buggy. See [I just assigned a variable, but `echo $variable` shows something else!](https://stackoverflow.com/questions/29378566/i-just-assigned-a-variable-but-echo-variable-shows-something-else). Try running `{"key": " * "}` through your original code; the `*` will be replaced with a list of filenames. – Charles Duffy Jan 27 '23 at 18:24
26

Parsing JSON with PHP CLI

It is arguably off-topic, but since precedence reigns, this question remains incomplete without a mention of our trusty and faithful PHP, am I right?

It is using the same example JSON, but let’s assign it to a variable to reduce obscurity.

export JSON='{"hostname":"test","domainname":"example.com"}'

Now for PHP goodness, it is using file_get_contents and the php://stdin stream wrapper.

echo $JSON | php -r 'echo json_decode(file_get_contents("php://stdin"))->hostname;'

Or as pointed out using fgets and the already opened stream at CLI constant STDIN.

echo $JSON | php -r 'echo json_decode(fgets(STDIN))->hostname;'
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
nickl-
  • 8,417
  • 4
  • 42
  • 56
18

If someone just wants to extract values from simple JSON objects without needing to handle nested structures, it is possible to use regular expressions without even leaving Bash.

Here is a function I defined using bash regular expressions based on the JSON standard:

function json_extract() {
  local key=$1
  local json=$2

  local string_regex='"([^"\]|\\.)*"'
  local number_regex='-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?'
  local value_regex="${string_regex}|${number_regex}|true|false|null"
  local pair_regex="\"${key}\"[[:space:]]*:[[:space:]]*(${value_regex})"

  if [[ ${json} =~ ${pair_regex} ]]; then
    echo $(sed 's/^"\|"$//g' <<< "${BASH_REMATCH[1]}")
  else
    return 1
  fi
}

Caveats: objects and arrays are not supported as values, but all other value types defined in the standard are supported. Also, a pair will be matched no matter how deep in the JSON document it is as long as it has exactly the same key name.

Using the OP's example:

$ json_extract text "$(curl 'http://twitter.com/users/username.json')"
My status

$ json_extract friends_count "$(curl 'http://twitter.com/users/username.json')"
245
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Helder Pereira
  • 5,522
  • 2
  • 35
  • 52
15

Unfortunately, the top-voted answer that uses grep returns the full match, which didn't work in my scenario, but if you know the JSON format will remain constant, you can use lookbehind and lookahead to extract just the desired values.

# echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="FooBar":")(.*?)(?=",)'
he\"llo
# echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="TotalPages":)(.*?)(?=,)'
33
#  echo '{"TotalPages":33,"FooBar":"he\"llo","anotherValue":100}' | grep -Po '(?<="anotherValue":)(.*?)(?=})'
100
Daniel Sokolowski
  • 11,982
  • 4
  • 69
  • 55
  • 4
    You *never* actually **know** the order of elements in a JSON dictionary. They are, by definition, unordered. This is precisely one of the fundamental reasons why rolling your own JSON parser is a doomed approach. – tripleee Jun 18 '18 at 12:54
14

Version which uses Ruby and http://flori.github.com/json/

< file.json ruby -e "require 'rubygems'; require 'json'; puts JSON.pretty_generate(JSON[STDIN.read]);"

Or more concisely:

< file.json ruby -r rubygems -r json -e "puts JSON.pretty_generate(JSON[STDIN.read]);"
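
To extract a single field instead of pretty-printing everything, a minimal sketch (using the text key from the question):

< file.json ruby -r rubygems -r json -e 'puts JSON[STDIN.read]["text"]'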
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
boecko
  • 2,195
  • 1
  • 12
  • 13
12

This is yet another Bash and Python hybrid answer. I posted this answer because I wanted to process more complex JSON output while reducing the complexity of my Bash application. I want to crack open the following JSON object from http://www.arcgis.com/sharing/rest/info?f=json in Bash:

{
  "owningSystemUrl": "http://www.arcgis.com",
  "authInfo": {
    "tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
    "isTokenBasedSecurity": true
  }
}

In the following example, I created my own implementation of jq and unquote leveraging Python. You'll note that once we import the Python object from json to a Python dictionary we can use Python syntax to navigate the dictionary. To navigate the above, the syntax is:

  • data
  • data[ "authInfo" ]
  • data[ "authInfo" ][ "tokenServicesUrl" ]

By using magic in Bash, we omit data and only supply the Python text to the right of data, i.e.

  • jq
  • jq '[ "authInfo" ]'
  • jq '[ "authInfo" ][ "tokenServicesUrl" ]'

Note, with no parameters, jq acts as a JSON prettifier. With parameters, we can use Python syntax to extract anything we want from the dictionary including navigating subdictionaries and array elements.

Here are the Bash Python hybrid functions:

#!/bin/bash -xe

jq_py() {
  cat <<EOF
import json, sys
data = json.load( sys.stdin )
print( json.dumps( data$1, indent = 4 ) )
EOF
}

jq() {
  python -c "$( jq_py "$1" )"
}

unquote_py() {
  cat <<EOF
import json,sys
print( json.load( sys.stdin ) )
EOF
}

unquote() {
  python -c "$( unquote_py )"
}

Here's a sample usage of the Bash Python functions:

curl http://www.arcgis.com/sharing/rest/info?f=json | tee arcgis.json
# {"owningSystemUrl":"https://www.arcgis.com","authInfo":{"tokenServicesUrl":"https://www.arcgis.com/sharing/rest/generateToken","isTokenBasedSecurity":true}}

cat arcgis.json | jq
# {
#     "owningSystemUrl": "https://www.arcgis.com",
#     "authInfo": {
#         "tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
#         "isTokenBasedSecurity": true
#     }
# }

cat arcgis.json | jq '[ "authInfo" ]'
# {
#     "tokenServicesUrl": "https://www.arcgis.com/sharing/rest/generateToken",
#     "isTokenBasedSecurity": true
# }

cat arcgis.json | jq '[ "authInfo" ][ "tokenServicesUrl" ]'
# "https://www.arcgis.com/sharing/rest/generateToken"

cat arcgis.json | jq '[ "authInfo" ][ "tokenServicesUrl" ]' | unquote
# https://www.arcgis.com/sharing/rest/generateToken
Stephen Quan
  • 21,481
  • 4
  • 88
  • 75
11

There is an easier way to get a property from a JSON string. Using a package.json file as an example, try this:

#!/usr/bin/env bash
my_val="$(json=$(<package.json) node -pe "JSON.parse(process.env.json)['version']")"

We're using process.env, because this gets the file's contents into Node.js as a string without any risk of malicious contents escaping their quoting and being parsed as code.
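
The same trick works for data fetched over the network; here is a sketch reusing the GitHub endpoint from the accepted answer:

json="$(curl -s https://api.github.com/users/lambda)" node -pe "JSON.parse(process.env.json).name"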

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Alexander Mills
  • 90,741
  • 139
  • 482
  • 817
  • Using string concatenation to substitute values into into a string parsed as code allows arbitrary node.js code to be run, meaning it's exceedingly unsafe to use with random content you got off the Internet. There's a reason safe/best-practice ways to parse JSON in JavaScript don't just evaluate it. – Charles Duffy Jul 12 '18 at 19:18
  • @CharlesDuffy not sure I follow but the JSON.parse call should be safer, as `require()` can actually run foreign code, JSON.parse can't. – Alexander Mills Apr 19 '19 at 17:57
  • That's true if-and-only-if your string is actually injected into the JSON runtime in such a way as to bypass the parser. I don't see the code here doing that reliably. Pull it from an environment variable and pass it to `JSON.parse()` and yes, you're unambiguously safe... but here, the JSON runtime is *receiving* the (untrusted) content in-band with the (trusted) code. – Charles Duffy Apr 19 '19 at 18:32
  • ...similarly, if you have your code read the JSON from file *as a string* and pass that string to `JSON.parse()`, you're safe then too, but that's not happening here either. – Charles Duffy Apr 19 '19 at 18:34
  • To give you a concrete example, run ```printf '`+require("child_process").exec("touch owned")+`' >package.json``` before your first example. Sure, the code throws an error (because I didn't take the time to make it not), but it *also* runs `touch owned`. – Charles Duffy Apr 19 '19 at 18:40
  • I still don't follow, JSON.parse will never execute code, it can only return a string. If you have control over whether the result of the JSON.parse gets run, then it's safe. Otoh, require(x) can run foreign code. – Alexander Mills Apr 19 '19 at 20:22
  • Did you actually run the example I gave and look at whether a file named `owned` exists afterwards? Once it's proved that the exploit works, then we can go into how. – Charles Duffy Apr 19 '19 at 20:25
  • 2
    ...ahh, heck, might as well go into the "how" immediately. The problem is that *you're substituting the shell variable, which you intend to be passed to `JSON.parse()`, into the code*. You're *assuming* that putting literal backticks will keep the contents literal, but that's a completely unsafe assumption, because literal backticks can exist in the file content (and thus the variable), and thus can terminate the quoting and enter an unquoted context where the values are executed as code. – Charles Duffy Apr 19 '19 at 20:27
  • Ok so you're saying because of the backticks it might execute some bash code, so maybe what you're saying is if you want to pass a dynamic string to JSON.parse use something other than backticks? If you have an improvement for the first example definitely lmk b/c it's not that pretty as it is. – Alexander Mills Apr 19 '19 at 20:32
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/192124/discussion-between-charles-duffy-and-alexander-mills). – Charles Duffy Apr 19 '19 at 20:32
  • i used it like node -pe "var config = require('./output.json'); console.log(config.response.docs)" #so thank you. – Ugur Kazdal Oct 28 '19 at 17:18
9

Now that PowerShell is cross platform, I thought I'd throw it out there, since I find it to be fairly intuitive and extremely simple.

curl -s 'https://api.github.com/users/lambda' | ConvertFrom-Json

ConvertFrom-Json converts the JSON into a PowerShell custom object, so you can easily work with the properties from that point forward. If you only wanted the 'id' property for example, you'd just do this:

curl -s 'https://api.github.com/users/lambda' | ConvertFrom-Json | select -ExpandProperty id

If you wanted to invoke the whole thing from within Bash, then you'd have to call it like this:

powershell 'curl -s "https://api.github.com/users/lambda" | ConvertFrom-Json'

Of course, there's a pure PowerShell way to do it without curl, which would be:

Invoke-WebRequest 'https://api.github.com/users/lambda' | select -ExpandProperty Content | ConvertFrom-Json

Finally, there's also ConvertTo-Json which converts a custom object to JSON just as easily. Here's an example:

(New-Object PsObject -Property @{ Name = "Tester"; SomeList = @('one','two','three')}) | ConvertTo-Json

Which would produce nice JSON like this:

{
    "Name":  "Tester",
    "SomeList":  [
                     "one",
                     "two",
                     "three"
                 ]
}

Admittedly, using a Windows shell on Unix is somewhat sacrilegious, but PowerShell is really good at some things, and parsing JSON and XML are a couple of them. This is the GitHub page for the cross platform version: PowerShell

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
user2233949
  • 2,053
  • 19
  • 22
9

I cannot use any of the answers here: no jq, shell arrays, declare, grep -P, lookbehind, lookahead, Python, Perl, Ruby, or even Bash is available.

The remaining answers simply do not work well. JavaScript sounded familiar, but the tin says Nescaffe - so it is a no go, too :) Even if available, for my simple needs - they would be overkill and slow.

Yet, it is extremely important for me to get many variables from the JSON-formatted reply of my modem. I am doing it in Bourne shell (sh) with a very trimmed-down BusyBox on my routers! There aren't any problems using AWK alone: just set the delimiters and read the data. For a single variable, that is all!

awk 'BEGIN { FS="\""; RS="," }; { if ($2 == "login") {print $4} }' test.json

Remember I don't have any arrays? I had to assign the AWK-parsed data to the 11 variables I need in my shell script. Wherever I looked, that was said to be an impossible mission. No problem with that, either.

My solution is simple. This code will:

  1. parse .json file from the question (actually, I have borrowed a working data sample from the most upvoted answer) and picked out the quoted data, plus

  2. create shell variables from within AWK, assigning them freely named shell variable names.

    eval $( curl -s 'https://api.github.com/users/lambda' | awk ' BEGIN { FS="\""; RS="," }
      { if ($2 == "login") { print "Login=\""$4"\"" }
        if ($2 == "name") { print "Name=\""$4"\"" }
        if ($2 == "updated_at") { print "Updated=\""$4"\"" } }' )
    echo "$Login, $Name, $Updated"

There aren't any problems with blanks within. In my use, the same command parses a long single line output. As eval is used, this solution is suited for trusted data only.

It is simple to adapt it to pickup unquoted data. For a huge number of variables, a marginal speed gain can be achieved using else if. Lack of arrays obviously means: no multiple records without extra fiddling. But where arrays are available, adapting this solution is a simple task.
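
For contrast, where Bash 4 associative arrays are available, one possible sketch of such an adaptation (with the same caveats as above: quoted string values only, trusted data, no commas inside values):

declare -A user
while IFS='"' read -r _ key _ value _; do
    [ -n "$key" ] && user[$key]=$value   # keep only lines with a quoted key
done < <(curl -s 'https://api.github.com/users/lambda' | tr ',' '\n')
echo "${user[login]}, ${user[name]}"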

@maikel's sed answer almost works (but I can not comment on it). For my nicely formatted data - it works. Not so much with the example used here (missing quotes throw it off). It is complicated and difficult to modify. Plus, I do not like having to make 11 calls to extract 11 variables. Why? I timed 100 loops extracting 9 variables: the sed function took 48.99 seconds and my solution took 0.91 second! Not fair? Doing just a single extraction of 9 variables: 0.51 vs. 0.02 second.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Pila
  • 125
  • 1
  • 5
7

Someone who also has XML files might want to look at my Xidel. It is a dependency-free, command-line JSONiq processor. (I.e., it also supports XQuery for XML or JSON processing.)

The example in the question would be:

 xidel -e 'json("http://twitter.com/users/username.json")("name")'

Or with my own, nonstandard extension syntax:

 xidel -e 'json("http://twitter.com/users/username.json").name'
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
BeniBela
  • 16,412
  • 4
  • 45
  • 52
  • 1
    Or simpler nowadays: `xidel -s https://api.github.com/users/lambda -e 'name'` (or `-e '$json/name'`, or `-e '($json).name'`). – Reino Jan 13 '19 at 13:03
6

One interesting tool that hasn't been covered in the existing answers is gron, written in Go, whose tagline says Make JSON greppable!, and that is exactly what it does.

So essentially gron breaks down your JSON into discrete assignments showing the absolute 'path' to each value. The primary advantage it has over other tools like jq is that it allows searching for a value without knowing how deeply nested the record is, and without breaking the original JSON structure.

e.g., I want to search for the 'twitter_username' field from the following link, I just do

% gron 'https://api.github.com/users/lambda' | fgrep 'twitter_username'
json.twitter_username = "unlambda";
% gron 'https://api.github.com/users/lambda' | fgrep 'twitter_username' | gron -u
{
  "twitter_username": "unlambda"
}

As simple as that. Note how gron -u (short for ungron) reconstructs the JSON back from the search path. The fgrep is just there to filter the search to the paths needed and to keep the search expression from being evaluated as a regex rather than as a fixed string (fgrep is essentially grep -F).

Another example, searching for a string to see where in the nested structure the record sits:

% echo '{"foo":{"bar":{"zoo":{"moo":"fine"}}}}' | gron | fgrep "fine"
json.foo.bar.zoo.moo = "fine";

It also supports streaming JSON with its -s command line flag, where you can continuously gron the input stream for a matching record. Also gron has zero runtime dependencies. You can download a binary for Linux, Mac, Windows or FreeBSD and run it.

More usage examples and tips can be found at the official GitHub page - Advanced Usage

As for why one might use gron over other JSON parsing tools, see the author's note from the project page:

Why shouldn't I just use jq?

jq is awesome, and a lot more powerful than gron, but with that power comes complexity. gron aims to make it easier to use the tools you already know, like grep and sed.

Inian
  • 80,270
  • 14
  • 142
  • 161
6

You can try something like this -

curl -s 'http://twitter.com/users/jaypalsingh.json' | 
awk -F=":" -v RS="," '$1~/"text"/ {print}'
jaypal singh
  • 74,723
  • 23
  • 102
  • 147
5

You can use jshon:

curl 'http://twitter.com/users/username.json' | jshon -e text
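
To get the value without the surrounding quotes (as asked in the comments below), jshon's -u flag decodes the string:

curl 'http://twitter.com/users/username.json' | jshon -e text -u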
kev
  • 155,172
  • 47
  • 273
  • 272
  • The site says: "Twice as fast, 1/6th the memory"... and then: "Jshon parses, reads and creates JSON. It is designed to be as usable as possible from within the shell and replaces fragile adhoc parsers made from grep/sed/awk as well as heavyweight one-line parsers made from perl/python. " – Roger Jan 27 '17 at 09:20
  • this is listed as the recommended solution for parsing JSON in Bash – qodeninja Jul 08 '17 at 04:33
  • what's the easiest way to get rid of the quotes around the result? – gMale May 20 '20 at 21:35
5

Here's one way you can do it with AWK:

curl -sL 'http://twitter.com/users/username.json' | awk -F"," -v k="text" '{
    gsub(/{|}/,"")
    for(i=1;i<=NF;i++){
        if ( $i ~ k ){
            print $i
        }
    }
}'
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
ghostdog74
  • 327,991
  • 56
  • 259
  • 343
4

Parsing JSON is painful in a shell script. With a more appropriate language, create a tool that extracts JSON attributes in a way consistent with shell scripting conventions. You can use your new tool to solve the immediate shell scripting problem and then add it to your kit for future situations.

For example, consider a tool jsonlookup such that if I say jsonlookup access token id it will return the attribute id defined within the attribute token defined within the attribute access from standard input, which is presumably JSON data. If the attribute doesn't exist, the tool returns nothing (exit status 1). If the parsing fails, exit status 2 and a message to standard error. If the lookup succeeds, the tool prints the attribute's value.

Having created a Unix tool for the precise purpose of extracting JSON values you can easily use it in shell scripts:

access_token=$(curl <some horrible crap> | jsonlookup access token id)

Any language will do for the implementation of jsonlookup. Here is a fairly concise Python version:

#!/usr/bin/python

import sys
import json

try: rep = json.loads(sys.stdin.read())
except:
    sys.stderr.write(sys.argv[0] + ": unable to parse JSON from stdin\n")
    sys.exit(2)
for key in sys.argv[1:]:
    if key not in rep:
        sys.exit(1)
    rep = rep[key]
print rep
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
mcnabicus
  • 41
  • 1
4

A two-liner which uses Python. It works particularly well if you're writing a single .sh file and you don't want to depend on another .py file. It also leverages the use of the pipe (|). echo "{\"field\": \"value\"}" can be replaced by anything printing JSON to standard output.

echo "{\"field\": \"value\"}" | python -c 'import sys, json
print(json.load(sys.stdin)["field"])'
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Adam Kurkiewicz
  • 1,526
  • 1
  • 15
  • 34
4

Here is a good reference. In this case:

curl 'http://twitter.com/users/username.json' | sed -e 's/[{}]/''/g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) { where = match(a[i], /\"text\"/); if(where) {print a[i]} }  }'
Nathan Tuggy
  • 2,237
  • 27
  • 30
  • 38
Max Barrass
  • 2,776
  • 1
  • 19
  • 10
  • 1
    this answer should get the highest vote, most if not all of the other answers are package(php, python, etc..) dependent. – Viktova Mar 19 '18 at 12:18
  • 1
    No, on the contrary, anything with a [useless use of `sed`](http://www.iki.fi/era/unix/award.html#grep) should not receive any more upvotes. – tripleee Jun 18 '18 at 13:00
  • SecKarma, Exactly! topic said UNIX tools right? tripleee, got any ON TOPIC sample code for us to review? – Max Barrass Jun 21 '18 at 04:47
4

If you have the PHP interpreter installed:

php -r 'var_export(json_decode(`curl http://twitter.com/users/username.json`, 1));'

For example:

We have a resource that provides JSON content with countries' ISO codes: http://country.io/iso3.json and we can easily see it in a shell with curl:

curl http://country.io/iso3.json

But it doesn't look very convenient, or readable. It's better to parse the JSON content and see a readable structure:

php -r 'var_export(json_decode(`curl http://country.io/iso3.json`, 1));'

This code will print something like:

array (
  'BD' => 'BGD',
  'BE' => 'BEL',
  'BF' => 'BFA',
  'BG' => 'BGR',
  'BA' => 'BIH',
  'BB' => 'BRB',
  'WF' => 'WLF',
  'BL' => 'BLM',
  ...

If you have nested arrays, this output will look much better...

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
cn007b
  • 16,596
  • 7
  • 59
  • 74
4

There is also a very simple, but powerful, JSON CLI processing tool, fx.

Example of JSON formatting in Bash terminal

Examples

Use an anonymous function:

echo '{"key": "value"}' | fx "x => x.key"

Output:

value

If you don't pass an anonymous function (param => ...), the code will automatically be transformed into an anonymous function, and you can access the JSON via the this keyword:

$ echo '[1,2,3]' | fx "this.map(x => x * 2)"
[2, 4, 6]

Or just use dot syntax too:

echo '{"items": {"one": 1}}' | fx .items.one

Output:

1

You can pass any number of anonymous functions for reducing JSON:

echo '{"items": ["one", "two"]}' | fx "this.items" "this[1]"

Output:

two

You can update existing JSON using spread operator:

echo '{"count": 0}' | fx "{...this, count: 1}"

Output:

{"count": 1}

Just plain JavaScript. There isn't any need to learn new syntax.


Later versions of fx have an interactive mode!

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Anton Medvedev
  • 3,393
  • 3
  • 28
  • 40
  • 8
    If you are promoting your own creation, you need to be explicit about it. See [How not to be a spammer.](/help/promotion) – tripleee Oct 29 '18 at 08:25
4

I needed something in Bash that was short and would run without dependencies beyond vanilla Linux LSB and Mac OS, for both Python 2.7 and 3, and that would handle errors, e.g. report JSON parse errors and missing property errors without spewing Python exceptions:

json-extract () {
  if [[ "$1" == "" || "$1" == "-h" || "$1" == "-?" || "$1" == "--help" ]] ; then
    echo 'Extract top level property value from json document'
    echo '  Usage: json-extract <property> [ <file-path> ]'
    echo '  Example 1: json-extract status /tmp/response.json'
    echo '  Example 2: echo $JSON_STRING | json-extract status'
    echo '  Status codes: 0 - success, 1 - json parse error, 2 - property missing'
  else
    python -c $'import sys, json;\ntry: obj = json.load(open(sys.argv[2])); \nexcept: sys.exit(1)\ntry: print(obj[sys.argv[1]])\nexcept: sys.exit(2)' "$1" "${2:-/dev/stdin}"
  fi
}
Mike
  • 2,429
  • 1
  • 27
  • 30
3

For more complex JSON parsing, I suggest using the Python jsonpath module (by Stefan Goessner) -

  1. Install it -

    sudo easy_install -U jsonpath
    
  2. Use it -

    Example file.json (from http://goessner.net/articles/JsonPath) -

    { "store": {
        "book": [
          { "category": "reference",
            "author": "Nigel Rees",
            "title": "Sayings of the Century",
            "price": 8.95
          },
          { "category": "fiction",
            "author": "Evelyn Waugh",
            "title": "Sword of Honour",
            "price": 12.99
          },
          { "category": "fiction",
            "author": "Herman Melville",
            "title": "Moby Dick",
            "isbn": "0-553-21311-3",
            "price": 8.99
          },
          { "category": "fiction",
            "author": "J. R. R. Tolkien",
            "title": "The Lord of the Rings",
            "isbn": "0-395-19395-8",
            "price": 22.99
          }
        ],
        "bicycle": {
          "color": "red",
          "price": 19.95
        }
      }
    }
    

    Parse it (extract all book titles with price < 10) -

    cat file.json | python -c "import sys, json, jsonpath; print '\n'.join(jsonpath.jsonpath(json.load(sys.stdin), 'store.book[?(@.price < 10)].title'))"
    

    Will output -

    Sayings of the Century
    Moby Dick
    

    Note: The above command line does not include error checking. For a full solution with error checking, you should create a small Python script and wrap the code with try/except, as in the sketch below.
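
A minimal sketch of such a wrapper (the file name jsonpath_query.py is hypothetical):

import sys, json, jsonpath

try:
    data = json.load(sys.stdin)
except ValueError:
    sys.exit('error: invalid JSON on stdin')

# jsonpath.jsonpath() returns False when nothing matches
result = jsonpath.jsonpath(data, sys.argv[1])
if result is False:
    sys.exit('error: no match for the given expression')

for item in result:
    print(item)

It is used the same way as the one-liner above:

cat file.json | python jsonpath_query.py 'store.book[?(@.price < 10)].title'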

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
shlomosh
  • 59
  • 1
  • 4
  • I was having a little trouble installing `jsonpath` so installed `jsonpath_rw` instead, so here is something similar you can try if the above doesn't work: 1) `/usr/bin/python -m pip install jsonpath-rw` 2) `cat ~/trash/file.json | /usr/bin/python -c "from jsonpath_rw import jsonpath, parse; import sys,json; jsonpath_expr = parse('store.book[0]'); out = [match.value for match in jsonpath_expr.find(json.load(sys.stdin))]; print out;"` (I used the full path to the python binary because I was having some issues with multiple pythons installed). – Sridhar Sarnobat Aug 20 '16 at 05:27
3

This is a good use case for pythonpy:

curl 'http://twitter.com/users/username.json' | py 'json.load(sys.stdin)["name"]'
RussellStewart
  • 5,293
  • 3
  • 26
  • 23
3

If pip is available on the system then:

$ pip install json-query

Examples of usage:

$ curl -s http://0/file.json | json-query
{
    "key":"value"    
}

$ curl -s http://0/file.json | json-query my.key
value

$ curl -s http://0/file.json | json-query my.keys.
key_1
key_2
key_3

$ curl -s http://0/file.json | json-query my.keys.2
value_2
3

Here is the answer for shell nerds using the POSIX shell (with local) and egrep: JSON.sh, 4.7 KB.

This thing has plenty of test cases, so it should be correct. It is also pipeable, and it is used in bpkg, the package manager for Bash.
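
A quick usage sketch (assuming JSON.sh is on your PATH; the -b flag should print leaf nodes only):

$ echo '{"text":"My status","truncated":false}' | JSON.sh -b
["text"]	"My status"
["truncated"]	false

From there, a specific field is one grep and one cut (the output is tab-delimited) away:

$ echo '{"text":"My status"}' | JSON.sh -b | grep '^\["text"\]' | cut -f2
"My status"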

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Mingye Wang
  • 1,107
  • 9
  • 32
3

This works for me if Node.js is installed:

node -pe "require('${HOME}/.config/dev-utils.json').doToken"
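
And if the JSON arrives on a pipe instead of living in a file, a small stdin-reading sketch (using the question's URL) could be:

curl -s 'http://twitter.com/users/username.json' | node -e "
  let d = '';
  process.stdin.on('data', c => d += c);
  process.stdin.on('end', () => console.log(JSON.parse(d).text));
"
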
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
jasenmichael
  • 388
  • 5
  • 17
3

Parse using Ruby (the interpreter is available on all macOS versions by default in /usr/bin/ruby):

echo "${JSON}" | /usr/bin/ruby -e 'require "json"; puts JSON.parse(http://STDIN.read)["key1"]["nested_key_2"];'
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Heath Borders
  • 30,998
  • 16
  • 147
  • 256
2

I've done this, "parsing" a JSON response for a particular value, as follows:

curl $url | grep $var | awk '{print $2}' | sed s/\"//g

Clearly, $url here would be the Twitter URL, and $var would be "text" to get the response for that variable.

Really, I think the only thing I'm doing that the OP left out is grepping for the line with the specific variable he seeks. AWK grabs the second item on the line, and with sed I strip the quotes.

Someone smarter than I am could probably do the whole thing with AWK or grep.

Now, you could do it all with just sed:

curl $url | sed '/text/!d' | sed s/\"text\"://g | sed s/\"//g | sed s/\ //g

Thus, no AWK, no grep...I don't know why I didn't think of that before. Hmmm...
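
For what it's worth, a single AWK invocation in the same brittle spirit splits on the quote character; it would be something like:

curl "$url" | awk -F'"' '/"text"/ { print $4 }'

This prints the fourth quote-delimited field of any line containing "text", i.e. the value.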

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
tonybaldwin
  • 171
  • 1
  • 10
  • Actually, with sed you can do – tonybaldwin Dec 10 '12 at 04:16
  • 2
The `grep | awk | sed` and `sed | sed | sed` pipelines are wasteful antipatterns. Your last example can easily be rewritten into `curl "$url" | sed '/text/!d;s/\"text\"://g;s/\"//g;s/\ //g'` but like others have pointed out, this is an error-prone and brittle approach which should not be recommended in the first place. – tripleee Dec 04 '14 at 10:10
  • I had to use grep -oPz 'name\":\".*?\"' curloutput | sed 's/name\":/\n/g' – Ferroao Aug 20 '19 at 21:27
2

I used this to extract the video duration from ffprobe JSON output:

MOVIE_INFO=`ffprobe "path/to/movie.mp4"  -show_streams -show_format -print_format json -v quiet`
MOVIE_SECONDS=`echo "$MOVIE_INFO"|grep -w \"duration\" |tail -1 | cut -d\" -f4 |cut -d \. -f 1`

It can be used to extract a value from any JSON file:

value=`echo "$jsondata" | grep -w \"key_name\" |tail -1 | cut -d\" -f4`
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Ehsan Chavoshi
  • 681
  • 6
  • 10
2

If you are looking for a native Mac solution to parse JSON (no external libraries, etc.), then this is for you.

This information is based on an article here: https://www.macblog.org/parse-json-command-line-mac/

In short, since as far back as Mac OS Yosemite, there has been a tool for running AppleScripts called osascript. However, if you pass the -l 'JavaScript' flag, you can run JavaScript instead. This is called JXA (JavaScript for Automation).

Below is an example of reading a JSON file from my own project.

DCMTK_JSON=$(curl -s https://formulae.brew.sh/api/bottle/dcmtk.json) # -s for silent mode
read -r -d '' JXA <<EOF
function run() {
  var json = JSON.parse(\`$DCMTK_JSON\`);
  return json.bottles.$2.url;
}
EOF
DOWNLOAD_URL=$( osascript -l 'JavaScript' <<< "${JXA}" )
echo "DOWNLOAD_URL=${DOWNLOAD_URL}"

What is happening here is that we are storing the JavaScript source (read from the heredoc) in the variable JXA. That script parses the JSON content with JSON.parse(), and we then pass the JXA variable to the osascript tool so it can run the JavaScript. In my example, $2 refers to arm64_monterey if you test this. The JavaScript runs right away because of the special function run(), which JXA looks for and whose return value it prints when the script finishes.

Note that the EOF (end of file) markers are used to handle multiple lines of text input, and the closing EOF cannot have any spaces in front of it to work.

You can test whether this will work for you by opening Terminal and typing the command below:

osascript -l 'JavaScript' -e 'var app = Application.currentApplication(); app.includeStandardAdditions = true; app.displayDialog("Hello from JavaScript!");'

This should bring up a pop-up window that says "Hello from JavaScript!".

Joseph Astrahan
  • 8,659
  • 12
  • 83
  • 154
1

You can use bashJson

It’s a wrapper for Python's JSON module and can handle complex JSON data.

Let's consider this example JSON data from the file test.json:

{
    "name":"Test tool",
    "author":"hack4mer",
    "supported_os":{
        "osx":{
            "foo":"bar",
            "min_version" : 10.12,
            "tested_on" : [10.1,10.13]
        },
        "ubuntu":{
            "min_version":14.04,
            "tested_on" : 16.04
        }
    }
}

The following commands read data from this example JSON file:

./bashjson.sh test.json name

Prints: Test tool

./bashjson.sh test.json supported_os osx foo

Prints: bar

./bashjson.sh test.json supported_os osx tested_on

Prints: [10.1,10.13]

Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
Anand Singh
  • 1,091
  • 13
  • 21
1

Here is a simple approach for a Node.js-ready environment:

curl -L https://github.com/trentm/json/raw/master/lib/json.js > json
chmod +x json
echo '{"hello":{"hi":"there"}}' | ./json "hello.hi"
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Liu Hao
  • 511
  • 5
  • 10
1

You have multiple options. You can use trdsql [1] for parsing and transforming JSON/CSV input. Following your example:

trdsql "select attr1,attr2 from sample.json"

You can also use a WHERE clause just like in SQL, and get output in CSV, JSON, etc. It is a very handy tool.
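
For example, a hypothetical filter (the attribute names are made up):

trdsql "select attr1, attr2 from sample.json where attr3 = 10"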

In my experience, trdsql was a bit problematic when dealing with nested attribute values, so I came to a solution using qp [2] when appropriate.

cat sample.json | qp 'select attr1, attr2.detail.name where attr3=10'

Notice there is no FROM.

For viewing the results, you may use jless, an ultra-fast command-line JSON viewer [3].

There is also a new kid on the block from ClickHouse; you can see what it is capable of in [4].

  1. https://github.com/noborus/trdsql
  2. https://github.com/f5io/qp
  3. https://jless.io
  4. https://clickhouse.com/blog/extracting-converting-querying-local-files-with-sql-clickhouse-local
ᐅdevrimbaris
  • 718
  • 9
  • 20
1

YAML processor yq

Consider using yq for JSON processing. yq is a lightweight and portable command-line YAML processor (and JSON is a subset of YAML). The syntax is similar to jq.

Input

{
  "name": "Angel",
  "address": {
    "street": "Stairway to",
    "city": "Heaven"
  }
}

usage example 1

yq e '.name' $FILE

Angel

usage example 2

yq has a nice built-in feature to make JSON and YAML grep-able:

yq --output-format props $FILE

name = Angel
address.street = Stairway to
address.city = Heaven
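
So finding a nested value is just a grep away, for example:

yq --output-format props $FILE | grep 'address.city'

address.city = Heaven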

jpseng
  • 1,618
  • 6
  • 18
0

Niet is a tool that helps you to extract data from a JSON or YAML file directly in your shell or Bash CLI.

pip install niet

Consider a JSON file named project.json with the following contents:

{
  "project": {
    "meta": {
      "name": "project-sample"
    }
  }
}

You can use Niet like this:

PROJECT_NAME=$(niet project.json project.meta.name)
echo ${PROJECT_NAME}

Output:

project-sample
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Herve
  • 127
  • 4
0

Using PHP after yum install php-cli:

php -r " foreach(json_decode(file_get_contents('http://a.com/a.json'), true) as \$key => \$value) echo \$key.'='.\$value.\"\n\" ; "
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Henry
  • 2,870
  • 1
  • 25
  • 17
0
This uses jq through its Python bindings:

pip3 install jq

parse() {
  key=$1

  python3 -c "
import sys
import jq
import json

input = json.load(sys.stdin)
output = jq.compile('$key').input(input).all()

if isinstance(output, list):
    output = ' '.join(output)

print(output)
"
}

name=$(aws emr describe-cluster --cluster-id $id | parse ".Cluster.Name")

echo $name
seunggabi
  • 1,699
  • 12
  • 12