I am trying to use jq to parse information from the TVDB API. I need to pull a couple of fields and assign their values to variables that I can keep using in my bash script. I know I can easily assign the output to one variable in bash with variable="$(command)", but I need the output to populate multiple variables, and I don't want to have to run multiple commands.

I read this documentation:

https://stedolan.github.io/jq/manual/v1.5/#Advancedfeatures

but I don't know if this is relevant to what I am trying to do.

jq '.data' produces the following output:

[
  {
    "absoluteNumber": 51,
    "airedEpisodeNumber": 6,
    "airedSeason": 4,
    "airedSeasonID": 680431,
    "dvdEpisodeNumber": 6,
    "dvdSeason": 4,
    "episodeName": "We Will Rise",
    "firstAired": "2017-03-15",
    "id": 5939660,
    "language": {
      "episodeName": "en",
      "overview": "en"
    },
    "lastUpdated": 1490769062,
    "overview": "Clarke and Roan must work together in hostile territory in order to deliver an invaluable asset to Abby and her team."
  }
]

I tried jq '.data | {episodeName:$name}' and jq '.data | .episodeName as $name' just to try and get one working. I don't understand the documentation or even if it's what I'm looking for. Is there a way to do what I am trying to do?

agc
user2328273
  • Can you post the complete `JSON` and the actual fields needed? – Inian Apr 08 '17 at 07:39
  • Agreed, the current `jq` docs are not user-friendly. SO's own list of [questions tagged `jq` and ranked by votes](http://stackoverflow.com/questions/tagged/jq?sort=votes&pageSize=50) may help. – agc Apr 08 '17 at 09:24
  • `.foo as $var` creates a **jq** variable. That variable doesn't last beyond the point in time when `jq` exits. If you want a **bash** variable, you need to do that with... well... *bash* facilities. – Charles Duffy Apr 08 '17 at 22:28
  • I would start considering if a language other than `bash` might be more appropriate. – chepner Apr 08 '17 at 22:46
  • I am limited to what is available on the server. – user2328273 Apr 09 '17 at 18:46
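
As a small illustration of the point made in the comments (a jq variable lives only inside the jq program, while a bash variable is created by the shell), here is a minimal sketch. It assumes jq is installed; the `json` sample is a cut-down copy of the question's data, and the `summary` name is made up for the example.

```shell
#!/usr/bin/env bash
# Sample document shaped like the question's data (only two fields kept).
json='{"data":[{"episodeName":"We Will Rise","id":5939660}]}'

# $name is a jq variable: it exists only while this jq program runs.
summary=$(jq -r '.data[0] | .episodeName as $name | "\($name) (id \(.id))"' <<<"$json")

# The bash variable comes from the shell's command substitution, not from jq.
echo "$summary"
```

Getting several bash variables out of one jq call therefore comes down to how the shell reads jq's output, which is what the answers address.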

3 Answers

You can assign to separate variables with read:

read var1 var2 var3 < <(echo $(curl -s 'https://api.github.com/repos/torvalds/linux' | 
     jq -r '.id, .name, .full_name'))

echo "id        : $var1"
echo "name      : $var2"
echo "full_name : $var3"
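
Why the `echo $(...)` wrapper matters can be seen without any network access. In this sketch `fake_jq` is a hypothetical stand-in for the curl and jq pipeline, and the values it prints are made up; the only assumption is that `jq -r '.id, .name, .full_name'` prints one value per line.

```shell
#!/usr/bin/env bash
# Stand-in for: curl ... | jq -r '.id, .name, .full_name'
# (made-up values; jq would print one value per line like this)
fake_jq() { printf '%s\n' 12345 linux torvalds/linux; }

# Without the wrapper, read consumes only the first line.
read -r a b c < <(fake_jq)
echo "direct : a=$a b=$b c=$c"

# echo $(...) word-splits the output and rejoins it on a single line.
read -r x y z < <(echo $(fake_jq))
echo "wrapped: x=$x y=$y z=$z"
```

The same word splitting is also why this form misbehaves as soon as a value contains a space.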

Using an array:

read -a arr < <(echo $(curl -s 'https://api.github.com/repos/torvalds/linux' | 
     jq -r '.id, .name, .full_name'))

echo "id        : ${arr[0]}"
echo "name      : ${arr[1]}"
echo "full_name : ${arr[2]}"

You can also split the output on a delimiter character:

IFS='|' read var1 var2 var3 var4 < <(curl '......' | jq -r '.data | 
    map([.absoluteNumber, .airedEpisodeNumber, .episodeName, .overview] | 
    join("|")) | join("\n")')
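
To see the splitting step in isolation, here is a sketch that feeds `read` a single line in the shape the filter above should emit. The line is typed in by hand from the question's data (with the overview shortened), so no jq or curl is needed; the variable names are made up.

```shell
#!/usr/bin/env bash
# One record, fields joined with "|" (overview shortened for the example).
line='51|6|We Will Rise|Clarke and Roan must work together'

# IFS applies only to this read; -r keeps backslashes literal.
IFS='|' read -r absolute episode name overview <<<"$line"

echo "absoluteNumber     : $absolute"
echo "airedEpisodeNumber : $episode"
echo "episodeName        : $name"
echo "overview           : $overview"
```

Spaces inside a field survive because only `|` separates fields. A field that itself contains `|` or a newline would still break this, so the delimiter should be a character the data is not expected to contain.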

Or use an array:

set -f; IFS='|' data=($(curl '......' | jq -r '.data | 
    map([.absoluteNumber, .airedEpisodeNumber, .episodeName, .overview] | 
    join("|")) | join("\n")')); set +f

absoluteNumber, airedEpisodeNumber, episodeName & overview are respectively ${data[0]}, ${data[1]}, ${data[2]}, ${data[3]}. set -f and set +f are used to respectively disable & enable globbing.
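
The effect of `set -f` can be checked locally. This sketch uses a scratch directory and a hypothetical record whose second field is `*`; it also shows `read -a` as a possible alternative, since a bare `IFS='|'` assignment with no command after it stays in effect for the rest of the shell unless it is restored.

```shell
#!/usr/bin/env bash
# Scratch directory with two files so that a glob has something to match.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt

line='51|*|We Will Rise'   # hypothetical record whose second field is "*"

# Unquoted expansion: the "*" field is replaced by the matching file names.
old_ifs=$IFS
IFS='|'
globbed=($line)

# Same split with globbing disabled, as in the answer.
set -f
literal=($line)
set +f
IFS=$old_ifs               # restore IFS; the bare assignment would otherwise persist

# read -a also splits on IFS, but never globs and scopes IFS to one command.
IFS='|' read -r -a fields <<<"$line"

echo "globbed: ${#globbed[@]} elements, literal: ${#literal[@]}, read -a: ${#fields[@]}"

cd / && rm -rf "$workdir"
```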

For the jq part, all the required fields are collected into an array and joined with a '|' character using join("|").

If you are using jq < 1.5, you'll have to convert each Number field to a String with tostring, e.g.:

IFS='|' read var1 var2 var3 var4 < <(curl '......' | jq -r '.data | 
    map([(.absoluteNumber|tostring), (.airedEpisodeNumber|tostring), .episodeName, .overview] | 
    join("|")) | join("\n")')
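
In jq, `|` binds more loosely than `,`, so a per-field conversion inside an array needs its own parentheses. One way to sidestep that is to convert the whole array with `map(tostring)`, which leaves strings unchanged. The sketch below assumes jq is installed and uses a cut-down copy of the question's data; the variable names are made up.

```shell
#!/usr/bin/env bash
# Cut-down sample in the shape of the question's data.
json='{"data":[{"absoluteNumber":51,"airedEpisodeNumber":6,"episodeName":"We Will Rise"}]}'

# map(tostring) converts the numbers and passes strings through unchanged.
IFS='|' read -r absolute episode name < <(
  jq -r '.data[] | [.absoluteNumber, .airedEpisodeNumber, .episodeName] | map(tostring) | join("|")' <<<"$json"
)

echo "absoluteNumber=$absolute airedEpisodeNumber=$episode episodeName=$name"
```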
Bertrand Martel
  • I think that's what I'm looking for. I didn't know I could do it from the bash side like that. However, both suggestions give me the following error: `jq: error (at :26): string ("") and number (51) cannot be added` Seems to be a number/string issue. It works if I use strings like `episodeName` and `overview`, but then the variable values are wrong because they are parsed on a space and there are spaces. `episodeName` and `overview` are ones I need so those need to work. – user2328273 Apr 08 '17 at 17:59
  • I still get that error if using one of the fields that returns a number instead of a string. I don't need any of those fields for my purposes but just mentioning it for future reference for others. Thanks for the help. I was approaching this all wrong, trying to solve it with jq instead of the shell. – user2328273 Apr 08 '17 at 23:34
  • Also, what's the difference between `< <(curl...` and how you had it before, `<<< $(curl ...`? – user2328273 Apr 08 '17 at 23:40
  • I've updated my post with string conversion for jq < 1.5. `< <(...)` is process substitution and for `<<<` the string at the right is expanded. Please see [this post](http://askubuntu.com/a/678919/463299) for full explanation – Bertrand Martel Apr 08 '17 at 23:59
  • The `read` approach is certainly better practice than the `data=($(curl ...))` one -- what if one of the fields you were reading contained `*`? Even a name like `Foo [Bar]` is syntactically a glob expression, and would cause a failure on `failglob`, or evaluate to an empty string with `nullglob`. – Charles Duffy Apr 09 '17 at 15:11
  • Thank you for pointing that, I've added `set -f` before the operation to disable globbing, and `set +f` just after to re-enable it – Bertrand Martel Apr 09 '17 at 18:15
  • I am facing an issue with read, my return string looks like this: "2.3.0 Runner - Core tests". The part "2.3.0" is getting separated from the rest and is written to the second variable. Any idea how to avoid that? – Basti May 07 '20 at 10:00
  • The first snippet above spared me an undesirable dependency on another language in a case where I already had `bash`. I tried to copy-paste it for 45 minutes before a coworker explained why the enclosing `echo $(...)` is necessary. In my case, I knew that the values contained no spaces, so I could leave out the echo and add one further transformation, i.e. `jq -r '[ .id, .name, .full_name ] | join(" ")'`. Thanks for your meticulous breakdown! – AbuNassar Jan 26 '23 at 16:00
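
For values that contain spaces, such as the string quoted in the comments above, one possible approach is to keep jq's one-value-per-line output and call `read` once per line. In this sketch `fake_jq` is a hypothetical stand-in for the real jq call, and the second value (`passed`) is made up.

```shell
#!/usr/bin/env bash
# Stand-in for a jq -r call that prints two values, one per line.
fake_jq() { printf '%s\n' '2.3.0 Runner - Core tests' 'passed'; }

# One read per output line; IFS= keeps each line intact, spaces included.
{
  IFS= read -r title
  IFS= read -r status
} < <(fake_jq)

echo "title : $title"
echo "status: $status"
```

The brace group runs in the current shell, so both variables remain set afterwards. Values that contain newlines would still need a different separator.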

jq always produces a stream of zero or more values. For example, to produce the two values corresponding to "episodeName" and "id", you could write:

.data[] | ( .episodeName, .id )

For your purposes, it might be helpful to use the -c command-line option, to ensure each JSON output value is presented on a single line. You might also want to use the -r command-line option, which removes the outermost quotation marks from each output value that is a JSON string.

For further variations, please see the jq FAQ https://github.com/stedolan/jq/wiki/FAQ, e.g. the question:

Q: How can a stream of JSON texts produced by jq be converted into a bash array of corresponding values?
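
As a rough sketch of that idea, assuming jq is installed and bash 4 or later (for `mapfile`), the stream from the filter above can be collected into a bash array one line per value. The `json` sample is a cut-down copy of the question's data and the `values` name is made up.

```shell
#!/usr/bin/env bash
# Cut-down sample in the shape of the question's data.
json='{"data":[{"episodeName":"We Will Rise","id":5939660}]}'

# -r strips the quotes from string outputs; each value lands on its own line.
mapfile -t values < <(jq -r '.data[] | (.episodeName, .id)' <<<"$json")

echo "episodeName: ${values[0]}"
echo "id         : ${values[1]}"
```

With -r, a string that contains a newline would span two array elements, so this form suits single-line values.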

peak

Experimental conversion of quoted OP input, (tv.dat), to a series of bash variables, (and an array). The jq code is mostly borrowed from here and there, but I don't know how to get jq to unroll an array within an array, so the sed code does that, (that's only good for one level, but so are bash arrays):

jq -r ".[] | to_entries | map(\"DAT_\(.key) \(.value|tostring)\") | .[]" tv.dat | 
while read a b ; do echo "${a,,}='$b'" ; done |
sed -e '/{.*}/s/"\([^"]*\)":/[\1]=/g;y/{},/() /' -e "s/='(/=(/;s/)'$/)/"

Output:

dat_absolutenumber='51'
dat_airedepisodenumber='6'
dat_airedseason='4'
dat_airedseasonid='680431'
dat_dvdepisodenumber='6'
dat_dvdseason='4'
dat_episodename='We Will Rise'
dat_firstaired='2017-03-15'
dat_id='5939660'
dat_language=([episodeName]="en" [overview]="en")
dat_lastupdated='1490769062'
dat_overview='Clarke and Roan must work together in hostile territory in order to deliver an invaluable asset to Abby and her team.'
agc
  • Imaginative, but `eval`ing data received off the Internet munged through a `sed` script strikes me as a recipe for security vulnerabilities, particularly when anyone who reads StackOverflow knows the `sed` script you're using. :) – Charles Duffy Apr 08 '17 at 22:31
  • ...now, if you were just generating `key=value` pairs without the array syntax, you could feed it into `declare -A vars=( ); while IFS== read -r key value; do vars[$key]=$value; done` or such safely. – Charles Duffy Apr 08 '17 at 22:32
  • `'$b'` is *absolutely* not safe -- a value containing literal single-quotes can trivially escape it. (Hence `touch $'$(rm -rf $HOME)\'$(rm -rf $HOME)\''` as my usual example of creating a malicious filename that avoids naive attempts at escaping). – Charles Duffy Apr 08 '17 at 22:35
  • @CharlesDuffy, Re `eval`: There's no `eval` in this answer. (Though I do approve of a judicious use of `eval` more than some.) – agc Apr 09 '17 at 14:38
  • Granted that there isn't an `eval`, but the output is in a format which appears to be anticipating `eval`, `source`, or some equivalent to actually load those variables into a running shell. – Charles Duffy Apr 09 '17 at 15:09
  • @CharlesDuffy, Re "*absolutely*": we seem to be of different schools here. With due respect to defensive programming practices, this answer is specific to TVDB data, (which should not contain anything like that), and therefore is not intended as a universal defensive-programming answer. If we suppose TVDB data is likely to actually be an attack vector, then TVDB should be fixed, or not used at all. – agc Apr 10 '17 at 16:08
  • @CharlesDuffy, That malicious `touch` code might not be within my current view of this answer's scope, but it's interesting code either way. Please provide a link to a more thorough description of that code, or something like it. – agc Apr 10 '17 at 16:13
  • If TVDB has a security breach, do you really want that to be *your* problem rather than theirs? And then there are MITM attacks. (Granted, the only MITM attack I've actually been part of was an April Fool's joke almost 20 years ago, translating web pages requested by a specific system into pig latin... actually, no, several more recently, mostly intercepting and rewriting network calls made by games in attempts to "cheat" / avoid grinding). Re: the code above, I don't have a good link handy, but why not change it from `rm` to `touch` and play with it yourself? – Charles Duffy Apr 10 '17 at 16:21
  • Defense-in-depth exists for a reason: Small security breaches get escalated into big ones. If I've 0wned the web proxy at a company's boundary, then looking for places where they're, say, downloading and executing shell scripts is prime territory. And if I *know* they're running code that mishandles data from TVDB (maybe because someone there asked a question on StackOverflow and accepted a less-than-cautious answer -- or maybe because I have read-only access to -- or an ex-employee's snapshot of -- their source control and spotted some vulnerable code), yes, I might target that too. – Charles Duffy Apr 10 '17 at 16:23
  • The above is to say -- you seem to be advocating trusting TVDB. I advocate trusting no-one, except to the extent that you actually have a reason to do so and make a deliberate decision that such trust is appropriate. In this case, reducing the level of trust in the data could be as easy as sanitization with `printf '%q=%q\n' "${a,,}" "$b"` (lowercase per http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html). – Charles Duffy Apr 10 '17 at 16:28
  • @CharlesDuffy, Thanks for the lowercase note, fixed. While your advocacy of defense-in-depth is both valid and interesting, there's more than one side to such matters; unfortunately a comment section would not be the best forum to contrast alternate schools of programming prophylaxis. – agc Apr 11 '17 at 21:51
  • @CharlesDuffy, After some experimenting with your `touch` example, it's unclear how it applies to this instance. Could you provide a specific value of a (post-`jq`) `read a b` input line that would, if this answer's current code were run and then `eval`'d *once*, execute (at `eval` time) `mplayer baz`? – agc Apr 12 '17 at 04:47
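
The two suggestions from the comments (loading `key=value` lines into an associative array, and quoting with `printf '%q'`) can be sketched as follows. This assumes bash 4 or later for `declare -A`; `pairs` is a hypothetical stand-in for jq output, and its second value is a made-up hostile string modeled on the quoting concern raised above.

```shell
#!/usr/bin/env bash
# Stand-in for key=value lines produced by jq.
pairs() {
  printf '%s\n' "episodename=We Will Rise" "overview=it's \$(touch pwned) fine"
}

declare -A vars=()
while IFS='=' read -r key value; do
  vars[$key]=$value        # plain assignment: the value is stored, never run
done < <(pairs)

echo "episodename -> ${vars[episodename]}"
echo "overview    -> ${vars[overview]}"

# If shell-readable output is really needed, %q escapes quotes, spaces and $( ).
quoted=$(printf '%q=%q' "overview" "${vars[overview]}")
echo "$quoted"
```

Nothing in the loop passes the data through `eval` or `source`, so the embedded command substitution stays inert text.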