6

Lets say we have this file:

{
  "persons": [
    {
      "friends": 4,
      "phoneNumber": 123456,
      "personID": 11111
    },
    {
      "friends": 2057,
      "phoneNumber": 432100,
      "personID": 22222
    },
    {
      "friends": 50,
      "phoneNumber": 147258,
      "personID": 55555
    }
  ]
}

I now want to extract the phone numbers of the persons 11111, 22222, 33333, 44444 and 55555 as a semicolon-separated string:

123456;432100;;;147258

While running

cat persons.txt | jq ".persons[] | select(.personID==<ID>) | .phoneNumber"

once for each <ID> and glueing the results together with the ; afterwards works, this is terribly slow, because it has to reload the file for each of the IDs (and other fields I want to extract).

Concatenating it in a single query:

 cat persons.txt | jq "(.persons[] | select(.personID==11111) | .phoneNumber), (.persons[] | select(.personID==22222) | .phoneNumber), (.persons[] | select(.personID==33333) | .phoneNumber), (.persons[] | select(.personID==44444) | .phoneNumber), (.persons[] | select(.personID==55555) | .phoneNumber)"

This also works, but it gives

123456
432100
147258

so I do not know which of the fields are missing and how many ; I have to insert.

peak
  • 105,803
  • 17
  • 152
  • 177
Bowi
  • 1,378
  • 19
  • 33
  • Do you want a single semicolon between the numbers? I see in your question you wrote ´123456;432100;;;147258´ why does this last number have 3 semicolons before it? – MikeKatz45 Nov 05 '18 at 16:52
  • Yes, because I look for `11111`, `22222`, `33333`, `44444`, and `55555`, but `33333` and `44444` do not show up in the file and thus are null-ish. If I had only a single `;`, I could not guarantee that the correct number is in the correct column. – Bowi Nov 05 '18 at 16:56
  • This is disgusting and not quite what you're asking, but it might inspire some : `. as $input | [11111, 22222, 33333, 44444, 55555][] | (. as $id | $input.persons | map(select(.personID == $id)))` ([jqplay](https://jqplay.org/s/qxTDiMmlYY)) – Aaron Nov 05 '18 at 17:05
  • Now this is what you're asking, but even more disgusting : `[. as $input | [11111, 22222, 33333, 44444, 55555][] | . as $id | $input.persons | map(select(.personID == $id)) | map(.phoneNumber) | join("")] | join(";")` ([jqplay](https://jqplay.org/s/O4lnH-_tLA)). You should seriously avoid using that, wait for someone that knows `jq` better than I do and will cleanup this mess. – Aaron Nov 05 '18 at 17:18
  • does the solution have to use jq? – MikeKatz45 Nov 05 '18 at 17:54
  • @MikeKatz45, no, not at all. – Bowi Nov 06 '18 at 08:09

3 Answers3

3

With your sample input in input.json, and using jq 1.6 (or a jq with INDEX/2), the following invocation of jq produces the desired output:

jq -r --argjson ids '[11111, 22222, 33333, 44444, 55555]' -f tossv.jq input.json 

assuming tossv.jq contains the program:

INDEX(.persons[]; .personID) as $dict
| $ids
| map( $dict[tostring] | .phoneNumber)
| join(";")

Program notes

  1. INDEX/2 produces a JSON object that serves as a dictionary. Since JSON keys must be strings, tostring must be used in line 3 above.

  2. When using join(";"), null values effectively become empty strings.

  3. If your jq does not have INDEX/2, then now might be a good time to upgrade. Otherwise you can snarf its definition by googling: jq "def INDEX" builtin.jq

peak
  • 105,803
  • 17
  • 152
  • 177
  • If that worked it would be very elegant! Unfortunately I have no chance of upgrading my jq 1.5 to 1.6... – Bowi Nov 06 '18 at 08:15
  • @Bowi - So why don’t you copy-and-paste the def? – peak Nov 06 '18 at 15:43
  • You mean `def index($i): indices($i) | .[0];` from https://github.com/stedolan/jq/blob/master/src/builtin.jq ? – Bowi Nov 06 '18 at 15:50
  • @Bowi - No, of course not. For one thing, jq names are case-sensitive. For another, INDEX/2 has arity 2. – peak Nov 06 '18 at 15:55
0

Unfortunately I couldn't test if peak's answer works since I only have jq 1.5. Here's what I came up with yesterday evening:

  • For each semicolon, add the following query

    (\";\" as \$a | \$a)
    
  • Resulting command (abstract):

     cat persons.txt | jq "(<1's phone number>), (\";\" as \$a | \$a), 
     (<2's phone number>), (\";\" as \$a | \$a), ..."
    
  • Resulting command (concrete):

    cat persons.txt | jq "(.persons[] | select(.personID==11111) | .phoneNumber), (\";\" as \$a | \$a), 
    (.persons[] | select(.personID==22222) | .phoneNumber), (\";\" as \$a | \$a), 
    (.persons[] | select(.personID==33333) | .phoneNumber), (\";\" as \$a | \$a), 
    (.persons[] | select(.personID==44444) | .phoneNumber), (\";\" as \$a | \$a), 
    (.persons[] | select(.personID==55555) | .phoneNumber)"
    
  • Result:

    123456
    ";"
    432100
    ";"
    ";"
    ";"
    147258
    
  • Delete the newlines and ":

    <commandAsAbove> | tr --delete "\n\""
    
  • Result:

    123456;432100;;;147258
    

Do not get me wrong, this is far uglier than peak's answer, but it worked for me yesterday.

Bowi
  • 1,378
  • 19
  • 33
  • 1
    The last two parts can easily be done by `jq` : to delete the newlines, first put your results in an array then join them with the empty string (`[ command ] | join("")`). This would leave you with a `"1;2;;;3"` result, in which you can remove the quotes by using the `-r` "raw-output" flag. See [this jqplay](https://jqplay.org/s/1Q-v03cJ_t). – Aaron Nov 06 '18 at 09:33
  • Oh and bash-wise : 1) make your life easier by enclosing your jq command in single quotes so you don't have to escape the double-quotes and dollars inside and 2) prefer `jq command file` to `cat file | jq command` – Aaron Nov 06 '18 at 09:36
  • Also, `E as $a | $a` can here be simplified to just `E` in each case. – peak Nov 06 '18 at 16:00
0

Without jq solution:

for i in $(seq 11111 11111 55555)
do
  string=$(grep -B1 "$i" persons.txt | head -1 | sed 's/.* \(.*\),/\1/g')
  echo "$string;" >> output
done
cat output | tr -d '\n' | rev | cut -d';' -f2- | rev > tmp && mv tmp output

This little script will yield the result you want and you can adapt it quickly if the input data varies

cat output
123456;432100;;;147258
MikeKatz45
  • 545
  • 5
  • 16
  • 1
    Do I see this right -- this relies heavily on the JSON structure being the same every time, correct? Thus, when an entry gets mixed (eg. friends after phone number), it stops working? – Bowi Nov 06 '18 at 15:36
  • yep this makes the assumption that each JSON "person" will have friends, phone and ID in that order of lines. – MikeKatz45 Nov 06 '18 at 15:43