
In practice, keys have to be unique within a JSON object (see, e.g., "Does JSON syntax allow duplicate keys in an object?"). However, suppose I have a file with the following contents:

{
    "a" : "1",
    "b" : "2",
    "a" : "3"
}

Is there a simple way of converting the repeated keys to an array? So that the file becomes:

{
    "a" : [ {"key": "1"}, {"key": "3"}],
    "b" : "2"
}

Or something similar that combines the repeated keys into an array (or an alternative way to extract the values of the repeated keys).

Here's a solution in Java: Convert JSON object with duplicate keys to JSON array

Is there any way to do it with awk/bash/python?

peak
econ
  • Where does this json come from? Do you have access to it from the server side, or as a string, before it is evaluated? Once a json becomes a JS object I don't think you can do anything (perhaps I am wrong) – Shovalt Apr 30 '16 at 15:27
  • I generate the json... in principle I could do `jq -c '.'` and that would output it as a one-line string. – econ Apr 30 '16 at 15:29
  • Possible duplicate of [Convert JSON object with duplicate keys to JSON array](http://stackoverflow.com/questions/24416960/convert-json-object-with-duplicate-keys-to-json-array) – Shovalt Apr 30 '16 at 15:34
  • See if these answer your question: http://stackoverflow.com/questions/24416960/convert-json-object-with-duplicate-keys-to-json-array ; http://stackoverflow.com/questions/17063257/necessity-for-duplicate-keys-in-json-object – Shovalt Apr 30 '16 at 15:35
  • @Shovalt: thanks for this link, I didn't see it. However, that answer is in `java`... – econ Apr 30 '16 at 15:53
  • If you generate the file yourself, why not doing it right? – hek2mgl Apr 30 '16 at 16:03
  • @hek2mgl: I'm still figuring out the code that generates the `json`, so I thought this could be used in the meantime. – econ Apr 30 '16 at 16:22
  • For starters, you'll need to stream the object in. If you had read it all in normally, you'll lose the duplicated keys (as it should). With the streamed in values, you could then build up results. But I don't know enough about how to work with the streaming protocols in jq to be able to build out a solution. Start there. – Jeff Mercado Apr 30 '16 at 16:29
  • You could do this with `--stream`, but it's way easier to output a meaningful JSON from your application instead. –  Apr 30 '16 at 19:41
  • Thanks! I'll try to do it properly... – econ Apr 30 '16 at 19:42

4 Answers


If your input is really a flat JSON object with primitives as values, this should work:

jq -s --stream 'map(select(length == 2)) | group_by(.[0]) | map({"key": .[0][0][0], "value": map(.[1])}) | from_entries'

{
  "a": [
    "1",
    "3"
  ],
  "b": [
    "2"
  ]
}

Handling more complex structures would require actually understanding how --stream is supposed to be used, which is beyond me.
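Since the question also asks about Python, here is a minimal sketch of the same grouping using the standard library's `object_pairs_hook`, which hands the parser's raw key/value pairs to a callback before duplicates are collapsed. The helper name `group_duplicates` and the inline sample string are illustrative assumptions, not part of the jq answer above.

```python
import json
from collections import defaultdict

def group_duplicates(pairs):
    """Collect every value seen for a key into a list, in input order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

# Hypothetical sample mirroring the file in the question.
text = '{"a": "1", "b": "2", "a": "3"}'

# The hook receives a list of (key, value) tuples for each JSON object,
# so repeated keys are still visible at this point.
result = json.loads(text, object_pairs_hook=group_duplicates)
print(json.dumps(result, indent=2))
```

Like the jq filter, this wraps every value in a list, including keys that occur only once. The hook is also called for nested objects, so duplicates at deeper levels are grouped the same way.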

  • Thank you! This works for the example I posted, and almost works on my actual data... this approach groups the results by the key name (which makes perfect sense), but then in my data I have some odd cases where value is available for some keys, but not the others and the order matters... I may not be writing it correctly, but in any case, the code you provided will help me to solve the actual problem (until I manage to generate proper JSONs in the first instance). – econ May 01 '16 at 18:31
  • As I said in the comments to your initial post, it's probably way easier to generate simple JSONs than it is to do jq magic on complex JSONs. –  May 01 '16 at 20:13

Building on Santiago's answer using -s --stream, the following filter builds up the object one step at a time, thus preserving the order of the keys and of the values for a specific key:

reduce (.[] | select(length==2)) as $kv ({};
      $kv[0][0] as $k
      |$kv[1] as $v
      | (.[$k]|type) as $t
      | if $t == "null" then .[$k] = $v
        elif $t == "array" then .[$k] += [$v]
        else .[$k] = [ .[$k], $v ]
        end)

For the given input, the result is:

{
  "a": [
    "1",
    "3"
  ],
  "b": "2"
}

To illustrate that the ordering of values for each key is preserved, consider the following input:

{
    "c" : "C",
    "a" : "1",
    "b" : "2",
    "a" : "3",
    "b" : "1"
}

The output produced by the filter above is:

{
  "c": "C",
  "a": [
    "1",
    "3"
  ],
  "b": [
    "2",
    "1"
  ]
}
peak

Building on peak's answer, the following filter also works on input that contains multiple objects, handles nested objects, and does not need the slurp option (-s).

This is not an answer to the initial question, but because the jq FAQ links here, it might be useful for some visitors.

File jqmergekeys.txt

def consumestream($arr): # Reads stream elements from stdin until we have enough elements to build one object and returns them as array
input as $inp 
| if $inp|has(1) then consumestream($arr+[$inp]) # input=keyvalue pair => Add to array and consume more
  elif ($inp[0]|has(1)) then consumestream($arr) # input=closing subkey => Skip and consume more
  else $arr end; # input=closing root object => return array

def convert2obj($stream): # Converts an object in stream notation into an object, and merges the values of duplicate keys into arrays
reduce ($stream[]) as $kv ({}; # This function is based on http://stackoverflow.com/a/36974355/2606757
      $kv[0] as $k
      | $kv[1] as $v
      | (getpath($k)|type) as $t # type of existing value under the given key
      | if $t == "null" then setpath($k;$v) # value not existing => set value
        elif $t == "array" then setpath($k; getpath($k) + [$v] ) # value is already an array => add value to array
        else setpath($k; [getpath($k), $v ]) # single value => put existing and new value into an array
        end);

def mainloop(f):  (convert2obj(consumestream([input]))|f),mainloop(f); # Consumes streams forever, converts them into an object and applies the user provided filter
def mergeduplicates(f): try mainloop(f) catch if .=="break" then empty else error end; # Catches the "break" thrown by jq if there's no more input

#---------------- User code below --------------------------    

mergeduplicates(.) # merge duplicate keys in input, without any additional filters

#mergeduplicates(select(.layers)|.layers.frame) # merge duplicate keys in input and apply some filter afterwards

Example:

tshark -T ek | jq -nc --stream -f ./jqmergekeys.txt
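For comparison, a rough Python sketch of the same idea (several concatenated objects, nested objects, duplicate keys merged into arrays) is shown here. It is a sketch under stated assumptions rather than a port of the jq code: the names `merge_duplicates` and `iter_objects` and the sample string are hypothetical, and the whole input is held in memory instead of being streamed.

```python
import json

def merge_duplicates(pairs):
    """Keep a lone value as-is; turn repeated keys into a list of values."""
    merged = {}
    listified = set()  # keys whose values this function has wrapped in a list
    for key, value in pairs:
        if key not in merged:
            merged[key] = value
        elif key in listified:
            merged[key].append(value)
        else:
            # Second occurrence: wrap the first value and the new one together.
            merged[key] = [merged[key], value]
            listified.add(key)
    return merged

def iter_objects(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder(object_pairs_hook=merge_duplicates)
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical input: two objects, the first with a duplicated nested key.
sample = '{"x": {"k": 1, "k": 2}, "y": 0}\n{"x": {"k": 3}}\n'
docs = list(iter_objects(sample))
for doc in docs:
    print(json.dumps(doc))
```

Tracking wrapped keys in a set, rather than testing whether the stored value is a list, means that a value that was already an array in the input is not mistaken for an accumulator. In real use the text could come from `sys.stdin.read()`.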
Timo

Here's a simple alternative that generalizes well:

reshape.jq

def augmentpath($path; $value):
  getpath($path) as $v
  | setpath($path; $v + [$value]);

reduce (inputs | select(length==2)) as $pv
  ({}; augmentpath($pv[0]; $pv[1]) )

Usage

jq -n --stream -f reshape.jq input.json

Output

With the given input:

{
  "a": [
    "1",
    "3"
  ],
  "b": [
    "2"
  ]
}

Postscript

If it's important to avoid arrays of singletons, either the definition of augmentpath could be modified, or a postprocessing step could be added.
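One way such a postprocessing step might look, sketched in Python on the already reshaped result: the function name `unwrap_singletons` is an assumption made for illustration, and the sample dictionary simply repeats the output shown above.

```python
def unwrap_singletons(node):
    """Recursively replace one-element lists with their only element."""
    if isinstance(node, dict):
        return {key: unwrap_singletons(value) for key, value in node.items()}
    if isinstance(node, list):
        items = [unwrap_singletons(value) for value in node]
        return items[0] if len(items) == 1 else items
    return node

# The reshaped output from above, where every value is wrapped in a list.
reshaped = {"a": ["1", "3"], "b": ["2"]}
cleaned = unwrap_singletons(reshaped)
print(cleaned)
```

Note that this cannot distinguish a singleton produced by the reshaping from a one-element array that was already present in the data, so both are unwrapped.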

peak