
I have the following JSON file:

[
  {
    "clusterName": "cluster1",
    "nodes": [
      {
        "hostname": "server1",
        "dse": "6.7.5"
      },
      {
        "hostname": "server2",
        "dse": "6.7.5"
      }
    ]
  },
  {
    "clusterName": "cluster2",
    "nodes": [
      {
        "hostname": "server3",
        "dse": "6.7.5"
      },
      {
        "hostname": "server4",
        "dse": "6.7.5"
      }
    ]
  }
]

And I have another JSON file:

[
  {
    "hostname": "server1",
    "memorysize": "47.01 GiB",
    "processorcount": 12
  },
  {
    "hostname": "server2",
    "memorysize": "47.01 GiB",
    "processorcount": 12
  },
  {
    "hostname": "server3",
    "memorysize": "47.01 GiB",
    "processorcount": 10
  },
  {
    "hostname": "server4",
    "memorysize": "47.01 GiB",
    "processorcount": 11
  },
  {
    "hostname": "server5",
    "memorysize": "47.01 GiB",
    "processorcount": 12
  },
  {
    "hostname": "server6",
    "memorysize": "47.01 GiB",
    "processorcount": 12
  }
]

I want to join these two JSON documents to produce the following output:

[
  {
    "clusterName": "cluster1",
    "nodes": [
      {
        "hostname": "server1",
        "dse": "6.7.5",
        "memorysize": "47.01 GiB",
        "processorcount": 12
      },
      {
        "hostname": "server2",
        "dse": "6.7.5",
        "memorysize": "47.01 GiB",
        "processorcount": 12
      }
    ]
  },
  {
    "clusterName": "cluster2",
    "nodes": [
      {
        "hostname": "server3",
        "dse": "6.7.5",
        "memorysize": "47.01 GiB",
        "processorcount": 10
      },
      {
        "hostname": "server4",
        "dse": "6.7.5",
        "memorysize": "47.01 GiB",
        "processorcount": 11
      }
    ]
  }
]

Basically, the first file has a list of cluster dictionaries, each containing its nodes, and the second file has a list of node dictionaries.

The solution mentioned didn't work with multiple clusters.

Is there a better way to do this in Python instead?

developthou
  • Can we assume that each node of the first file will have an associated object in the second file? – Aaron Feb 05 '20 at 16:58
  • @Aaron each node of the first file will have an associated object in the second file. – developthou Feb 05 '20 at 17:09
  • My jq foo is weak, so I'm not sure how best to merge this, but you can get the desired node array with `cat file1.txt file2.txt | jq -s ".[0][].nodes[] * .[1][0]"` – William Pursell Feb 05 '20 at 17:38
  • I believe you have a typo in the third code snippet of your question - your desired output. The outer array has two elements of which the second has the key `"clusterName"` with the value `"cluster2"`. This second element's `"nodes"` array in turn has two elements of which the first has `"hostname"` set to `"server3"`. But the second element ***should*** have its `"hostname"` set to `"server4"`. Right? - I suggest you consider correcting this to avoid confusing any readers coming here. (I cannot edit this myself because of the "6 character rule".) – Henke Mar 07 '21 at 10:24
  • Thank you for catching the typo. I have updated it. – developthou Mar 09 '21 at 14:29

3 Answers


A solution using jq:

<file1 jq --slurpfile f file2 '
  {
     clusterName:.[].clusterName,
     nodes:map($f[],.nodes)|add|group_by(.hostname)|map(add)
  }'

This builds an object using both files.
The first field, clusterName, is taken from the same field of the first file (the one read on stdin).
The second field, nodes, is the combination of objects from both files based on hostname (done with the group_by filter).


A tentative answer to the comment below:
I don't think that -s has any advantage here, since you need both files in memory (instead of one with --slurpfile).
In order not to play with indexes, the idea is to test whether the field exists before using it. You can do this with the ? and // operators. Together they form a sort of if not ... then .... Here is a possible solution:

jq -s '{
   clusterName:(.[][].clusterName?//empty),
   nodes:map(.[].nodes[]?//.[])|group_by(.hostname)|map(add)
}' file1 file2

As you can see, the difficulty in both scripts is to "normalize" the objects in order to perform the group_by operation.
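To see what the group_by(.hostname) | map(add) idiom achieves, here is a minimal Python sketch of the same normalization, using inline data mirroring the question (the sample objects are illustrative, not taken verbatim from the files):

```python
from itertools import groupby

# A flat list mixing objects from both files, as the jq scripts produce
# before grouping
flat = [
    {"hostname": "server1", "dse": "6.7.5"},
    {"hostname": "server1", "memorysize": "47.01 GiB", "processorcount": 12},
]

# group_by(.hostname): jq sorts by the key first, so do the same here
flat.sort(key=lambda o: o["hostname"])

# map(add): merge each group's objects into one; later keys overwrite earlier
merged = []
for _, group in groupby(flat, key=lambda o: o["hostname"]):
    combined = {}
    for obj in group:
        combined.update(obj)
    merged.append(combined)
```

Like jq's `add` on objects, `dict.update` lets the last object in a group win on key collisions.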

oliv
  • Rather than writing to files, I would love to see a cleaner implementation of: `jq --slurp '{ clusterName: .[0][].clusterName, nodes: [[.[1][]], .[0][0].nodes] | add | group_by(.hostname) | map(add) }' file1.txt file2.txt` It's tempting to ask a new question! How would you implement this solution using --slurp rather than --slurpfile? – William Pursell Feb 05 '20 at 18:23
  • Excellent! Thanks. For me, the biggest advantage of `--slurp` is that I often use `jq` as a filter in which neither object is in a file and `jq` is just reading from the stream. – William Pursell Feb 05 '20 at 18:55
  • That is, instead of `jq -s ... file1 file2`, the invocation looks more like `cat file1 file2 | jq -s ...` (where `cat` is replaced by some non-trivial json object generator) – William Pursell Feb 05 '20 at 18:56
  • In the same conditions, I often use `json_producer1 | jq --slurpfile f <(json_producer2) '...'` – oliv Feb 05 '20 at 19:01
  • I have updated the problem description as the suggestions failed with multiple clusters. – developthou Feb 06 '20 at 00:23

I accomplished this using Python instead:

for cluster in clusters:
    for node in cluster["nodes"]:
        # Find the matching entry from the second file and merge it in place
        # (every node is guaranteed to have a match, per the comments above)
        match = next(n for n in nodes if n["hostname"] == node["hostname"])
        node.update(match)
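The loop above scans the whole node list for every node, which is O(n×m). A sketch of the same join with a one-time hostname index for O(1) lookups (the sample data below stands in for the two parsed files):

```python
# Parsed contents of the two files (abbreviated sample data)
clusters = [
    {"clusterName": "cluster1",
     "nodes": [{"hostname": "server1", "dse": "6.7.5"},
               {"hostname": "server2", "dse": "6.7.5"}]},
]
nodes = [
    {"hostname": "server1", "memorysize": "47.01 GiB", "processorcount": 12},
    {"hostname": "server2", "memorysize": "47.01 GiB", "processorcount": 12},
]

# Index the node facts by hostname once, instead of filtering per node
by_host = {n["hostname"]: n for n in nodes}

for cluster in clusters:
    for node in cluster["nodes"]:
        node.update(by_host[node["hostname"]])
```

As written this raises a KeyError on a hostname missing from the second file; use `by_host.get(node["hostname"], {})` if that guarantee doesn't hold.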
developthou

the invocation looks more like cat file1 file2 | jq ...

Here's a solution that assumes all the inputs are presented as a stream. This solution also avoids using the -s command-line option.

cat master.json hostnames.json | jq '
  # input: an array of objects, each with a "nodes" key
  def mergeNode($node): 
    map(if .hostname == $node.hostname then . + $node else . end);
  reduce inputs[] as $n (.; map_values( .nodes |= mergeNode($n) ))'

Notice that the -n command-line option has NOT been specified.

This solution also allows more than one "hostnames" file.
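The reduce-over-inputs approach translates naturally to Python as well: fold a stream of node-fact objects into the master structure one at a time. A sketch under the same assumptions (inline sample data stands in for the files; unmatched hostnames are simply ignored):

```python
master = [
    {"clusterName": "cluster1",
     "nodes": [{"hostname": "server1", "dse": "6.7.5"}]},
    {"clusterName": "cluster2",
     "nodes": [{"hostname": "server3", "dse": "6.7.5"}]},
]

# A stream of node-fact objects, possibly concatenated from several
# "hostnames" files
stream = [
    {"hostname": "server1", "processorcount": 12},
    {"hostname": "server3", "processorcount": 10},
    {"hostname": "server5", "processorcount": 12},  # no match: ignored
]

def merge_node(nodes, fact):
    # Counterpart of the jq mergeNode: extend matching nodes, leave the rest
    return [dict(n, **fact) if n["hostname"] == fact["hostname"] else n
            for n in nodes]

# Counterpart of `reduce inputs[] as $n (.; ...)`
for fact in stream:
    for cluster in master:
        cluster["nodes"] = merge_node(cluster["nodes"], fact)
```

Because each fact is folded in independently, the stream can come from any number of sources, mirroring the jq solution's support for multiple "hostnames" files.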

peak