1

I am interested in flattening JSON with multiple layers of nested arrays of object. I would ideally like to do this in Java but it seems like the Pandas library in python might be good for this.

Does anyone know a good java library for this?

I found this article (Create a Pandas DataFrame from deeply nested JSON) using pandas and jq and my solution almost works but the output I am receiving is not quite as expected. Here is my code sample

json_data = '''{ "id": 1,
"things": [
    {
        "tId": 1,
        "objs": [{"this": 99},{"this": 100}]
    },
    {
        "tId": 2,
        "objs": [{"this": 222},{"this": 22222}]
    }
]
 }'''

rule = """[{id: .id, 
        tid: .things[].tId,
        this: .things[].objs[].this}]"""
out = jq(rule, _in=json_data).stdout
res = pd.DataFrame(json.loads(out))

The problem is the output I am receiving is this:

   id   this  tid
0   1     99    1
1   1    100    1
2   1    222    1
3   1  22222    1
4   1     99    2
5   1    100    2
6   1    222    2
7   1  22222    2

I am expecting to see

   id   this  tid
0   1     99    1
1   1    100    1
3   1    222    2
4   1  22222    2

Any tips on how to make this work, different solutions, or a java option would be great!

Thanks in advance!

Craig

peak
  • 105,803
  • 17
  • 152
  • 177
craigtb
  • 647
  • 5
  • 12
  • 30
  • If you are just going to call `json.loads` on the result, I don't know if using `jq` is any more efficient than decoding `json_data` itself and working with the resulting `dict` directly in Python. – chepner Jan 29 '19 at 19:53

1 Answers1

1

The problem is that your "rule" creates a Cartesian product, whereas in effect you want nested iteration.

With your input, the following jq expression, which makes the nested iteration reasonably clear, produces the output as shown:

.id as $id
| .things[] as $thing
| $thing.objs[]
| [$id, .this, $thing.tId]
| @tsv

Output

1   99  1
1   100 1
1   222 2
1   22222   2

Rule

So presumably your rule should look something like this:

[{id} + (.things[] | {tid: .tId} + (.objs[] | {this}))]

or if you want to make the nested iteration clearer:

[ .id as $id
  | .things[] as $thing
  | $thing.objs[]
  | {id: $id, this, tid: $thing.tId} ]

Running jq in java

Besides processBuilder, you might like to take a look at these wrappers:

peak
  • 105,803
  • 17
  • 152
  • 177
  • this works for me! thanks! do you know a way to run jq in java? – craigtb Jan 29 '19 at 20:30
  • 1
    @craigtb, same way you run any other external program; `ProcessBuilder("jq", jqScriptHere).redirectInput(placeWithInputJson).redirectOutput(placeYouWantYourOutput).start()`, or such. – Charles Duffy Jan 29 '19 at 20:34