0

I am doing aggregation using ES, and the result come as follow:

{'took': 27,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 233, 'relation': 'eq'},
  'max_score': None,
  'hits': []},
 'aggregations': {'sales_over_time': {'buckets': [{'key': 1617235200000,
     'doc_count': 9,
     'name': {'doc_count_error_upper_bound': 0,
      'sum_other_doc_count': 0,
      'buckets': [{'key': '624232499',
        'doc_count': 4,
        'latest_comment': {'hits': {'total': {'value': 4, 'relation': 'eq'},
          'max_score': None,
          'hits': [{'_index': 'data',
            '_type': 'test',
            '_id': 'Hb5Uj3gBm2iwycZfdDvr',
            '_score': None,
            '_source': {'totalsales': 2630149, 'Id': '624232499'},
            'sort': [1617312374760]}]}}},
       {'key': '624232532',
        'doc_count': 4,
        'latest_comment': {'hits': {'total': {'value': 4, 'relation': 'eq'},
          'max_score': None,
          'hits': [{'_index': 'data',
            '_type': 'test',
            '_id': 'q77NjngBm2iwycZf6hdU',
            '_score': None,
            '_source': {'sales': 5810, 'Id': '624232532'},
            'sort': [1617303556611]}]}}},
       {'key': '656625970',
        'doc_count': 1,
        'latest_comment': {'hits': {'total': {'value': 1, 'relation': 'eq'},
          'max_score': None,
          'hits': [{'_index': 'data',
            '_type': 'test',
            '_id': 'Nb4xj3gBm2iwycZfFjKH',
            '_score': None,
            '_source': {'totalsales': 12690, 'Id': '656625970'},
            'sort': [1617310056788]}]}}}]}},

I have try to get the result and using pd.normalze like test_json= pd.json_normalize(result['aggregations']['sales_over_time']['buckets']) and it comes out

key doc_count   name.doc_count_error_upper_bound    name.sum_other_doc_count    name.buckets
0   1617235200000   9   0   0   [{'key': '624232499', 'doc_count': 4, 'latest_...
1   1617321600000   9   0   0   [{'key': '624232499', 'doc_count': 4, 'latest_.

So i try to Flattern the NESTED json using

pd.json_normalize(result['aggregations']['sales_over_time']['buckets']).explode("name.buckets").to_dict(orient="records"))

And it news have ONE NEST LAYER IN IT of"name.buckets.latest_comment.hits.hits"

ey  doc_count   name.doc_count_error_upper_bound    name.sum_other_doc_count    name.buckets.key    name.buckets.doc_count  name.buckets.latest_comment.hits.total.value    name.buckets.latest_comment.hits.total.relation name.buckets.latest_comment.hits.max_score  name.buckets.latest_comment.hits.hits
0   1617235200000   9   0   0   624232499   4   4   eq  None    [{'_index': 'data', '_type': 'test', '_...
1   1617235200000   9   0   0   624232532   4   4   eq  None    [{'_index': 'data', '_type': 'test', '_...

How can i flattern all the nested json ?

Man Man Yu
  • 161
  • 3
  • 13

1 Answers1

1
  • js is your sample JSON
  • four passes through json_normalize() does it
  • explode() embedded list, reset_index() if necessary
pd.json_normalize(pd.json_normalize(pd.json_normalize(pd.json_normalize(js)
                   .explode("aggregations.sales_over_time.buckets")
                   .to_dict(orient="records"))
 .explode("aggregations.sales_over_time.buckets.name.buckets")
 .reset_index(drop=True)
 .to_dict(orient="records"))
 .explode("aggregations.sales_over_time.buckets.name.buckets.latest_comment.hits.hits")
 .to_dict(orient="records")
)

Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
  • thanks you . Need sometime to digest the answer ! thanks in advance ! – Man Man Yu Apr 27 '21 at 07:01
  • it's a simple concept... `json_normalize()` will only expand one **list**. Hence manage **lists**s yourself by using `explode()` and then using `json_normalize()` again to expand embedded **dict**s – Rob Raymond Apr 27 '21 at 07:26