0

another question about this topic.

There are several questions and answer in this community about it:

  1. how to convert json to csv in python
  2. Convert Json to CSV using Python
  3. Python convert JSON to CSV
  4. Convert JSON to CSV with Python 3
  5. It follows...

But I think in this case, it doesn't work for this hard json:

MY JSON:

    {
   "took":32,
   "timed_out":false,
   "_shards":{
      "total":4,
      "successful":4,
      "skipped":0,
      "failed":0
   },
   "hits":{
      "total":{
         "value":94,
         "relation":"eq"
      },
      "max_score":"None",
      "hits":[

      ]
   },
   "aggregations":{
      "hostname":{
         "doc_count_error_upper_bound":0,
         "sum_other_doc_count":0,
         "buckets":[
            {
               "key":"396",
               "doc_count":47,
               "process_name":{
                  "doc_count_error_upper_bound":0,
                  "sum_other_doc_count":16,
                  "buckets":[
                     {
                        "key":"accounts-daemon",
                        "doc_count":2,
                        "process_state":{
                           "doc_count_error_upper_bound":0,
                           "sum_other_doc_count":0,
                           "buckets":[
                              {
                                 "key":"sleeping",
                                 "doc_count":2,
                                 "process_pid":{
                                    "doc_count_error_upper_bound":0,
                                    "sum_other_doc_count":0,
                                    "buckets":[
                                       {
                                          "key":1092,
                                          "doc_count":2,
                                          "process_ppid":{
                                             "doc_count_error_upper_bound":0,
                                             "sum_other_doc_count":0,
                                             "buckets":[
                                                {
                                                   "key":1,
                                                   "doc_count":2,
                                                   "process_pgid":{
                                                      "doc_count_error_upper_bound":0,
                                                      "sum_other_doc_count":0,
                                                      "buckets":[
                                                         {
                                                            "key":1092,
                                                            "doc_count":2,
                                                            "process_cmdline":{
                                                               "doc_count_error_upper_bound":0,
                                                               "sum_other_doc_count":0,
                                                               "buckets":[
                                                                  {
                                                                     "key":"/usr/lib/accountsservice/accounts-daemon",
                                                                     "doc_count":2,
                                                                     "process_username":{
                                                                        "doc_count_error_upper_bound":0,
                                                                        "sum_other_doc_count":0,
                                                                        "buckets":[
                                                                           {
                                                                              "key":"root",
                                                                              "doc_count":2,
                                                                              "process_cwd":{
                                                                                 "doc_count_error_upper_bound":0,
                                                                                 "sum_other_doc_count":0,
                                                                                 "buckets":[
                                                                                    {
                                                                                       "key":"/",
                                                                                       "doc_count":2,
                                                                                       "process_cpu_start_time":{
                                                                                          "doc_count_error_upper_bound":0,
                                                                                          "sum_other_doc_count":0,
                                                                                          "buckets":[
                                                                                             {
                                                                                                "key":1570456346000,
                                                                                                "key_as_string":"2019-10-07T13:52:26.000Z",
                                                                                                "doc_count":2,
                                                                                                "process_fd_limit_hard":{
                                                                                                   "value":4096.0
                                                                                                },
                                                                                                "process_fd_open":{
                                                                                                   "value":8.0
                                                                                                },
                                                                                                "process_memory_size":{
                                                                                                   "value":281055232.0
                                                                                                },
                                                                                                "process_memory_rss_bytes":{
                                                                                                   "value":6168576.0
                                                                                                },
                                                                                                "process_memory_share":{
                                                                                                   "value":5464064.0
                                                                                                },
                                                                                                "process_cpu_total_pct":{
                                                                                                   "value":0.0005
                                                                                                },
                                                                                                "process_fd_limit_soft":{
                                                                                                   "value":1024.0
                                                                                                },
                                                                                                "process_cpu_system_ticks":{
                                                                                                   "value":0.0
                                                                                                },
                                                                                                "process_cpu_user_ticks":{
                                                                                                   "value":0.0
                                                                                                },
                                                                                                "process_cpu_total_norm_pct":{
                                                                                                   "value":0.0005
                                                                                                },
                                                                                                "process_cpu_total_ticks":{
                                                                                                   "value":0.0
                                                                                                },
                                                                                                "process_cpu_total_value":{
                                                                                                   "value":18775.0
                                                                                                }
                                                                                             }
                                                                                          ]
                                                                                       }
                                                                                    }
                                                                                 ]
                                                                              }
                                                                           }
                                                                        ]
                                                                     }
                                                                  }
                                                               ]
                                                            }
                                                         }
                                                      ]
                                                   }
                                                }
                                             ]
                                          }
                                       }
                                    ]
                                 }
                              }
                           ]
                        }
                     }
                  ]
               }
            },
            {
               "key":"961",
               "doc_count":47,
               "process_name":{
                  "doc_count_error_upper_bound":0,
                  "sum_other_doc_count":16,
                  "buckets":[
                     {
                        "key":"accounts-daemon",
                        "doc_count":2,
                        "process_state":{
                           "doc_count_error_upper_bound":0,
                           "sum_other_doc_count":0,
                           "buckets":[
                              {
                                 "key":"sleeping",
                                 "doc_count":2,
                                 "process_pid":{
                                    "doc_count_error_upper_bound":0,
                                    "sum_other_doc_count":0,
                                    "buckets":[
                                       {
                                          "key":1070,
                                          "doc_count":2,
                                          "process_ppid":{
                                             "doc_count_error_upper_bound":0,
                                             "sum_other_doc_count":0,
                                             "buckets":[
                                                {
                                                   "key":1,
                                                   "doc_count":2,
                                                   "process_pgid":{
                                                      "doc_count_error_upper_bound":0,
                                                      "sum_other_doc_count":0,
                                                      "buckets":[
                                                         {
                                                            "key":1070,
                                                            "doc_count":2,
                                                            "process_cmdline":{
                                                               "doc_count_error_upper_bound":0,
                                                               "sum_other_doc_count":0,
                                                               "buckets":[
                                                                  {
                                                                     "key":"/usr/lib/accountsservice/accounts-daemon",
                                                                     "doc_count":2,
                                                                     "process_username":{
                                                                        "doc_count_error_upper_bound":0,
                                                                        "sum_other_doc_count":0,
                                                                        "buckets":[
                                                                           {
                                                                              "key":"root",
                                                                              "doc_count":2,
                                                                              "process_cwd":{
                                                                                 "doc_count_error_upper_bound":0,
                                                                                 "sum_other_doc_count":0,
                                                                                 "buckets":[
                                                                                    {
                                                                                       "key":"/",
                                                                                       "doc_count":2,
                                                                                       "process_cpu_start_time":{
                                                                                          "doc_count_error_upper_bound":0,
                                                                                          "sum_other_doc_count":0,
                                                                                          "buckets":[
                                                                                             {
                                                                                                "key":1570450883000,
                                                                                                "key_as_string":"2019-10-07T12:21:23.000Z",
                                                                                                "doc_count":2,
                                                                                                "process_fd_limit_hard":{
                                                                                                   "value":4096.0
                                                                                                },
                                                                                                "process_fd_open":{
                                                                                                   "value":8.0
                                                                                                },
                                                                                                "process_memory_size":{
                                                                                                   "value":281153536.0
                                                                                                },
                                                                                                "process_memory_rss_bytes":{
                                                                                                   "value":5992448.0
                                                                                                },
                                                                                                "process_memory_share":{
                                                                                                   "value":5279744.0
                                                                                                },
                                                                                                "process_cpu_total_pct":{
                                                                                                   "value":0.0
                                                                                                },
                                                                                                "process_fd_limit_soft":{
                                                                                                   "value":1024.0
                                                                                                },
                                                                                                "process_cpu_system_ticks":{
                                                                                                   "value":0.0
                                                                                                },
                                                                                                "process_cpu_user_ticks":{
                                                                                                   "value":0.0
                                                                                                },
                                                                                                "process_cpu_total_norm_pct":{
                                                                                                   "value":0.0
                                                                                                },
                                                                                                "process_cpu_total_ticks":{
                                                                                                   "value":0.0
                                                                                                },
                                                                                                "process_cpu_total_value":{
                                                                                                   "value":13410.0
                                                                                                }
                                                                                             }
                                                                                          ]
                                                                                       }
                                                                                    }
                                                                                 ]
                                                                              }
                                                                           }
                                                                        ]
                                                                     }
                                                                  }
                                                               ]
                                                            }
                                                         }
                                                      ]
                                                   }
                                                }
                                             ]
                                          }
                                       }
                                    ]
                                 }
                              }
                           ]
                        }
                     }
                  ]
               }
            }
         ]
      }
   }
}

I have tried almost everything and I reached out to parse it with a horrible multiloop code, but I think there is an easier way to do this, but no post in this community has been aple to help me.

Could you help me to obtain something like this?:

My result (incomplete):

   hostname     process_name process_state  ...  process_username  process_cwd    process_cpu_start_time
0       396  accounts-daemon      sleeping  ...              root            /  2019-10-07T13:52:26.000Z
1       396           iscsid      sleeping  ...              root            /  2019-10-07T13:52:27.000Z
2       396           iscsid      sleeping  ...              root            /  2019-10-07T13:52:28.000Z
3       396      ksoftirqd/0      sleeping  ...              root            /  2019-10-07T13:52:28.000Z
4       396      kworker/0:1      sleeping  ...              root            /  2019-10-07T13:52:28.000Z
..      ...              ...           ...  ...               ...          ...                       ...
25      961          polkitd      sleeping  ...              root            /  2019-10-07T12:21:31.000Z
26      961        rcu_sched      sleeping  ...              root            /  2019-10-07T12:21:31.000Z
27      961          systemd      sleeping  ...              root            /  2019-10-07T12:20:24.000Z
28      961  systemd-journal      sleeping  ...              root            /  2019-10-07T12:20:38.000Z
29      961  unattended-upgr      sleeping  ...              root            /  2019-10-07T12:21:26.000Z

I know it is not easy, so I'll appreciate any help.

Thanks in advance.

Henry Navarro
  • 943
  • 8
  • 34

1 Answers1

0

The idea

The main idea is to navigate down the structure of your JSON object. To make the program readable, I came up with 2 functions, which perform this navigation and append the keys along the descension path to a list (the content of the current row).

Functions to navigate down the JSON object

Function 1:

def dig(dct, path):
    ret = dct
    for step in path.split('.'):
        if step not in ret:
            print(f'**** No {step} element.')
            return None
        ret = ret[step]
        if isinstance(ret, list):
            ret = ret[0]
    return ret

This function "digs into" dct - a dictionary (a JSON object), along the path (dot separated string). An additional detail is that if the target sub-object (to which the function descended) is a list, then there is additional "go down" step, namely the function descends to the first element of this list. The return value is the "final" sub-object, after all descends.

Function 2:

def dig2(dct, step, res, key='key'):
    dct = dig(dct, step + '.buckets')
    res.append(dct[key])
    return(dct)

Parameters:

  • dct - the source dictionary,
  • step - the first step of the path to descend,
  • res - the result list, to which append the key after descend,
  • key - the name of the key to read from the target sub-object.

This function speeds up the navigation down the source object. You specify only the first step and this function descends one step deeper (to buckets). Then res is appended with the key found at this level.

The return value is again the "final" sub-object, to be used to perform further descends.

Read the source JSON file

To read your JSON string, I saved it in a file and read as follows:

with open('input.json') as json_file:
    data = json.load(json_file)

List of steps

The list of steps, to navigate down the source object is as follows:

steps = ['hostname', 'process_name', 'process_state', 'process_pid',
    'process_ppid', 'process_pgid', 'process_cmdline',
    'process_username', 'process_cwd', 'process_cpu_start_time']

This list is also the list of column names in the result DataFrame.

Main program

tbl = []
for buck in dig(data, 'aggregations.hostname')['buckets']:
    res = [buck['key']]
    for step in steps[1:-1]:
        buck = dig2(buck, step, res)
    dig2(buck, steps[-1], res, 'key_as_string')
    tbl.append(res)
result = pd.DataFrame(tbl, columns=steps)

First we create tbl - an empty list, where data for subsequent rows will be collected.

Then, for buck in ... operates on each bucket within aggregations.hostname.

The first element of the result for the current row is key from the current bucket.

After that, for step in steps[1:-1] digs into following steps, excluding the last, and adds each key to the result list.

The last step of descending is outside this loop, because this time the key to append is key_as_string, and finally res (data for the current row) is added to tbl.

After the loop, the result DataFrame is created.

As the source JSON object included in your post contains only 2 buckets, this DataFrame contains only 2 rows.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41