-2

I have a JSON output as below, how can i fetch clientHostname,clientId,completionTime,creationTime,status,retentionTime,fileCount value from it in a python code in a formatted table format.

[{'attributes': [{'key': '*anchor save set time', 'values': ['158']}, {'key': '*backup start time', 'values': ['158790']}, {'key': '*policy action name', 'values': ['Server db backup: 158790']}, {'key': '*policy name', 'values': ['Server Protection: 158860']}, {'key': '*policy protection period', 'values': ['0: 1580']}, {'key': '*policy workflow name', 'values': ['Server backup: 15860']}, {'key': '*ss clone retention', 'values': [' 158760: 150: 186']}, {'key': 'group', 'values': ['Server Protection']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-19T10:01:00+08:00', 'clientHostname': 'server.com', 'clientId': '52d79c1', 'completionTime': '2020-04-27T10:01:00+08:00', 'creationTime': '2020-04-21T10:01:00+08:00', 'fileCount': 2, 'id': '2b2d925f9c1', 'instances': [{'clone': False, 'id': '1587952860', 'status': 'Recoverable', 'volumeIds': ['444']}], 'level': '1', 'links': [{'href': 'https://server.com/backups/2b4c1', 'rel': 'item'}], 'name': 'index:b9c9dc1', 'retentionTime': '2020-05-18T23:59:59+08:00', 'saveTime': '2020-04-27T10:01:00+08:00', 'shortId': '1151745244', 'size': {'unit': 'Byte', 'value': 408}, 'type': 'File'}, {'attributes': [{'key': '*anchor save set time', 'values': ['1585706503']}, {'key': '*backup start time', 'values': ['1587952855']}, {'key': '*policy action name', 'values': ['Server db backup: 15855']}, {'key': '*policy name', 'values': ['Server Protection: 158795']}, {'key': '*policy protection period', 'values': ['0: 1585']}, {'key': '*policy workflow name', 'values': ['Server backup: 1587855']}, {'key': '*ss clone retention', 'values': [' 158755: 1587952855: 186']}, {'key': 'group', 'values': ['Server Protection']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-18T10:00:55+08:00', 'clientHostname': 'server.com', 'clientId': '52d7ae9c1', 'completionTime': '2020-04-27T10:00:55+08:00', 'creationTime': '2020-04-27T10:00:55+08:00', 'fileCount': 2, 'id': '0b8ae9c1', 'instances': [{'clone': False, 'id': '1', 'status': 'Recoverable', 'volumeIds': ['446929344']}], 'level': '1', 'links': [{'href': 'https://server.com/backups/0b8a', 'rel': 'item'}], 'name': 'index:ec3df9c1', 'retentionTime': '2020-05-18T23:59:59+08:00', 'saveTime': '2020-04-27T10:00:55+08:00', 'shortId': '1168522455', 'size': {'unit': 'Byte', 'value': 408}, 'type': 'File'}]

  • This question may have been answered here: https://stackoverflow.com/questions/36862308/how-to-create-a-table-with-data-from-json-output-in-python/36862989 – Charles Carriere Apr 28 '20 at 03:45
  • I tried below but getting error, could you please suggest. `data1 = data['attributes']['clientHostname'] print (data1)' getting below error- ' data1 = data['attributes']['clientHostname'] TypeError: list indices must be integers or slices, not str' – Nisha Budakoti Apr 28 '20 at 03:50
  • The value of the 'attributes' key is a list of dicts ( note the open square bracket '[' after 'attributes': ) - so you need to reference each item in the list by its position, not a key reference. try `data1 = data['attributes'][8]['clientHostname']` (not sure I have counted right, but try playing with that index to get what you are looking for) – Charles Carriere Apr 28 '20 at 16:25

2 Answers2

0

You could do it like this (Just load your json data in a dataframe first):

`import pandas as pd`
`df = pd.DataFrame.from_records(x)`
`df[['clientHostname', 'clientId', 'completionTime', 'creationTime', 'retentionTime', 'fileCount']]`

Here x is your json data that you have provided.

PRIN
  • 344
  • 1
  • 7
  • You are missing `status` that's the main thing. Rest are easy to extract – Pygirl Apr 28 '20 at 03:54
  • It's giving the below output which is not correct- ` attributes ... type 0 [{'key': '*anchor save set time', 'values': ['... ... File 1 [{'key': '*anchor save set time', 'values': ['... ... File 2 [{'key': '*anchor save set time', 'values': ['... ... File 3 [{'key': '*anchor save set time', 'values': ['... ... File [4 rows x 17 columns]` – Nisha Budakoti Apr 28 '20 at 04:00
  • Hi @NishaBudakoti, for your 'status' part, you could try something like this: df['status'] = df.apply(lambda row: row.instances[0]['status'], axis=1) – PRIN Apr 28 '20 at 04:12
  • thanks @Priya...Status part is coming correctly but above code is not giving the correct output for clientHostname,clientId,completionTime,creationTime,status,retentionTime,fileCount, could you please suggest. – Nisha Budakoti Apr 28 '20 at 04:22
  • Hi @NishaBudakoti, that code is working fine too, not sure what do you want to achieve. I am getting the following output: clientHostname clientId completionTime creationTime retentionTime fileCount 0 server.com 52d79c1 2020-04-27T10:01:00+08:00 2020-04-21T10:01:00+08:00 2020-05-18T23:59:59+08:00 2 1 server.com 52d7ae9c1 2020-04-27T10:00:55+08:00 2020-04-27T10:00:55+08:00 2020-05-18T23:59:59+08:00 2 Not sure how to add a screenshot here! – PRIN Apr 28 '20 at 04:29
  • I am running it with Python version-Python 3.7.4 but getting the same output which i have pasted above. – Nisha Budakoti Apr 28 '20 at 04:49
  • Try restarting your kernel/notebook/wherever you are executing the code. Because I don't see any issue. Just select only columns of interest. – PRIN Apr 28 '20 at 05:08
  • Already done. my code- `import pandas as pd x= df = pd.DataFrame.from_records(x) df[['clientHostname', 'clientId', 'completionTime', 'creationTime', 'retentionTime', 'fileCount']] print(df)` output- ` attributes ... type 0 [{'key': '*anchor save set time', 'values': ['... ... File 1 [{'key': '*anchor save set time', 'values': ['... ... File [2 rows x 17 columns]` – Nisha Budakoti Apr 28 '20 at 05:24
  • I got it, you are printing df. Assign this line to some new dataframe. For ex : df_new = df[['clientHostname', 'clientId', 'completionTime', 'creationTime', 'retentionTime', 'fileCount']] now print(df_new) – PRIN Apr 28 '20 at 05:45
  • Thanks @Priya...actually the issue was with the maxcolumn size, so i added below- pd.set_option('display.max_columns', None) pd.set_option('display.max_rows',None) – Nisha Budakoti Apr 28 '20 at 06:31
0

Try this:

from pandas.io.json import json_normalize
import ast

js = [{'attributes': [{'key': '*anchor save set time', 'values': ['158']}, {'key': '*backup start time', 'values': ['158790']}, {'key': '*policy action name', 'values': ['Server db backup: 158790']}, {'key': '*policy name', 'values': ['Server Protection: 158860']}, {'key': '*policy protection period', 'values': ['0: 1580']}, {'key': '*policy workflow name', 'values': ['Server backup: 15860']}, {'key': '*ss clone retention', 'values': ['   158760:      150:   186']}, {'key': 'group', 'values': ['Server Protection']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-19T10:01:00+08:00', 'clientHostname': 'server.com', 'clientId': '52d79c1', 'completionTime': '2020-04-27T10:01:00+08:00', 'creationTime': '2020-04-21T10:01:00+08:00', 'fileCount': 2, 'id': '2b2d925f9c1', 'instances': [{'clone': False, 'id': '1587952860', 'status': 'Recoverable', 'volumeIds': ['444']}], 'level': '1', 'links': [{'href': 'https://server.com/backups/2b4c1', 'rel': 'item'}], 'name': 'index:b9c9dc1', 'retentionTime': '2020-05-18T23:59:59+08:00', 'saveTime': '2020-04-27T10:01:00+08:00', 'shortId': '1151745244', 'size': {'unit': 'Byte', 'value': 408}, 'type': 'File'}, {'attributes': [{'key': '*anchor save set time', 'values': ['1585706503']}, {'key': '*backup start time', 'values': ['1587952855']}, {'key': '*policy action name', 'values': ['Server db backup: 15855']}, {'key': '*policy name', 'values': ['Server Protection: 158795']}, {'key': '*policy protection period', 'values': ['0: 1585']}, {'key': '*policy workflow name', 'values': ['Server backup: 1587855']}, {'key': '*ss clone retention', 'values': ['          158755:     1587952855:   186']}, {'key': 'group', 'values': ['Server Protection']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-18T10:00:55+08:00', 'clientHostname': 'server.com', 'clientId': '52d7ae9c1', 'completionTime': '2020-04-27T10:00:55+08:00', 'creationTime': '2020-04-27T10:00:55+08:00', 'fileCount': 2, 'id': '0b8ae9c1', 'instances': [{'clone': False, 'id': '1', 'status': 'Recoverable', 'volumeIds': ['446929344']}], 'level': '1', 'links': [{'href': 'https://server.com/backups/0b8a', 'rel': 'item'}], 'name': 'index:ec3df9c1', 'retentionTime': '2020-05-18T23:59:59+08:00', 'saveTime': '2020-04-27T10:00:55+08:00', 'shortId': '1168522455', 'size': {'unit': 'Byte', 'value': 408}, 'type': 'File'}]

df = json_normalize(js)
df[['clone','id','status','volumeIds']] = df.instances.apply(lambda x: pd.Series(ast.literal_eval(str(x[0]))))

df:

attributes  browseTime  clientHostname  clientId    completionTime  creationTime    fileCount   id  instances   level   ... name    retentionTime   saveTime    shortId type    size.unit   size.value  clone   status  volumeIds
0   [{'key': '*anchor save set time', 'values': ['...   2020-05-19T10:01:00+08:00   server.com  52d79c1 2020-04-27T10:01:00+08:00   2020-04-21T10:01:00+08:00   2   1587952860  [{'clone': False, 'id': '1587952860', 'status'...   1   ... index:b9c9dc1   2020-05-18T23:59:59+08:00   2020-04-27T10:01:00+08:00   1151745244  File    Byte    408 False   Recoverable [444]
1   [{'key': '*anchor save set time', 'values': ['...   2020-05-18T10:00:55+08:00   server.com  52d7ae9c1   2020-04-27T10:00:55+08:00   2020-04-27T10:00:55+08:00   2   1   [{'clone': False, 'id': '1', 'status': 'Recove...   1   ... index:ec3df9c1  2020-05-18T23:59:59+08:00   2020-04-27T10:00:55+08:00   1168522455  File    Byte    408 False   Recoverable [446929344]

df[['clientHostname','clientId','completionTime','creationTime','status','retentionTime','fileCount']]

clientHostname  clientId    completionTime              creationTime                status     retentionTime                fileCount
0   server.com  52d79c1     2020-04-27T10:01:00+08:00   2020-04-21T10:01:00+08:00   Recoverable 2020-05-18T23:59:59+08:00   2
1   server.com  52d7ae9c1   2020-04-27T10:00:55+08:00   2020-04-27T10:00:55+08:00   Recoverable 2020-05-18T23:59:59+08:00   2
Pygirl
  • 12,969
  • 5
  • 30
  • 43
  • Thanks @Pygirl.. I am running the same code but getting the below output. ` attributes ... volumeIds 0 [{'key': '*anchor save set time', 'values': ['... ... [444] 1 [{'key': '*anchor save set time', 'values': ['... ... [446929344] [2 rows x 21 columns]` – Nisha Budakoti Apr 28 '20 at 04:42
  • You will get that only. 2 rows × 21 columns . Then select the columns you want to have `df[['clientHostname','clientId','completionTime','creationTime','status','retentionTime','fileCount']] ` – Pygirl Apr 28 '20 at 04:56
  • do I have to import pandas as pd because in the above code it is giving pd is not defined, Also if i define it also getting the same output for df ` attributes ... volumeIds 0 [{'key': '*anchor save set time', 'values': ['... ... [444] 1 [{'key': '*anchor save set time', 'values': ['... ... [446929344] [2 rows x 21 columns]` – Nisha Budakoti Apr 28 '20 at 05:17
  • screenshot please. We can't guess what you are doing. – Pygirl Apr 28 '20 at 05:20
  • How can i add screenshot? – Nisha Budakoti Apr 28 '20 at 05:26
  • upload somewhere online and send us the link. – Pygirl Apr 28 '20 at 05:28