1

I have a two-fold issue and looking for clues as to how to approach it.

I have a json file that is formatted as such:

{
    "code": 2000,
    "data": {
        "1": {
            "attribute1": 40,
            "attribute2": 1.4,
            "attribute3": 5.2,
            "attribute4": 124
            "attribute5": "65.53%"
        },        
        "94": {
            "attribute1": 10,
            "attribute2": 4.4,
            "attribute3": 2.2,
            "attribute4": 12
            "attribute5": "45.53%"
        },
        "96": {
            "attribute1": 17,
            "attribute2": 9.64,
            "attribute3": 5.2,
            "attribute4": 62
            "attribute5": "51.53%"
        }
    },
    "message": "SUCCESS"
}

My goals are to:

  1. I would first like to sort the data by any of the attributes.
  2. There are around 100 of these, I would like to grab the top 5 (depending on how they are sorted), then...
  3. Output the data in a table e.g.:
These are sorted by: attribute5
---
attribute1 | attribute2 | attribute3 | attribute4 | attribute5
40 |1.4 |5.2|124|65.53%
17 |9.64|5.2|62 |51.53%
10 |4.4 |2.2|12 |45.53%

*also, attribute5 above is a string value

Admittedly, my knowledge here is very limited. I attempted to mimick the method used here: python sort list of json by value

I managed to open the file and I can extract the key values from a sample row:

import json

jsonfile = path-to-my-file.json

with open(jsonfile) as j:
  data=json.load(j)
  k = data["data"]["1"].keys()
  print(k)

total=data["data"]
for row in total:
    v = data["data"][str(row)].values()
    print(v)

this outputs:

dict_keys(['attribute1', 'attribute2', 'attribute3', 'attribute4', 'attribute5'])
dict_values([1, 40, 1.4, 5.2, 124, '65.53%'])
dict_values([94, 10, 4.4, 2.2, 12, '45.53%'])
dict_values([96, 17, 9.64, 5.2, 62, '51.53%'])

Any point in the right direction would be GREATLY appreciated.

Thanks!

p3hndrx
  • 103
  • 9

2 Answers2

2

If you don't mind using pandas you could do it like this

import pandas as pd
rows = [v for k,v in data["data"].items()]

df = pd.DataFrame(rows)

# then to get the top 5 values by attribute can choose either ascending
# or descending with the ascending keyword and head prints the top 5 rows

df.sort_values('attribute1', ascending=True).head()

This will allow you to sort by any attribute you need at any time and print out a table.

Which will produce output like this depending on what you sort by

   attribute1  attribute2  attribute3  attribute4 attribute5
0          40        1.40         5.2         124     65.53%
1          10        4.40         2.2          12     45.53%
2          17        9.64         5.2          62     51.53%
Matthew Barlowe
  • 2,229
  • 1
  • 14
  • 24
  • This meets all of the objectives and is very simple. I am unfamiliar with the pandas library. If I am to be working with JSON data regularly, is pandas the most sustainable/straightforward method? – p3hndrx Jun 04 '21 at 23:59
  • 2
    @p3hndrx It depends its often the best way to create tabular data (and more) out of various data structures in python. If you plan on doing any sort of data analysis on the data after you get it then I would definitely suggest using `pandas` as that is what it was designed for – Matthew Barlowe Jun 05 '21 at 00:53
2

I'll leave this answer here in case you don't want to use pandas but the answer from @MatthewBarlowe is way less complicated and I recommend that.

For sorting by a specific attribute, this should work:

import json

SORT_BY = "attribute4"

with open("test.json") as j:
    data = json.load(j)

items = data["data"]
sorted_keys = list(sorted(items, key=lambda key: items[key][SORT_BY], reverse=True))

Now, sorted_keys is a list of the keys in order of the attribute they were sorted by.

Then, to print this as a table, I used the tabulate library. The final code for me looked like this:

from tabulate import tabulate
import json

SORT_BY = "attribute4"

with open("test.json") as j:
    data = json.load(j)

items = data["data"]
sorted_keys = list(sorted(items, key=lambda key: items[key][SORT_BY], reverse=True))

print(f"\nSorted by: {SORT_BY}")
print(
    tabulate(
        [
            [sorted_keys[i], *items[sorted_keys[i]].values()]
            for i, _ in enumerate(items)
        ],
        headers=["Column", *items["1"].keys()],
    )
)

When sorting by 'attribute5', this outputs:

Sorted by: attribute5
  Column    attribute1    attribute2    attribute3    attribute4  attribute5
--------  ------------  ------------  ------------  ------------  ------------
       1            40          1.4            5.2           124  65.53%
      96            17          9.64           5.2            62  51.53%
      94            10          4.4            2.2            12  45.53%
Rolv Apneseth
  • 2,078
  • 2
  • 7
  • 19