
I currently have 1000 JSON files in the following format, one per employee:

    {
        "A": "A_text",
        "B": "Datetime stamp of record",
        "ID": "123",
        "FeatureList": {
            "Salary": 100000,
            "Age": 45,
            "Work Ex": 15
        }
    }

My goal is to recursively concatenate these files into one DataFrame, one row per employee.

In my current solution, I can recursively collect all the file paths with:

    from pathlib import Path

    rootdir = '/folderpath/filename'
    all_files = Path(rootdir).rglob('*.json')

I am able to read a single file and transpose it with:

    import pandas as pd

    df = pd.read_json('data.json')
    df = df.transpose()

But the nested "FeatureList" object distorts the orientation whenever I drop or create columns.


Any advice on my approach would really help. Thanks

Abhi

1 Answer


If that's your JSON, you can use json_normalize:

import json
import pandas as pd

with open('1.json', 'r') as f:
    data = json.load(f)

df = pd.json_normalize(data).drop(columns=['A']).rename(columns={'B': 'Date'})
print(df)

                       Date   ID  FeatureList.Salary  FeatureList.Age  FeatureList.Work Ex
0  Datetime stamp of record  123              100000               45                   15
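To build the single DataFrame from all files, one sketch is to combine your `rglob` loop with `json_normalize` and `pd.concat`. The function name and the root path below are hypothetical; adjust them to your folder layout:

```python
import json
from pathlib import Path

import pandas as pd

def load_all_json(rootdir):
    """Recursively read every *.json under rootdir into one flat DataFrame."""
    frames = []
    for path in Path(rootdir).rglob('*.json'):
        with open(path) as f:
            record = json.load(f)
        # json_normalize flattens the nested FeatureList dict into columns
        # like 'FeatureList.Salary', 'FeatureList.Age', etc.
        frames.append(pd.json_normalize(record))
    return pd.concat(frames, ignore_index=True)

# Usage (hypothetical path):
# df = load_all_json('/folderpath')
```

`ignore_index=True` gives the combined frame a clean 0..n-1 index instead of repeating index 0 from each one-row frame.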
NYC Coder
  • Using Python 3.6, Getting AttributeError: module 'pandas' has no attribute 'json_normalize' – Abhi Jul 13 '20 at 18:59
  • What version of pandas are you using? – NYC Coder Jul 13 '20 at 19:07
  • '0.25.3' @NYC Coder – Abhi Jul 13 '20 at 22:34
  • 1
    You should upgrade to 1.0.5 – NYC Coder Jul 13 '20 at 22:52
  • I upgraded to 1.0.5. In place of data I am using the path to JSON. Still getting error: AttributeError: 'str' object has no attribute 'values' – Abhi Jul 14 '20 at 19:19
  • data is a dict which is loaded from a file: check my updated answer – NYC Coder Jul 14 '20 at 19:23
  • Any idea if it is possible to recursively open and read as I have 50 folders each with 1000 files. Currently for CSVs I use rootdir ='/folderpath/filename' all_files = Path(rootdir).rglob('*.csv') . It takes a lot of SWAP memory when I am using good ol for loops. – Abhi Jul 14 '20 at 19:30
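Regarding the memory concern in the last comment: rather than concatenating thousands of one-row DataFrames, a lighter-weight sketch is to flatten each record into a plain dict and construct the DataFrame once at the end. The function name is hypothetical, and the column handling assumes the JSON shape shown in the question:

```python
import json
from pathlib import Path

import pandas as pd

def load_all_json_lowmem(rootdir):
    """Flatten each file to a plain dict; build one DataFrame at the end."""
    rows = []
    for path in Path(rootdir).rglob('*.json'):
        with open(path) as f:
            rec = json.load(f)
        # Keep the top-level keys except the nested dict...
        flat = {k: v for k, v in rec.items() if k != 'FeatureList'}
        # ...then lift the FeatureList fields up to top-level columns.
        flat.update(rec.get('FeatureList', {}))
        rows.append(flat)
    return pd.DataFrame(rows)
```

A list of small dicts is cheaper to hold than many pandas objects, so this tends to reduce peak memory when looping over tens of thousands of files.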