
I currently have 1000 JSON files in the following format, one per employee:

    {
        "A": "A_text",
        "B": "Datetime stamp of record",
        "ID": "123",
        "FeatureList": {
            "Salary": 100000,
            "Age": 45,
            "Work Ex": 15
        }
    }

My goal is to recursively concatenate these files into one DataFrame, one row per employee.

In my current solution, I can recursively collect all the file paths with:

    from pathlib import Path

    rootdir = '/folderpath/filename'
    all_files = Path(rootdir).rglob('*.json')

I am able to read a single file and transpose it with:

    import pandas as pd

    df = pd.read_json('data.json')
    df = df.transpose()

But the nested "FeatureList" object distorts the orientation whenever I drop or create columns.


Any advice on my approach would really help. Thanks

Abhi

1 Answer


If that's your JSON, you can use json_normalize:

import json
import pandas as pd

with open('1.json', 'r') as f:
    data = json.load(f)

df = pd.json_normalize(data).drop(columns=['A']).rename(columns={'B': 'Date'})
print(df)

                       Date   ID  FeatureList.Salary  FeatureList.Age  FeatureList.Work Ex
0  Datetime stamp of record  123              100000               45                   15
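To build the single DataFrame from all files, one sketch is to combine your `rglob` loop with `json_normalize` and `pd.concat`. The function name and the root path below are hypothetical; adjust them to your folder layout:

```python
import json
from pathlib import Path

import pandas as pd

def load_all_json(rootdir):
    """Recursively read every *.json under rootdir into one flat DataFrame."""
    frames = []
    for path in Path(rootdir).rglob('*.json'):
        with open(path) as f:
            record = json.load(f)
        # json_normalize flattens the nested FeatureList dict into columns
        # like 'FeatureList.Salary', 'FeatureList.Age', etc.
        frames.append(pd.json_normalize(record))
    return pd.concat(frames, ignore_index=True)

# Usage (hypothetical path):
# df = load_all_json('/folderpath')
```

`ignore_index=True` gives the combined frame a clean 0..n-1 index instead of repeating index 0 from each one-row frame.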
NYC Coder
  • Using Python 3.6, Getting AttributeError: module 'pandas' has no attribute 'json_normalize' – Abhi Jul 13 '20 at 18:59
  • What version of pandas are you using? – NYC Coder Jul 13 '20 at 19:07
  • '0.25.3' @NYC Coder – Abhi Jul 13 '20 at 22:34
  • 1
    You should upgrade to 1.0.5 – NYC Coder Jul 13 '20 at 22:52
  • I upgraded to 1.0.5. In place of data I am using the path to JSON. Still getting error: AttributeError: 'str' object has no attribute 'values' – Abhi Jul 14 '20 at 19:19
  • data is a dict which is loaded from a file: check my updated answer – NYC Coder Jul 14 '20 at 19:23
  • Any idea if it is possible to recursively open and read as I have 50 folders each with 1000 files. Currently for CSVs I use rootdir ='/folderpath/filename' all_files = Path(rootdir).rglob('*.csv') . It takes a lot of SWAP memory when I am using good ol for loops. – Abhi Jul 14 '20 at 19:30
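Regarding the memory concern in the last comment: rather than concatenating thousands of one-row DataFrames, a lighter-weight sketch is to flatten each record into a plain dict and construct the DataFrame once at the end. The function name is hypothetical, and the column handling assumes the JSON shape shown in the question:

```python
import json
from pathlib import Path

import pandas as pd

def load_all_json_lowmem(rootdir):
    """Flatten each file to a plain dict; build one DataFrame at the end."""
    rows = []
    for path in Path(rootdir).rglob('*.json'):
        with open(path) as f:
            rec = json.load(f)
        # Keep the top-level keys except the nested dict...
        flat = {k: v for k, v in rec.items() if k != 'FeatureList'}
        # ...then lift the FeatureList fields up to top-level columns.
        flat.update(rec.get('FeatureList', {}))
        rows.append(flat)
    return pd.DataFrame(rows)
```

A list of small dicts is cheaper to hold than many pandas objects, so this tends to reduce peak memory when looping over tens of thousands of files.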