
I am trying to iterate through the JSON files in a folder and append them all into one pandas DataFrame.

If I run

import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize
import os


directory_in_str = 'building_data'
directory = os.fsencode(directory_in_str)

df_all = pd.DataFrame()
with open("building_data/rooms.json") as file:
  data = json.load(file)
df = json_normalize(data['rooms'])
df_all.append(df, ignore_index=True)

I get a DataFrame with the data from that one file. Turning this into a for loop, I tried:

import pandas as pd
import numpy as np
import json
from pandas.io.json import json_normalize
import os

directory_in_str = 'building_data'
directory = os.fsencode(directory_in_str)

df_all = pd.DataFrame()
for file in os.listdir(directory):
    filename = os.fsdecode(file)  # os.listdir returns bytes for a bytes path
    with open(os.path.join(directory_in_str, filename)) as f:
        data = json.load(f)
    df = json_normalize(data['rooms'])
    df_all.append(df, ignore_index=True)

print(df_all)

This returns an empty DataFrame. Does anyone know why this is happening? If I print df before appending it, it shows the correct values, so I am not sure why it is not being appended.

Thank you!

Thérèse Mills
Was facing the same issue. This is because `append` does not work in place: `df_all = df_all.append(df, ignore_index=True)` will work. However, it's worth noting that this is not a time-efficient solution for large data frames: https://stackoverflow.com/questions/36489576/why-does-concatenation-of-dataframes-get-exponentially-slower/36489724#36489724 – dshgna Jan 15 '21 at 18:02
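Building on that comment, a common way to avoid both the in-place pitfall and the repeated-copy cost is to collect each file's DataFrame in a list and call `pd.concat` once after the loop. `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so `pd.concat` is also the forward-compatible choice. This is a minimal sketch, assuming (as in the question) that every file has a top-level `"rooms"` key; the helper name `load_rooms` is hypothetical.

```python
import json
import os

import pandas as pd


def load_rooms(directory: str) -> pd.DataFrame:
    """Read every .json file in `directory` and stack the normalized
    "rooms" records of all files into one DataFrame."""
    frames = []
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".json"):
            continue  # skip non-JSON files such as .DS_Store
        with open(os.path.join(directory, filename)) as f:
            data = json.load(f)
        # Assumption: each file has a top-level "rooms" key, as in the question.
        frames.append(pd.json_normalize(data["rooms"]))
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame.
        return pd.DataFrame()
    # One concat at the end instead of growing a DataFrame inside the loop.
    return pd.concat(frames, ignore_index=True)
```

With the question's folder layout this would be called as `df_all = load_rooms('building_data')`.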

2 Answers


Instead of appending the next DataFrame, I would try joining them like this:

if df_all.empty:
    df_all = df
else:
    df_all = df_all.join(df)

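When joining DataFrames, you can specify what they should be joined on (the index or a specific key column) as well as how (the default is 'left'). Note that join aligns rows by index and adds the other frame's columns side by side rather than stacking rows the way append does, and frames with overlapping column names need lsuffix/rsuffix.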

See the pandas documentation for pandas.DataFrame.join.
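To illustrate the difference between joining and stacking, here is a small sketch with two made-up frames that share the same columns, as the per-file frames in the question presumably would. `join` lines rows up by index label and rejects overlapping column names unless suffixes are given, whereas `pd.concat` stacks the rows.

```python
import pandas as pd

# Two made-up per-file frames with identical columns.
df1 = pd.DataFrame({"name": ["r1", "r2"], "area": [10, 20]})
df2 = pd.DataFrame({"name": ["r3"], "area": [30]})

# join matches rows by index label, so overlapping column names are an error...
try:
    df1.join(df2)
    join_without_suffix_failed = False
except ValueError as exc:
    join_without_suffix_failed = True
    print("join failed:", exc)

# ...and with suffixes it places df2's columns next to df1's. how="left"
# keeps df1's two index labels; label 1 has no match in df2, so it gets NaN.
joined = df1.join(df2, lsuffix="_a", rsuffix="_b")
print(joined)

# concat stacks the rows, which is what "append all files" usually means.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```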

Stoockbroker

In these instances I load everything from JSON into a list by adding each file's records to that list. Then I pass the whole list to pandas.DataFrame.from_records once (see its documentation).

In this case the source would become something like...

import json
import os

import pandas as pd

directory_in_str = 'building_data'

json_data = []
for filename in os.listdir(directory_in_str):
    with open(os.path.join(directory_in_str, filename)) as file:
        data = json.load(file)
    # Collect the raw records; assumes data['rooms'] is a list of dicts.
    # Use pd.json_normalize(json_data) below instead if they are nested.
    json_data.extend(data['rooms'])

df_all = pd.DataFrame.from_records(json_data)

print(df_all)
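A self-contained sketch of the same list-then-construct-once idea, using hypothetical in-memory records instead of files. It shows that `from_records` fills missing keys with NaN and keeps a nested dict as a single object column, while `pd.json_normalize` flattens nested dicts into dotted column names.

```python
import pandas as pd

# Hypothetical records standing in for data['rooms'] from two files.
records = []
records.extend([{"name": "r1", "area": 10}, {"name": "r2", "area": 20}])  # file 1
records.extend([{"name": "r3", "size": {"w": 3, "h": 4}}])  # file 2, nested field

# One constructor call after the loop: missing keys become NaN and the
# nested "size" dict is kept as-is in an object column.
df_all = pd.DataFrame.from_records(records)
print(df_all)

# json_normalize flattens nested dicts into dotted columns (size.w, size.h).
flat = pd.json_normalize(records)
print(flat)
```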
jxramos