I'm trying to write some code that opens about 900 nested dictionaries with roughly 99% similar content (each very large) and stores the value of each key in a list named after the key. For example, if I had two dictionaries, {"data": 37} and {"data": 74},
I would want to combine those two values into a list named data
that contains [37, 74].
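In other words, the aggregation I'm after could be sketched like this (using collections.defaultdict instead of dynamically named variables; the sample dicts are just for illustration):

```python
from collections import defaultdict

# Two sample dictionaries standing in for the ~900 JSON files.
dicts = [{"data": 37}, {"data": 74}]

# Collect every value under its key name.
collected = defaultdict(list)
for d in dicts:
    for k, v in d.items():
        collected[k].append(v)

print(collected["data"])  # [37, 74]
```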
Here is the code I'm currently using to accomplish this:
    import sys   # needed for sys.modules below
    import json  # needed for json.load below

    import pandas as pd

    df = pd.read_csv("/Users/---.csv")

    i = True

    def sort(d):
        for k, v in d.items():
            if isinstance(v, dict):
                sort(v)
            else:
                global i
                if i == True:
                    print("{0} : {1}".format(k, v))
                    # Create a module-level variable named after the key.
                    setattr(sys.modules[__name__], k, [v])
                    i = False
                else:
                    print("{0} : {1}".format(k, v))
                    globals()["{}".format(k)].append(v)

    for i in df['file_num']:
        with open("/Users/--/allDAFs{}.json".format(i)) as f:
            data = json.load(f)
            sort(data)
The problem with this is twofold:

a. There are some duplicates, and I'm not sure why: some keys end up with 1400 values when there are only 900 files.

b. I can't link these values to a file_num. As you can see, I'm looping over file_num, and I'd like to link each value to the file_num it came from.
I know I may not be doing this the best way possible so any insight or advice would be greatly appreciated.
EDIT: This is how I need the end result to look, preferably in a pandas DataFrame: