0

I have a pandas df that looks like this but has 20 columns. I am wanting to write a loop that iterates through all of the columns and normalizes them. I have trouble understanding loops.

b = json_data
b1 = pd.json_normalize(b['financial'])

for column in b1:
    pd.json_normalize(b1[column])
    print()

This is what the loop or function will do without having to write 20 lines of the same code and n+1 of the index.

b1a = pd.json_normalize(b1[0])
b1b = pd.json_normalize(b1[1])
b1c = pd.json_normalize(b1[2])
0 1 2
{'type.coding': [{'code': 'https://bluebutton.... {'type.coding': [{'code': 'https://bluebutton.... {'type.coding': [{'code': 'https://bluebutton....

Expected output is this:

type.coding usedMoney.currency usedMoney.value
[{'code': 'https://bluebutton.... [{'code': 'https://bluebutton.... [{'code': 'https://bluebutton....
user17629522
  • 105
  • 8

1 Answers1

0

It sounds like you are looking to collect the individual dataframes from each flattened column from your dataframe b1? Like so:

flattened_column_dfs = []
for column in b1:
    flattened_column_dfs.append(pd.json_normalize(b1[column]))

Then you have each dataframe as an item in flattened_column_dfs where the index of the list corresponds to the column. You can loop on the list and access each one as a dataframe to process further or use.
Or access specific ones.
Example: flattened_column_dfs[0] corresponds to the first column, flattened_column_dfs[1] corresponds to the second column, etc.

If you prefer a dictionary with the position of the original columns as the keys:

flattened_column_dfs_as_dictionary = {}
for indx, column in enumerate(b1):
    flattened_column_dfs_as_dictionary[indx] = pd.json_normalize(b1[column])

Since your column names in b1 appear to already be integers of the zero-indexed position you can shorten that dictionary making to:

flattened_column_dfs_as_dictionary = {}
for column in b1:
    flattened_column_dfs_as_dictionary[column] = pd.json_normalize(b1[column])

Dictionaries offer a few more options for iterating and accessing the individual keys and associated values.


You could use letters as the keys for the dictionary if you prefer. It's less convenient and you're be limited to 26 unless you add in making alternate increments after the first 26, such as AA for 27th and, and AB for 28th, etc.. Fortunately, there's code to do that here, where key with integer zero is column A. Here's how you'd do that after-the-fact once flattened_column_dfs_as_dictionary was made with the integers as the keys using code above:

import string

def n2a(n,b=string.ascii_uppercase):
   d, m = divmod(n,len(b))
   return n2a(d-1,b)+b[m] if d else b[m]

flattened_column_dfs_as_dictionary = {n2a(k):v for k,v in flattened_column_dfs_as_dictionary.items()}

Of course, you could use the same n2a function to do it at the time of making the keys for the dictionary initially.

Wayne
  • 6,607
  • 8
  • 36
  • 93