
This is what one row of themesdf looks like:

[{'code': '1', 'name': 'Economic management'},
 {'code': '6', 'name': 'Social protection and risk management'}]

I want to normalize each row and add it to a newdf. This is what I have right now:

import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

themesdf = json_df['mjtheme_namecode']
newdf = pd.DataFrame()
for row in themesdf:
    for item in row:
        newdf.append(json_normalize(item, 'name'))
newdf

When I print newdf, it comes up empty. My ultimate goal with this data is to get the top ten major project themes (the 'name' column).

FBruzzesi
vnguyen56
    `newdf = newdf.append(json_normalize(item, 'name'))`: as the [docs](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html) say, `.append()` returns a new object instead of modifying the frame in place, so you need to reassign the result. – harvpan Jul 17 '18 at 19:53
    **[Never call DataFrame.append or pd.concat inside a for-loop. It leads to quadratic copying.](https://stackoverflow.com/a/36489724/1422451)** – Parfait Jul 17 '18 at 20:55
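Following both comments, one way to avoid growing a DataFrame inside the loop is to collect the per-row results in a plain Python list and call `pd.concat` once after the loop. This is a minimal sketch assuming each element of `themesdf` is a list of dicts like the sample row above; the two-row `themesdf` below is made-up data, and the sketch uses `pd.json_normalize` (available at the top level since pandas 1.0). Note also that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so reassigning its result only works on older versions.

```python
import pandas as pd

# Hypothetical sample data: each element is one project's list of theme dicts
themesdf = pd.Series([
    [{'code': '1', 'name': 'Economic management'},
     {'code': '6', 'name': 'Social protection and risk management'}],
    [{'code': '6', 'name': 'Social protection and risk management'}],
])

# Build one small frame per project and keep them in a list (cheap list appends)
frames = []
for row in themesdf:
    frames.append(pd.json_normalize(row))  # list of dicts -> one row per dict

# Concatenate once, outside the loop, so the data is copied only one time
newdf = pd.concat(frames, ignore_index=True)
print(newdf)
```

Because each row is already a flat list of dicts, `pd.DataFrame([item for row in themesdf for item in row])` should give the same two columns without building intermediate frames.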

1 Answer


I recently worked on something similar. My solution was to explode the list so that each JSON object gets its own row, then normalize that column with pd.json_normalize(). That creates a new df, which then needs to be joined back to the original table on the index.

Since you didn't provide any test data, here's my best guess at what works:

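import pandas as pd

# assumes themesDf is the whole frame (e.g. json_df) with the list column 'mjtheme_namecode'

# explode list column: one dict per row, original index labels repeat;
# empty lists become NaN, so drop them before normalizing
explodedDf = themesDf.explode('mjtheme_namecode').dropna(subset=['mjtheme_namecode'])

# normalize that column into a new df, reusing the exploded index so rows line up on the join
normalizedDf = pd.json_normalize(explodedDf['mjtheme_namecode'].tolist())
normalizedDf.index = explodedDf.index

# (optional) you may want to drop the original column
themesDf = themesDf.drop('mjtheme_namecode', axis=1)

# join on index (default) with original df; each project repeats once per theme
newDf = themesDf.join(normalizedDf)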
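To check the explode/normalize/join idea on some data, here is a small self-contained run. The `project` column and both rows are hypothetical stand-ins for `json_df`. The detail that matters in this sketch is giving the normalized frame the exploded index, so the join lines each theme up with the project it came from; `explode` needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical frame standing in for json_df: one row per project
json_df = pd.DataFrame({
    'project': ['A', 'B'],
    'mjtheme_namecode': [
        [{'code': '1', 'name': 'Economic management'},
         {'code': '6', 'name': 'Social protection and risk management'}],
        [{'code': '6', 'name': 'Social protection and risk management'}],
    ],
})

# 1. explode: one dict per row; index labels repeat (0, 0, 1); drop NaN from empty lists
exploded = json_df.explode('mjtheme_namecode').dropna(subset=['mjtheme_namecode'])

# 2. normalize the dicts into 'code'/'name' columns and reuse the exploded index
normalized = pd.json_normalize(exploded['mjtheme_namecode'].tolist())
normalized.index = exploded.index

# 3. join back onto the remaining columns of the original table
result = json_df.drop(columns='mjtheme_namecode').join(normalized)
print(result)
```

Project A appears twice (once per theme) and project B once.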
kentkr
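For the stated end goal (the top ten themes by `name`), a joined frame isn't strictly needed; the theme names can be counted directly. A minimal sketch, assuming `themesdf` is the `json_df['mjtheme_namecode']` Series from the question. The three-row sample is invented and includes an empty list to show that `explode` turns it into NaN, which `dropna` then removes.

```python
import pandas as pd

# Hypothetical stand-in for json_df['mjtheme_namecode']: one list of dicts per project
themesdf = pd.Series([
    [{'code': '1', 'name': 'Economic management'},
     {'code': '6', 'name': 'Social protection and risk management'}],
    [{'code': '6', 'name': 'Social protection and risk management'}],
    [],  # a project with no themes
])

# One dict per row; the empty list becomes NaN and is dropped
flat = themesdf.explode().dropna()

# Pull the 'name' field out of each dict, count occurrences, keep the ten most common
names = flat.map(lambda d: d['name'])
top10 = names.value_counts().head(10)
print(top10)
```

If some entries turn out to have a blank `name` but a valid `code`, counting `d['code']` instead and mapping codes back to names afterwards may give a more reliable ranking.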