2

This question is an extension from a question I posted here a while ago. I'm trying to understand the accepted answer provided by @patrickjlong1 (thanks again), therefore I'm running the code step by step and checking the result.

I found it hard to fathom this part.

>>> df_initial
data                                                    seriesID
0   {'year': '2017', 'period': 'M12', 'periodName'...   SMS42000000000000001
1   {'year': '2017', 'period': 'M11', 'periodName'...   SMS42000000000000001
2   {'year': '2017', 'period': 'M10', 'periodName'...   SMS42000000000000001
3   {'year': '2017', 'period': 'M09', 'periodName'...   SMS42000000000000001
4   {'year': '2017', 'period': 'M08', 'periodName'...   SMS42000000000000001
5   {'year': '2017', 'period': 'M07', 'periodName'...   SMS42000000000000001

The element in each row of the first column is a dictionary and they all have common keys: 'year', 'period' etc. What I want to convert it to is:

    footnotes   period  periodName  value   year
0   {}           M12    December    6418025 2017
0   {}           M11    November    6418195 2017
0   {}           M10    October     6418284 2017
...

The solution provided by @patrickjlong1 is to convert the row one at a time and then append them all, which I understand as one dictionary can be converted to one dataframe:

    for i in range(0, len(df_initial)):
        df_row = pd.DataFrame(df_initial['data'][i])
        df_row['seriesID'] = series_col
        df = df.append(df_row, ignore_index=True)

My question is: is this the only way to convert the data like I wanted? If not, what are the other methods?

Thanks

jpp
  • 159,742
  • 34
  • 281
  • 339
Bowen Liu
  • 1,065
  • 1
  • 11
  • 24

1 Answers1

2

Avoid pd.DataFrame.append in a loop

I can't stress this enough. The pd.DataFrame.append method is expensive as it copies data unnecessarily. Putting this in a loop makes it n times more expensive.

Instead, you can feed a list of dictionaries to the pd.DataFrame constructor:

df = pd.DataFrame(df_initial['seriesID'].tolist())
jpp
  • 159,742
  • 34
  • 281
  • 339
  • Thanks for your reply. Would you care to elaborate a little bit more on your definition of "expensive" please? Do you mean that it will eat up more memory or is it more prone to error? Both methods give me what I want but the sorting of the results seem different. I will need to find time to compile your answer into the second method and compare the results. Thanks again – Bowen Liu Sep 27 '18 at 18:00
  • Oh, just a quick thing, the name of the targeted column is `data` instead of `seriesID` but that's no biggie. – Bowen Liu Sep 27 '18 at 18:01
  • @BowenLiu, Expensive means "costly in terms of performance", usually measured in time. Cheap, likewise, means efficient / performant. – jpp Sep 27 '18 at 21:04