0

I have a dataframe which is essentially daily imports for each country in a given area. I need to make a new dataframe for each country.

Example;

Spain = df[['Date', 'Spain']]
Spain['Date']= pd.to_datetime(Spain['Date']) 
Spain = Spain.groupby([Spain.Date.dt.year,Spain.Date.dt.month]).mean()
Spain = Spain.unstack()
Spain = Spain/1745
Spain = Spain.round()

I am not very experienced with loops but is there any way I could create a list of all the countries and make a loop that would loop through that list to save me from writing the above code for each of the 100+ countries?

Any help much appreciated.

spcol
  • 437
  • 4
  • 15

2 Answers2

1

First rule in pandas is try dont loop.

I think you need DataFrame.melt for unpivot first:

df1 = df.melt('Date', var_name='country', value_name='val')
df1['Date']= pd.to_datetime(df1['Date']) 
df2 = (df1.groupby(['country', df1.Date.dt.year,df1.Date.dt.month])['val']
           .mean()
           .unstack()
           .div(1745)
           .round())
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks for answer, is this using Python3? I'm on Python2 and getting the following error; "AttributeError: 'DataFrame' object has no attribute 'melt'" – spcol Mar 24 '21 at 15:37
  • @spcol Use `df1 = pd.melt(df, 'Date', var_name='country', value_name='val')` – jezrael Mar 24 '21 at 16:51
1

I have never used pandas so I will try an alternative answer:

You can make code that will write all of the code for you with the countries specified in a list. For example:

country = ['Spain', 'Italy']
code = ''

for item in country:
    code += f"{item} = df[['Date', '{item}']]" + '\n'
    code += f"{item}['Date'] = pd.to_datetime({item}['Date'])" + '\n'
    code += f"{item} = {item}.groupby([{item}.Date.dt.year, {item}.Date.dt.month]).mean()" + '\n'
    code += f"{item} = {item}.unstack()" + '\n'
    code += f"{item} = {item}/1745" + '\n'
    code += f"{item} = {item}.round()" + '\n\n'

When you go to print(code) you will get the following:

Spain = df[['Date', 'Spain']]
Spain['Date'] = pd.to_datetime(Spain['Date'])
Spain = Spain.groupby([Spain.Date.dt.year, Spain.Date.dt.month]).mean()
Spain = Spain.unstack()
Spain = Spain/1745
Spain = Spain.round()

Italy = df[['Date', 'Italy']]
Italy['Date'] = pd.to_datetime(Italy['Date'])
Italy = Italy.groupby([Italy.Date.dt.year, Italy.Date.dt.month]).mean()
Italy = Italy.unstack()
Italy = Italy/1745
Italy = Italy.round()

Which you can then copy/paste to your actual program.

Cambuchi
  • 102
  • 1
  • 9