6

I have a data frame that has a 3 columns. Time represents every day of the month for various months. what I am trying to do is get the 'Count' value per day and average it per each month, and do this for each country. The output must be in the form of a data frame.

Curent data:

    Time    Country Count
 2017-01-01    us   7827
 2017-01-02    us   7748
 2017-01-03    us   7653
 ..
 ..
 2017-01-30    us   5432
 2017-01-31    us   2942
 2017-01-01    us   5829
 2017-01-02    ca   9843
 2017-01-03    ca   7845
 ..
 ..
 2017-01-30    ca   8654
 2017-01-31    ca   8534

Desire output (dummy data, numbers are not representative of the DF above):

    Time       Country   Monthly Average
 Jan 2017      us          6873
 Feb 2017      us          8875
 ..
 .. 
 Nov 2017      us          9614
 Dec 2017      us          2475
 Jan 2017      ca          1878
 Feb 2017      ca          4775
 ..
 .. 
 Nov 2017      ca          7643
 Dec 2017      ca          9441
Kishor
  • 450
  • 8
  • 11
David
  • 487
  • 2
  • 6
  • 18
  • You Count column is confusing. Is it a column of the original dataframe or a resulting column from a groupby? Also, am not sure why you need a count and average. You can directly aggregate by the mean. – skrubber Nov 12 '17 at 01:39
  • It should come from the original dataframe. Basically add all the values for each individual month, and then average them. – David Nov 12 '17 at 01:53

2 Answers2

5

I'd organize it like this:

df.groupby(
    [df.Time.dt.strftime('%b %Y'), 'Country']
)['Count'].mean().reset_index(name='Monthly Average')

       Time Country  Monthly Average
0  Feb 2017      ca             88.0
1  Feb 2017      us            105.0
2  Jan 2017      ca             85.0
3  Jan 2017      us             24.6
4  Mar 2017      ca             86.0
5  Mar 2017      us             54.0

If your 'Time' column wasn't already a datetime column, I'd do this:

df.groupby(
    [pd.to_datetime(df.Time).dt.strftime('%b %Y'), 'Country']
)['Count'].mean().reset_index(name='Monthly Average')

       Time Country  Monthly Average
0  Feb 2017      ca             88.0
1  Feb 2017      us            105.0
2  Jan 2017      ca             85.0
3  Jan 2017      us             24.6
4  Mar 2017      ca             86.0
5  Mar 2017      us             54.0
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • for some reason when I use your second implementation I keep getting the following error: 'function' object has no attribute 'mean' Any ideas ? – David Nov 12 '17 at 02:47
  • 1
    You're using lower case `count` which is a function. I'll update post – piRSquared Nov 12 '17 at 02:48
1

Use pandas dt strftime to create a month-year column that you desire + groupby + mean. Used this dataframe:

Dated     country   num 
2017-01-01  us     12   
2017-01-02  us     12   
2017-02-02  us     134  
2017-02-03  us     76   
2017-03-30  us     54   
2017-01-31  us     29   
2017-01-01  us     58   
2017-01-02  us     12   
2017-02-02  ca     98   
2017-02-03  ca     78   
2017-03-30  ca     86   
2017-01-31  ca     85   

Then create a Month-Year column:

a['MonthYear']= a.Dated.dt.strftime('%b %Y')

Then, drop the Date column and aggregate by mean:

a.drop('Dated', axis=1).groupby(['MonthYear','country']).mean().rename(columns={'num':'Averaged'}).reset_index()

MonthYear      country  Averaged
Feb 2017    ca      88.0
Feb 2017    us      105.0
Jan 2017    ca      85.0
Jan 2017    us      24.6
Mar 2017        ca      86.0
Mar 2017        us      54.0

I retained the Dated column just in case.

skrubber
  • 1,095
  • 1
  • 9
  • 18