I have a dataset with several columns of daily data going back several years. Every column measures the same variable. I've calculated the daily, monthly, and yearly statistics for each column, and now want to do the same with all columns combined, so that I get a single statistic for each day, month, and year instead of one per column.

I've been using pandas groupby so far, with something like this:

sum_daily_files = daily_files.groupby(daily_files.Date.dt.day).sum()
sum_monthly_files = daily_files.groupby(daily_files.Date.dt.month).sum()
sum_yearly_files = daily_files.groupby(daily_files.Date.dt.year).sum()

Any suggestions on how I might go about using Pandas - or any other package - to combine the statistics together? Thanks so much!

Edit:

Here's a snippet of my dataframe:

Date                 site1  site2  site3  site4  site5  site6
2010-01-01 00:00:00      2      0      1      1      0      1
2010-01-02 00:00:00      7      5      1      3      1      1
2010-01-03 00:00:00      3      3      2      2      2      1
2010-01-04 00:00:00      0      0      0      0      0      0
2010-01-05 00:00:00      0      0      0      0      0      1

I typed it in by hand because I was having trouble copying it over, so my apologies. Basically, it's six different sites from 2010 to 2019, and the data gives how much snow (in inches) each site received on each day.

HokieWx
  • Have a look at [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Anurag Dabas Jun 04 '21 at 18:03
  • Please post a sample of your original dataframe. – Corralien Jun 04 '21 at 19:31
  • If you need assistance formatting a small sample of your DataFrame as a copyable piece of code for SO see [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker Jun 04 '21 at 19:57
  • can you give us the expected output? – Billy Bonaros Jun 05 '21 at 09:11
  • Yeah, pretty much I just want every day, month, and year summed across all columns. For example, Jan 1 would be 5 plus the totals seen on Jan 1 2011/2012/2013/2014/2015/2016. I can only get groupby to work individually (i.e., for each station), but I want to see if I can apply it once to get the sums/means over all stations, rather than for each station individually. – HokieWx Jun 07 '21 at 16:11

1 Answer

(Your problem needs to be clarified.)

Is this what you want?

all_sum_daily_files = sum_daily_files.sum(axis=1)  # or daily_files.sum(axis=1)
all_sum_monthly_files = sum_monthly_files.sum(axis=1)
all_sum_yearly_files = sum_yearly_files.sum(axis=1)

If your data is already daily, there is no need to calculate a daily sum first: you can use daily_files.sum(axis=1) directly.
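As a minimal sketch of the idea above, the following rebuilds the five rows from the question and sums across the six sites. It assumes Date is a regular column, so it is moved into the index first to keep it out of the row-wise sum; the names sites, all_sites_daily, all_sites_monthly, and all_sites_yearly are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (inches of snow per site per day).
daily_files = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2010-01-01", "2010-01-02", "2010-01-03", "2010-01-04", "2010-01-05"]
        ),
        "site1": [2, 7, 3, 0, 0],
        "site2": [0, 5, 3, 0, 0],
        "site3": [1, 1, 2, 0, 0],
        "site4": [1, 3, 2, 0, 0],
        "site5": [0, 1, 2, 0, 0],
        "site6": [1, 1, 1, 0, 1],
    }
)

# Assumption: Date is a column, so move it into the index
# and only the site columns take part in the sum.
sites = daily_files.set_index("Date")

# One value per date: add the six sites together row by row.
all_sites_daily = sites.sum(axis=1)

# One value per month number and per year, all sites combined.
all_sites_monthly = all_sites_daily.groupby(all_sites_daily.index.month).sum()
all_sites_yearly = all_sites_daily.groupby(all_sites_daily.index.year).sum()

print(all_sites_daily)
```

For 2010-01-01 this gives 5, which matches the figure mentioned in the comments on the question.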

Corralien
  • That works! Yeah, sorry about the confusion. Pretty much, I know group by can calculate the daily/monthly/yearly totals for each site individually. I just wanted to see if I could get that for all stations combined, so instead of getting a new dataframe that has 6 different values for each day/month/year for each of the 6 stations, I'd only get one value for each day/month/year that is the total sum/mean of all the stations. – HokieWx Jun 07 '21 at 16:16
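
For the combined mean mentioned in the last comment, one possible approach (a sketch, not necessarily what either poster used) is to reshape the frame to long form with melt, so that every site reading becomes its own row, and then group by the date part. Note that dt.day on its own groups by day of the month (1 to 31), so pooling "every Jan 1" across years needs both month and day as keys. The 2011 row below is made up so that there is more than one year to pool; long, mean_by_year, and mean_by_month_day are illustrative names.

```python
import pandas as pd

# Two sites only, with one made-up 2011 row so several years can be pooled.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2010-01-01", "2010-01-02", "2011-01-01"]),
        "site1": [2, 7, 3],
        "site2": [0, 5, 3],
    }
)

# Long form: one row per (date, site) reading.
long = df.melt(id_vars="Date", var_name="site", value_name="snow")

# Mean over every site reading in each year.
mean_by_year = long.groupby(long["Date"].dt.year)["snow"].mean()

# Mean over every site reading for each calendar day (month, day),
# pooled across years.
month = long["Date"].dt.month.rename("month")
day = long["Date"].dt.day.rename("day")
mean_by_month_day = long.groupby([month, day])["snow"].mean()

print(mean_by_year)
print(mean_by_month_day)
```

Replacing .mean() with .sum() gives the combined totals instead.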