groupby with resample: a column is being duplicated

Question

Background: I have COVID data that I aggregated to the state level, and then rolled up from days to weeks (this works). However, when I run the same day-to-week logic on almost exactly the same data at the county level, I get an error. More specifically, the same column ends up both in the index and in the data.

The left is the rolled-up state data and the right is the rolled-up county data...

Here is the state data code that is working...

df_covid_data = df_covid_data.groupby("State").resample('W-SAT', label='right', closed = 'right', on='date').sum().sort_values(by=['State','date'])

And here is the county code (basically identical) that is not working. Note that if I uncomment the end I get an error, because countyFIPS is included twice. I tried removing the NaN columns and this did not help...

df_covid_data_c = df_covid_data_c.groupby("countyFIPS").resample('W-SAT', label='right', closed = 'right', on='date').sum()#.sort_values(by=['countyFIPS','date'])

Here are the outputs...

PS to bring in the raw data...

import pandas as pd

COVID_FILE = 'covid_confirmed_usafacts'  # https://usafacts.org/visualizations/coronavirus-covid-19-spread-map grab confirmed cases

df_cum_covid_data = pd.read_csv(DATA_PATH + COVID_FILE + '.csv', sep=',').dropna(axis=1, how='all').dropna(axis=0, how='all')  # read in covid data

It's hard to impossible to help you with the data given in screenshots. Have a look at how you can provide a reproducible example: https://stackoverflow.com/a/20159305/463796 — w-m, Oct 06 '21 at 17:25

score 0 · Answer 1 · answered Oct 06 '21 at 18:22

I found the issue was that the county ID (countyFIPS) was coming in as a float, and this was messing up the groupby. If I cast it as an integer on the way in, then the code ran.
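
A minimal sketch of that cast, using a small made-up frame rather than the USAFacts file. The `cases` column and the sample values are assumptions for illustration only. Selecting the value column explicitly before summing is an extra precaution that is not part of the original fix. If the real column contains missing values, a plain integer cast will fail, so those rows would need to be dropped or handled first.

```python
import pandas as pd

# Hypothetical sample data: countyFIPS arrives as float, as described above.
df = pd.DataFrame({
    "countyFIPS": [1001.0, 1001.0, 1003.0, 1003.0],
    "date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-04", "2021-01-12"]),
    "cases": [1, 2, 3, 4],
})

# Cast the county ID to an integer before grouping.
df["countyFIPS"] = df["countyFIPS"].astype("int64")

# Same weekly roll-up as in the question; only the value column is summed.
weekly = (
    df.groupby("countyFIPS")
      .resample("W-SAT", label="right", closed="right", on="date")["cases"]
      .sum()
)
print(weekly)
```

The result is indexed by countyFIPS and the week-ending Saturday, so `weekly.reset_index()` gives a flat table with one countyFIPS column.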

Jonathan Hay
