
I am new to pandas and would like to know whether this is a bug.

I have a DataFrame with a non-unique datetime index. col1 is a grouping variable and col2 holds the values.

I want to resample the hourly values to yearly values, grouped by the grouping variable. I do this with the following command:

df_resample = df.groupby('col1').resample('Y').mean()

This works fine and creates a MultiIndex of col1 and the DatetimeIndex, where col1 is no longer a column in the DataFrame.

However, if I change mean() to max(), this is not the case: col1 is part of the MultiIndex, but the column is also still present in the DataFrame. Isn't this a bug?

Sorry, I don't know how to present dummy data as a DataFrame in this post.

Edit: code example:

from datetime import datetime, timedelta
import pandas as pd

# Three rows, one per category, indexed by three consecutive days
data = {'category': ['A', 'B', 'C'],
        'value_hour': [1, 2, 3]}
days = pd.date_range(datetime.now(), datetime.now() + timedelta(2), freq='D')

df = pd.DataFrame(data, index=days)

# Identical groupby/resample calls; only the aggregation differs
df_mean = df.groupby('category').resample('Y').mean()
df_max = df.groupby('category').resample('Y').max()
print(df_mean, df_max)
                        
                     value_hour
category
A        2021-12-31         1.0
B        2021-12-31         2.0
C        2021-12-31         3.0

                    category  value_hour
category
A        2021-12-31        A           1
B        2021-12-31        B           2
C        2021-12-31        C           3

Trying to drop the category column from df_max gives a KeyError:

df_max.drop('category')

File "C:\Users\mav\Anaconda3\envs\EWDpy\lib\site-packages\pandas\core\indexes\base.py", line 3363, in get_loc
raise KeyError(key) from err

KeyError: 'category'
WhiteDear

1 Answer


Concerning the KeyError: the problem is that you are trying to drop a row labelled "category" rather than the column. drop works on the row index by default (axis=0), so to drop a column you need to pass axis=1, as in the following code:

df_max.drop('category', axis=1)

axis=1 tells drop to look for the label among the columns.
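To make the row-versus-column distinction concrete, here is a minimal sketch. It does not rerun the resample; instead it builds a small stand-in for df_max by hand (an assumption, so that the result does not depend on the pandas version), with category present both as an index level and as a column. The names raised, cleaned and flat are only illustrative.

```python
import pandas as pd

# Hand-built stand-in for df_max (assumption): 'category' is both
# an index level and a regular column, as in the question's output.
index = pd.MultiIndex.from_arrays(
    [["A", "B", "C"], pd.to_datetime(["2021-12-31"] * 3)],
    names=["category", None],
)
df_max = pd.DataFrame(
    {"category": ["A", "B", "C"], "value_hour": [1, 2, 3]},
    index=index,
)

# With the default axis=0, drop searches the row labels and fails.
try:
    df_max.drop("category")
    raised = False
except KeyError:
    raised = True

# columns='category' is equivalent to drop('category', axis=1).
cleaned = df_max.drop(columns="category")

# With the duplicate column gone, the index can be reset.
flat = cleaned.reset_index()
print(flat)
```

Passing columns= instead of axis=1 is purely a readability choice; both remove the column and leave the index level untouched.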

Isy89
  • Ah yes, thank you. That solves my problem. But shouldn't the column have been dropped, as it is when using .mean()? Isn't this therefore a bug? – WhiteDear Sep 30 '21 at 09:26
  • Hi, yes, I agree, this is strange. I also tried running it with the debugger in PyCharm. When I step through it, everything is fine and the output has no category column, whereas if I let the script run, it prints the DataFrame with the category column ... Could you check whether you can reproduce this behavior? – Isy89 Sep 30 '21 at 21:23
  • I can reproduce it in a project I am working on. I want to reset the index afterwards but can't because of the category column. – WhiteDear Oct 02 '21 at 19:40
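For the reset-index problem raised in the last comment, one possible workaround is to select the value column before resampling, so the grouping column never ends up in the aggregated result. This is only a sketch and does not settle whether the mean()/max() difference is a bug. Assumptions: fixed dates replace datetime.now() for reproducibility, the index is given the hypothetical name date, and 'YS' (year start) is swapped in for the question's 'Y' because the year-end alias has been renamed in newer pandas releases.

```python
import pandas as pd

# Fixed dates (assumption) so the example is reproducible.
days = pd.date_range("2021-09-28", periods=3, freq="D", name="date")
df = pd.DataFrame(
    {"category": ["A", "B", "C"], "value_hour": [1, 2, 3]},
    index=days,
)

# Selecting 'value_hour' first means only that column is aggregated;
# 'category' then appears in the index but not as a column.
# 'YS' labels each bin by the start of the year instead of the end.
yearly_max = df.groupby("category")["value_hour"].resample("YS").max()

# No duplicate 'category' column, so reset_index can insert the level.
flat = yearly_max.reset_index()
print(flat)
```

The same selection works with mean(), so both aggregations can share one code path regardless of how the grouping column is treated.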