12

My DataFrame is empty after a query. When I call groupby on it, I get a RuntimeWarning and then another empty DataFrame, this time with no columns. How can I keep the columns?

import pandas as pd

df = pd.DataFrame(columns=["PlatformCategory","Platform","ResClassName","Amount"])
print(df)

result:

Empty DataFrame
Columns: [PlatformCategory, Platform, ResClassName, Amount]
Index: []

then groupby:

df = df.groupby(["PlatformCategory","Platform","ResClassName"]).sum()
df = df.reset_index(drop=False,inplace=True)
print(df)

result: sometimes None, sometimes an empty DataFrame

Empty DataFrame
Columns: []
Index: []

Why does the empty DataFrame have no columns?

RuntimeWarning:

/data/pyrun/lib/python2.7/site-packages/pandas/core/groupby.py:3672: RuntimeWarning: divide by zero encountered in log

if alpha + beta * ngroups < count * np.log(count):

/data/pyrun/lib/python2.7/site-packages/pandas/core/groupby.py:3672: RuntimeWarning: invalid value encountered in double_scalars
  if alpha + beta * ngroups < count * np.log(count):
cs95
user2890059

2 Answers

6

You need as_index=False:

df = df.groupby(["PlatformCategory","Platform","ResClassName"], as_index=False).count()
df

Empty DataFrame
Columns: [PlatformCategory, Platform, ResClassName, Amount]
Index: []

No need to reset your index afterwards.
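
As a minimal sketch of the two points above (the sample rows and the variable names are made up for illustration, not taken from the question): with as_index=False the grouping keys stay as ordinary columns, so no reset_index call is needed. It also suggests why the question sometimes printed None: reset_index(inplace=True) modifies the frame in place and returns None, so assigning its result back to df throws the frame away.

```python
import pandas as pd

keys = ["PlatformCategory", "Platform", "ResClassName"]

# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame({
    "PlatformCategory": ["web", "web", "mobile"],
    "Platform": ["chrome", "chrome", "ios"],
    "ResClassName": ["img", "img", "js"],
    "Amount": [1, 2, 5],
})

# as_index=False keeps the grouping keys as regular columns.
counts = sample.groupby(keys, as_index=False).count()
print(list(counts.columns))

# The default (as_index=True) moves the keys into a MultiIndex instead.
indexed = sample.groupby(keys).count()
print(list(indexed.columns))

# reset_index(inplace=True) changes `indexed` itself and returns None,
# so its return value should not be assigned to anything.
returned = indexed.reset_index(inplace=True)
print(returned)  # None
```

If a reset is wanted anyway, either call reset_index() without inplace and assign the result, or call it with inplace=True and ignore the return value.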

cs95
  • This was exactly what I was also looking for. Thanks a lot! – user2890059 Sep 07 '17 at 07:37
  • This only works for an empty DataFrame. In the non-empty case it doesn't work, and changing count() to sum() doesn't work either. I want the sum to work in both cases. Do you have any advice? – user2890059 Sep 07 '17 at 08:54
  • @user2890059 Share some data... in your question? – cs95 Sep 07 '17 at 08:56
  • @user2890059 If you are trying to find the sum of some particular column, then call sum() on that column. – cs95 Sep 07 '17 at 08:59
  • After changing to sum, I get an empty DataFrame without columns. – user2890059 Sep 07 '17 at 11:52
  • @user2890059 Interestingly, I don't think it's possible to do this with sum, because sum condenses all rows in an aggregation attempt. Try it with actual data and you'll understand. – cs95 Sep 07 '17 at 11:54
1

Some code that works the same for .sum() whether or not the dataframe is empty:

def groupby_sum(df, groupby_cols):
    groupby = df.groupby(groupby_cols, as_index=False)
    summed = groupby.sum()
    return (groupby.count() if summed.empty else summed).set_index(groupby_cols)

df = groupby_sum(df, ["PlatformCategory", "Platform", "ResClassName"])
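
A quick way to try the helper above on both cases. The sample rows are invented for illustration, and the empty frame is produced by a filter that matches nothing, which is an assumption about what "empty after query" looks like in the question. Behavior on empty frames has varied between pandas versions, so treat this as a sketch, not a guarantee.

```python
import pandas as pd

def groupby_sum(df, groupby_cols):
    # Same helper as in the answer above, repeated so this runs standalone.
    groupby = df.groupby(groupby_cols, as_index=False)
    summed = groupby.sum()
    # Fall back to count() when the summed result is empty.
    return (groupby.count() if summed.empty else summed).set_index(groupby_cols)

keys = ["PlatformCategory", "Platform", "ResClassName"]

# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame({
    "PlatformCategory": ["web", "web", "mobile"],
    "Platform": ["chrome", "chrome", "ios"],
    "ResClassName": ["img", "img", "js"],
    "Amount": [1, 2, 5],
})

# Non-empty case: Amount is summed per group, keys become the index.
filled = groupby_sum(sample, keys)
print(filled)

# Empty case: a filter that matches no rows leaves an empty frame
# that still has all four columns.
empty = sample[sample["Amount"] > 100]
empty_result = groupby_sum(empty, keys)
print(list(empty_result.columns))
print(list(empty_result.index.names))
```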
rleelr