I've been working with pandas for a little while now, but I'm only just getting my feet wet with groupby.
I have the following function defined, which sorts the frame and assigns values to the columns R, F, M, and RFM:
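def get_rfm(dataframe):
    # Sort by recency, then pass column R to get_var (defined elsewhere, not shown).
    dfr = dataframe.sort_values('last_order_date', ascending=True)
    get_var(dfr.R)
    # Sort by order count, then pass column F to get_var.
    dff = dfr.sort_values('number_of_orders', ascending=True)
    get_var(dff.F)
    # Sort by total spend, then pass column M to get_var.
    dfm = dff.sort_values('total_price', ascending=True)
    get_var(dfm.M)
    # RFM is the sum of the three component columns.
    dfm['RFM'] = dfm['R'] + dfm['M'] + dfm['F']
    dfrfm = dfm.sort_values('RFM', ascending=True)
    dfrfm.info()  # info() prints its summary itself and returns None
    return dfrfm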
When I run this function on my whole DataFrame, I get what looks like the expected result. I assign the returned frame to a new variable and then run some statistics on it.
What I now want to do is group the DataFrame by one of the other columns and perform the same analysis on each subgroup. I tried:
df.groupby('size_of_business').apply(get_rfm)
But the results are not what I expected. I get back a DataFrame that appears to have a MultiIndex:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 57196 entries, ( Did Not Answer, 67103) to (More than 10 people, 5617)
Data columns (total 11 columns):
This is followed by the list of columns. The first level of the MultiIndex appears to hold the values of the column I grouped by, and the second level looks like the original index.
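To make the behavior concrete, here is a minimal reproduction on a toy frame. The data values and the helper `sort_group` are made up for illustration (it only stands in for `get_rfm`, which needs `get_var`); the column names are borrowed from my real data.

```python
import pandas as pd

# Toy data: values are invented, only the column names match my real frame.
df = pd.DataFrame(
    {
        "size_of_business": ["Small", "Large", "Small", "Large"],
        "total_price": [30.0, 10.0, 20.0, 40.0],
    },
    index=[101, 102, 103, 104],
)

def sort_group(group):
    # Hypothetical stand-in for get_rfm: sort one group's rows and return them.
    return group.sort_values("total_price", ascending=True)

result = df.groupby("size_of_business").apply(sort_group)

# The result has a two-level index: the group key, then the original row label.
print(result.index.nlevels)
print(result.index.tolist())
```

Each row keeps its original label in the second level, with the group it came from stacked in front as the first level, which matches the shape of the output I am seeing.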
I thought apply treated each group as a sub-DataFrame, which I could manipulate and then return. I believe my understanding of the resulting structure is flawed, and I've had trouble finding anything that helps me correct it.
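As far as I can tell, apply does hand each group to the function as a sub-DataFrame, and the extra index level comes from the group keys being attached when the per-group results are concatenated. A possible workaround, sketched on toy data under that assumption, is to pass `group_keys=False` to `groupby`, or to loop over the groups and concatenate the pieces by hand. The frame and the helper `sort_group` below are hypothetical stand-ins, not my real data or `get_rfm`.

```python
import pandas as pd

# Toy data: values are invented for illustration.
df = pd.DataFrame(
    {
        "size_of_business": ["Small", "Large", "Small", "Large"],
        "total_price": [30.0, 10.0, 20.0, 40.0],
    },
    index=[101, 102, 103, 104],
)

def sort_group(group):
    # Hypothetical stand-in for get_rfm: sort one group's rows and return them.
    return group.sort_values("total_price", ascending=True)

# Option 1: ask groupby not to add the group keys to the result's index.
flat = df.groupby("size_of_business", group_keys=False).apply(sort_group)

# Option 2: iterate over (key, sub-DataFrame) pairs and concatenate manually.
pieces = [sort_group(group) for _, group in df.groupby("size_of_business")]
manual = pd.concat(pieces)

print(flat.index.tolist())
print(manual.index.tolist())
```

Both variants keep only the original row labels as the index. The manual loop also keeps the grouping column in each piece, whereas whether apply passes the grouping column through seems to depend on the pandas version.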