Groupby with lambda function and multiple columns

Question

I have a dataframe containing sales data for real estate parcels. I am trying to groupby parcel number then for each parcel number see the most recent sale and the second most recent sale by date along with the corresponding sales price for those two dates.

df = 
parcel  date            amount
101469  5/29/2015 0:00  513000
101469  4/25/2017 0:00  570000
101470  1/6/1995 0:00   75000
101470  8/15/1995 0:00  385000
101470  12/31/2001 0:00 417500


df_grouped = df.groupby("parcel").agg({'date': lambda grp: [grp.nlargest(1).iloc[-1], grp.nlargest(2).iloc[-1]
]})

The current code properly groups the data by parcel and also determines the most recent and second most recent sale dates. However, I am unable to add in the corresponding sales price for each.

Here is generally the expected result I'd like to see. One grouped by line per parcel that shows the most recent sale, second most recent sale, most recent sale amount, second most recent sale amount:

please show us your expected result. Thanks. – jose_bacoy Jun 06 '19 at 16:40 — jose_bacoy, Jun 06 '19 at 16:40
I know there's a dup somewhere – piRSquared Jun 06 '19 at 16:54 — piRSquared, Jun 06 '19 at 16:54

score 0 · Answer 1 · edited Jun 08 '19 at 01:52

Using these these steps:

create a df1 using sort_values, groupby and pick top 2 rows of each group
add key columns to df1 using cumcount (convert it to str)
set_index and unstack to the desired output
use multiindex map to pretty-up columns to desired column names

df1 = df.sort_values('date', ascending=False).groupby('parcel').head(2)
df1['key'] = df1.groupby(['parcel']).parcel.cumcount().add(1).astype(str)
df1 =  df1.set_index(['parcel', 'key']).unstack()
df1.columns = df1.columns.map('_'.join)

Out[1268]:
           date_1     date_2  amount_1  amount_2
parcel
101469 2017-04-25 2015-05-29    570000    513000
101470 2001-12-31 1995-08-15    417500    385000

score 0 · Answer 2 · answered Jun 06 '19 at 23:35

Solved. Original solution here: Apply multiple functions to multiple groupby columns

def f(x):
        d = {}
        d['most_recent_sale'] = x["date"].nlargest(1).iloc[-1]
        d['second_most_recent_sale'] = x["date"].nlargest(2).iloc[-1]
        d['most_recent_price'] = x.loc[x["date"] == d["most_recent_sale"], "amt_Price"].values[0]
        d['second_most_recent_price'] = x.loc[x["date"] == d["second_most_recent_sale"], "amt_Price"].values[0]

        return pd.Series(d, index=['most_recent_sale', 'second_most_recent_sale', 'most_recent_price', 'second_most_recent_price'])

    df_grouped = df.groupby("id_Pid").apply(f)

Groupby with lambda function and multiple columns

2 Answers2