-1

I have a dataframe containing sales data for real estate parcels. I am trying to groupby parcel number then for each parcel number see the most recent sale and the second most recent sale by date along with the corresponding sales price for those two dates.

df = 
parcel  date            amount
101469  5/29/2015 0:00  513000
101469  4/25/2017 0:00  570000
101470  1/6/1995 0:00   75000
101470  8/15/1995 0:00  385000
101470  12/31/2001 0:00 417500


df_grouped = df.groupby("parcel").agg({'date': lambda grp: [grp.nlargest(1).iloc[-1], grp.nlargest(2).iloc[-1]
]})

The current code properly groups the data by parcel and also determines the most recent and second most recent sale dates. However, I am unable to add in the corresponding sales price for each.

Here is generally the expected result I'd like to see. One grouped by line per parcel that shows the most recent sale, second most recent sale, most recent sale amount, second most recent sale amount: enter image description here

David Seroy
  • 179
  • 1
  • 14

2 Answers2

0

Using these these steps:

  • create a df1 using sort_values, groupby and pick top 2 rows of each group
  • add key columns to df1 using cumcount (convert it to str)
  • set_index and unstack to the desired output
  • use multiindex map to pretty-up columns to desired column names
df1 = df.sort_values('date', ascending=False).groupby('parcel').head(2)
df1['key'] = df1.groupby(['parcel']).parcel.cumcount().add(1).astype(str)
df1 =  df1.set_index(['parcel', 'key']).unstack()
df1.columns = df1.columns.map('_'.join)

Out[1268]:
           date_1     date_2  amount_1  amount_2
parcel
101469 2017-04-25 2015-05-29    570000    513000
101470 2001-12-31 1995-08-15    417500    385000
suvayu
  • 4,271
  • 2
  • 29
  • 35
Andy L.
  • 24,909
  • 4
  • 17
  • 29
0

Solved. Original solution here: Apply multiple functions to multiple groupby columns

def f(x):
        d = {}
        d['most_recent_sale'] = x["date"].nlargest(1).iloc[-1]
        d['second_most_recent_sale'] = x["date"].nlargest(2).iloc[-1]
        d['most_recent_price'] = x.loc[x["date"] == d["most_recent_sale"], "amt_Price"].values[0]
        d['second_most_recent_price'] = x.loc[x["date"] == d["second_most_recent_sale"], "amt_Price"].values[0]

        return pd.Series(d, index=['most_recent_sale', 'second_most_recent_sale', 'most_recent_price', 'second_most_recent_price'])

    df_grouped = df.groupby("id_Pid").apply(f)
David Seroy
  • 179
  • 1
  • 14