get unique id, get latest date, get highest number and lowest number python pandas dataframe

Question

This is an example of my data

ID	purchaseDate	numOfItemsPurchased
12	12-10-2023	2
12	12-01-2023	34
56	24-03-2020	12
23	23-12-2012	1
23	23-05-2012	3
23	23-06-2012	4
24	12-10-2023	24
38	23-02-2012	21
16	12-10-2023	34
54	02-09-2020
54	24-03-2020	19

I want to extract the highest and lowest purchaseNo for each 'ID' and the dates they were made. the expected result should look like:

ID	maxPurchaseDate	maxPurchaseAmount	leastPurchaseDate	leastPurchaseAmount
12	12-01-2023	34	12-10-2023	2
56	24-03-2020	12	24-03-2020	12
23	23-06-2012	4	23-12-2012	1
24	12-10-2023	24	12-10-2023	24
38	23-02-2012	21	23-02-2012	21
16	12-10-2023	34	12-10-2023	34
54	24-03-2020	19	02-09-2020	11

Please how do I get this done?

I tried this code:

details = data.groupby('ID').min()['purchaseDate']

But I get a KeyError: 'purchaseDate'. I have double checked to be sure the column was not misspelled. and I have made sure to fillna(0)

I also tried doing it this way:

details = data.groupby('ID').min()['numOfItemsPurchased']

It doesn't throw an error but I don't know how to add more columns to it

I have tried this code:

data.groupby(['ID']).agg({'purchaseDate': [np.min,np.max],
                          'numOfItemsPurchased' : [np.min,np.max]})

then I get a TypeError: '<=' not supported between instances of 'int' and 'str'

Make sure to `df["purchaseDate"] = pd.to_datetime(df["purchaseDate"], dayfirst=True)` then, you can use `out = df.groupby("ID", sort=False).agg(["min", "max"])`, and last but not least, `out.columns = out.swaplevel(axis=1).columns.map("".join)` to flatten the column names (*with optional* `out = out.reset_index()`) — Timeless, Jun 21 '23 at 20:16

score 0 · Answer 1 · answered Jun 21 '23 at 20:22

Try:

def fn(x):
    mx, mn = x["numOfItemsPurchased"].idxmax(), x["numOfItemsPurchased"].idxmin()
    return pd.Series(
        {
            "maxPurchaseDate": x.at[mx, "purchaseDate"],
            "maxPurchaseAmount": x.at[mx, "numOfItemsPurchased"],
            "leastPurchaseDate": x.at[mn, "purchaseDate"],
            "leastPurchaseAmount": x.at[mn, "numOfItemsPurchased"],
        }
    )

# convert to datetime if necessary:
df["purchaseDate"] = pd.to_datetime(df["purchaseDate"], dayfirst=True)

out = df.groupby("ID", sort=False).apply(fn).reset_index()
print(out)

Prints:

   ID maxPurchaseDate  maxPurchaseAmount leastPurchaseDate  leastPurchaseAmount
0  12      2023-01-12                 34        2023-10-12                    2
1  56      2020-03-24                 12        2020-03-24                   12
2  23      2012-06-23                  4        2012-12-23                    1
3  24      2023-10-12                 24        2023-10-12                   24
4  38      2012-02-23                 21        2012-02-23                   21
5  16      2023-10-12                 34        2023-10-12                   34
6  54      2020-03-24                 19        2020-09-02                   11

get unique id, get latest date, get highest number and lowest number python pandas dataframe

1 Answers1