1

This is an example of my data

ID purchaseDate numOfItemsPurchased
12 12-10-2023 2
12 12-01-2023 34
56 24-03-2020 12
23 23-12-2012 1
23 23-05-2012 3
23 23-06-2012 4
24 12-10-2023 24
38 23-02-2012 21
16 12-10-2023 34
54 02-09-2020
54 24-03-2020 19

I want to extract the highest and lowest purchaseNo for each 'ID' and the dates they were made. the expected result should look like:

ID maxPurchaseDate maxPurchaseAmount leastPurchaseDate leastPurchaseAmount
12 12-01-2023 34 12-10-2023 2
56 24-03-2020 12 24-03-2020 12
23 23-06-2012 4 23-12-2012 1
24 12-10-2023 24 12-10-2023 24
38 23-02-2012 21 23-02-2012 21
16 12-10-2023 34 12-10-2023 34
54 24-03-2020 19 02-09-2020 11

Please how do I get this done?

I tried this code:

details = data.groupby('ID').min()['purchaseDate']

But I get a KeyError: 'purchaseDate'. I have double checked to be sure the column was not misspelled. and I have made sure to fillna(0)

I also tried doing it this way:

details = data.groupby('ID').min()['numOfItemsPurchased']

It doesn't throw an error but I don't know how to add more columns to it

I have tried this code:

data.groupby(['ID']).agg({'purchaseDate': [np.min,np.max],
                          'numOfItemsPurchased' : [np.min,np.max]})

then I get a TypeError: '<=' not supported between instances of 'int' and 'str'

  • Make sure to `df["purchaseDate"] = pd.to_datetime(df["purchaseDate"], dayfirst=True)` then, you can use `out = df.groupby("ID", sort=False).agg(["min", "max"])`, and last but not least, `out.columns = out.swaplevel(axis=1).columns.map("".join)` to flatten the column names (*with optional* `out = out.reset_index()`) – Timeless Jun 21 '23 at 20:16

1 Answers1

0

Try:

def fn(x):
    mx, mn = x["numOfItemsPurchased"].idxmax(), x["numOfItemsPurchased"].idxmin()
    return pd.Series(
        {
            "maxPurchaseDate": x.at[mx, "purchaseDate"],
            "maxPurchaseAmount": x.at[mx, "numOfItemsPurchased"],
            "leastPurchaseDate": x.at[mn, "purchaseDate"],
            "leastPurchaseAmount": x.at[mn, "numOfItemsPurchased"],
        }
    )

# convert to datetime if necessary:
df["purchaseDate"] = pd.to_datetime(df["purchaseDate"], dayfirst=True)

out = df.groupby("ID", sort=False).apply(fn).reset_index()
print(out)

Prints:

   ID maxPurchaseDate  maxPurchaseAmount leastPurchaseDate  leastPurchaseAmount
0  12      2023-01-12                 34        2023-10-12                    2
1  56      2020-03-24                 12        2020-03-24                   12
2  23      2012-06-23                  4        2012-12-23                    1
3  24      2023-10-12                 24        2023-10-12                   24
4  38      2012-02-23                 21        2012-02-23                   21
5  16      2023-10-12                 34        2023-10-12                   34
6  54      2020-03-24                 19        2020-09-02                   11
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91