0

I have a pandas dataframe with 5 columns. In the 3th and 4th column are coordinates that are often the same. I count the same ones and generate a new column with the quantity. The counter starts with 0. That is accomplished with:

df['Quantity'] = df.groupby(['abbreviation', 'X', 'Y']).cumcount()
    name abbreviation     X     Y  Quantity
0    A-x            A  15.6  19.4         0
1   A-y2            A  15.6  19.4         1
2    B-a            B  15.0  25.0         0
3    B-d            B  15.0  25.0         1
4   C-x1            C  15.0  10.0         0
5   C-c4            C  15.0  10.0         1
6    C-5            C  15.0  10.0         2
7    E-v            E  83.4  16.5         0
8    E-2            E  83.4  16.5         1
9   E-v2            E  83.4  16.5         2
10   E-1            E  83.4  16.5         3
11  F-ab            F  19.1  98.4         0
12  F-nb            F  19.1  98.4         1
13  G-ku            G  78.0  17.0         0

Now a new column 'Max Quantity' is desired, that contains the highest value of 'Quantity' until it is 0 again.

And I would like to receive a new column that looks like this:

    name abbreviation     X     Y  Quantity Max Quantity
0    A-x            A  15.6  19.4         0            1
1   A-y2            A  15.6  19.4         1            1
2    B-a            B  15.0  25.0         0            1
3    B-d            B  15.0  25.0         1            1
4   C-x1            C  15.0  10.0         0            2
5   C-c4            C  15.0  10.0         1            2
6    C-5            C  15.0  10.0         2            2
7    E-v            E  83.4  16.5         0            3
8    E-2            E  83.4  16.5         1            3
9   E-v2            E  83.4  16.5         2            3
10   E-1            E  83.4  16.5         3            3 
11  F-ab            F  19.1  98.4         0            1
12  F-nb            F  19.1  98.4         1            1
13  G-ku            G  78.0  17.0         0            0
  • 1
    `df['Quantity'] = df.groupby(['abbreviation', 'X', 'Y']).cumcount()` – mozway Oct 13 '22 at 10:54
  • That was really easy and good thanks! Is there any way to add another column like 'Max Quantity' that is the highest value of 'Quantity' until the next 0 appears? I also edited the question so that it is more clear. – maxwell_742 Oct 13 '22 at 12:04
  • 1
    `df.groupby(['abbreviation', 'X', 'Y'])['X'].transform('size').sub(1)` – mozway Oct 13 '22 at 12:12
  • `df['Max Quantity'] = df.groupby(['abbreviation', 'X', 'Y'])['X'].transform('Quantity')` – jezrael Oct 13 '22 at 12:24
  • First of thanks and I'm impressed, the next coffee is on me. Just for understanding, what does the `[’X’]` do? I thought it would count the same X values which would lead to a false result after row 3. But it works perfectly. – maxwell_742 Oct 13 '22 at 12:27
  • @maxwell_742 `groupby.transform('size')` requires a series. So you need to use `transform` on only one of the columns, whichever one. I picked a short name ;) – mozway Oct 13 '22 at 14:32

0 Answers0