-2

I Have a dataframe where I would like to:

  1. match the like values in a column
  2. ensure that the dates are greater than those of the previously matched.
  3. perform an operation such as subtract a value from another column.

For ex:

|       date | A | B |
|:---------- |:--|:--|
| 01-01-2022 |k1 | 1 |
| 01-02-2022 |c1 | 5 |
| 01-02-2022 |e1 | 4 |
| 01-05-2022 |k1 | 9 |
| 01-10-2022 |c1 | 9 |

In col A there are two k1's. Now what I would like to happen is identify the first k1 then the second, verify that the date in row 4 > row 0, then subtract the value in col B at row 0 from col B row 4 and so on until there aren't any k1's left. Then it moves on to the next value in col A etc. Then the results in a col C.

With the original data I was able to accomplish this via the groupby() function while using the shift() method which worked well. However, the one kink I had with that is I don't know how to merge that info back to the original df.

Val
  • 1
  • 2
  • 1
    [Please do not post images of code, data, error messages, etc.](https://stackoverflow.com/help/how-to-ask) Ideally provide a a self-contained example that reproduces your problem (see [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/14311263)). – Timus Mar 05 '22 at 23:26
  • Will avoid it in the future. Thx for the heads up. – Val Mar 06 '22 at 00:18
  • 1
    first avoid it in the present question. – furas Mar 06 '22 at 01:24
  • Geez...you guys are harsh. Done. Corrected. – Val Mar 06 '22 at 01:32
  • Correction done. – Val Mar 06 '22 at 01:35
  • Consider from 1st Jan to 10th Jan, each day there is one `k1`, and their values are 1,2,3,...,10. How many subtractions will be done, and what are the results? – Raymond Kwok Mar 06 '22 at 11:12
  • @RaymondKwok 9 subtractions will be done. Considering you start from the top of the list each k1 value would consider the next value for k1 as its max. If from the bottom each preceding value would be considered as its min. Ex: 2-1, 3-2, 4-3 etc. – Val Mar 06 '22 at 23:59

1 Answers1

0

We can use groupby and shift for your goal.

df = pd.DataFrame({'date': ['01-01-2022',
  '01-02-2022',
  '01-02-2022',
  '01-05-2022',
  '01-10-2022'],
 'A': ['k1', 'c1', 'e1', 'k1', 'c1'],
 'B': [1, 5, 4, 9, 9]})

Your "verify that the date in row 4 > row 0" is checked by:

mask = df.groupby('A')['date'].transform(lambda b: b > b.shift())

and your "subtract the value in col B at row 0 from col B row 4" is done separately by

subtracted = df.groupby('A')['B'].transform(lambda b: b - b.shift())

Finally we can combine the two to get the final result, picking a value from subtracted only when the corresponding mask is True.

df['masked_subtraction'] = np.where(mask, subtracted, np.nan)
print(df)

         date   A  B  masked_subtraction
0  01-01-2022  k1  1                 NaN
1  01-02-2022  c1  5                 NaN
2  01-02-2022  e1  4                 NaN
3  01-05-2022  k1  9                 8.0
4  01-10-2022  c1  9                 4.0
Raymond Kwok
  • 2,461
  • 2
  • 9
  • 11
  • Thank you very much. This worked. So if I wanted to add other columns similar to this, I would have to go through the same process? – Val Mar 07 '22 at 10:53
  • Put it this way, try first, and post a new question if not work, with example input and desired output? ;) – Raymond Kwok Mar 07 '22 at 10:58