Comparing a value by group based on date and creating new dataframe where value dropped in two columns

Question

I've a df as under

+-----+------+--------+--------------------+------+----------+---------+--------+-------+
| ID1 | ID2  | DOC_NO |        DATE        | COST | COST_NEW | CLIENT  | STATUS | LEVEL |
+-----+------+--------+--------------------+------+----------+---------+--------+-------+
| ABC | A123 |      1 | 2021-01-01 0:10:00 |   11 |       21 | ABC123  | A      |     1 |
| DEF | B456 |      2 | 2021-01-01 0:10:00 |   12 |       22 | DEF256  | B      |     2 |
| GHI | C789 |      3 | 2021-01-01 0:10:00 |   13 |       23 | GHI389  | C      |     3 |
| JKL | D890 |      4 | 2021-01-01 0:10:00 |   14 |       24 | JKL490  | D      |     4 |
| MNO | E012 |      5 | 2021-01-01 0:10:00 |   15 |       25 | MNO512  | E      |     5 |
| ABC | A123 |      6 | 2021-01-01 0:15:00 |   11 |       21 | ABC623  | A      |     1 |
| DEF | B456 |      7 | 2021-01-01 0:15:00 |   12 |       22 | DEF756  | B      |     3 |
| GHI | C789 |      8 | 2021-01-01 0:15:00 |   13 |       23 | GHI889  | C      |     2 |
| JKL | D890 |      9 | 2021-01-02 0:15:00 |   14 |       24 | JKL990  | D      |     4 |
| MNO | E012 |     10 | 2021-01-03 0:15:00 |   15 |       25 | MNO1012 | E      |     5 |
| ABC | A123 |     11 | 2021-01-03 0:20:00 |   10 |       20 | GHI890  | A      |     6 |
| DEF | B456 |     12 | 2021-01-03 0:20:00 |   11 |       21 | JKL991  | B      |     2 |
| GHI | C789 |     13 | 2021-01-03 0:20:00 |   12 |       22 | MNO1013 | C      |     6 |
| JKL | D890 |     14 | 2021-01-03 0:20:00 |   13 |       23 | GHI891  | D      |     7 |
| MNO | E012 |     15 | 2021-01-03 0:20:00 |   14 |       24 | JKL992  | E      |     4 |
| ABC | A123 |     16 | 2021-01-03 0:20:00 |   12 |       22 | MNO1014 | A      |     2 |
| DEF | B456 |     17 | 2021-01-03 0:20:00 |   13 |       23 | GHI892  | B      |     6 |
| GHI | C789 |     18 | 2021-01-03 0:20:00 |   14 |       24 | JKL993  | C      |     3 |
| JKL | D890 |     19 | 2021-01-03 0:20:00 |   15 |       25 | MNO1015 | D      |     7 |
| MNO | E012 |     20 | 2021-01-03 0:20:00 |   16 |       26 | GHI893  | E      |     2 |
| ABC | A123 |     21 | 2021-01-03 0:25:00 |   11 |       21 | ABC124  | A      |     1 |
| DEF | B456 |     22 | 2021-01-03 0:25:00 |   12 |       22 | DEF257  | B      |     6 |
| GHI | C789 |     23 | 2021-01-03 0:25:00 |   13 |       23 | GHI390  | C      |     5 |
| JKL | D890 |     24 | 2021-01-03 0:25:00 |   14 |       24 | JKL491  | D      |     7 |
| MNO | E012 |     25 | 2021-01-03 0:25:00 |   15 |       25 | MNO513  | E      |     1 |
+-----+------+--------+--------------------+------+----------+---------+--------+-------+

I want to group ID1 and ID2 and arrange the df by DOC_NO and DATE Post that want the script to sequentially look for instances where the COST and COST_NEW both dropped and set the COST_REF and COST_NEW_REF as the highest value of COST and COST_NEW in the preceding rows (max of both COST and COST_NEW in the precedeing row but both in the same row). I then want the script to pull the corresponding values of DATE, CLIENT, STATUS and LEVEL as DATE_LAST_REF,CLIENT,STATUS, LEVEL from the same where where we found the COST_REF and COST_NEW_REF

The resulting table should look like this:

+-----+------+--------+--------------------+------+----------+----------+--------------+--------------------+---------+--------+-------+
| ID1 | ID2  | DOC_NO |        DATE        | COST | COST_REF | COST_NEW | COST_NEW_REF |   DATE_LAST_REF    | CLIENT  | STATUS | LEVEL |
+-----+------+--------+--------------------+------+----------+----------+--------------+--------------------+---------+--------+-------+
| ABC | A123 |     11 | 2021-01-03 0:20:00 |   10 |       11 |       20 |           21 | 2021-01-01 0:15:00 | GHI890  | A      |     6 |
| DEF | B456 |     12 | 2021-01-03 0:20:00 |   11 |       12 |       21 |           22 | 2021-01-01 0:15:00 | JKL991  | B      |     2 |
| GHI | C789 |     13 | 2021-01-03 0:20:00 |   12 |       13 |       22 |           23 | 2021-01-01 0:15:00 | MNO1013 | C      |     6 |
| JKL | D890 |     14 | 2021-01-03 0:20:00 |   13 |       14 |       23 |           24 | 2021-01-02 0:15:00 | GHI891  | D      |     7 |
| MNO | E012 |     15 | 2021-01-03 0:20:00 |   14 |       15 |       24 |           25 | 2021-01-03 0:15:00 | JKL992  | E      |     4 |
+-----+------+--------+--------------------+------+----------+----------+--------------+--------------------+---------+--------+-------+

This is an extension to something I had been working on from:

Comparing a value by group based on date and creating new dataframe where value dropped

The best way to post your data is using `dput()`. To make a working example, we have to copy your table, remove the special characters and create a data.frame. Could you clearify how your output should be created? Perhaps by using multiple clear instrutions instead of two sentences spanning over 7 rows. ;-) — Martin Gal, Jul 30 '21 at 22:37
Please heed the advice in Ronak Shah's answer to your prior question, which shows how to share data in a way that makes it easier for people to help you. — Jon Spring, Jul 31 '21 at 03:15

Comparing a value by group based on date and creating new dataframe where value dropped in two columns

0 Answers0