I have a dataset I'm working with in R that has an ID number (ID), year they submitted data (Year), some other data (which isn't relevant to my question, but just consider them as "columns"), and a date of registration on our systems (DateR).
This dateR is autogenerated from the dataset I am using, and is supposed to represent the "earliest" date the ID number appears on our systems.
However, due to some kind problem with how the data is being pulled that I can't get fixed, the date is being recorded as a new date that updates every year, instead of simply the earliest date.
Thus, my goal would be to create a script that reworks the data and does the following two checks:
Firstly, it checks the row and identifies which rows have the matching ID number Secondly, it then applies the "earliest" date of all of the matching ID numbers in the date column.
So below is the example of a Dataset like what I am using
# | ID1 | YearSubmitted | Data | DateR |
---|---|---|---|---|
1 | 12345 | 2017 | 100 | 22-03-2017 |
2 | 12345 | 2018 | 100 | 22-03-2018 |
3 | 12345 | 2019 | 100 | 22-03-2019 |
4 | 22221 | 2018 | 100 | 22-03-2018 |
5 | 22221 | 2019 | 100 | 22-03-2019 |
This is what I would like it to look like (I have bolded the changed numbers for clarity)
# | ID1 | YearSubmitted | Data | DateR |
---|---|---|---|---|
1 | 12345 | 2017 | 100 | 22-03-2017 |
2 | 12345 | 2018 | 100 | 22-03-2017 |
3 | 12345 | 2019 | 100 | 22-03-2017 |
4 | 22221 | 2018 | 100 | 22-03-2018 |
5 | 22221 | 2019 | 100 | 22-03-2018 |
Most of the reference questions I have searched for this reference either replacing data with values fromanother column like If data present, replace with data from another column based on row ID, or use the replacement value as pulled from another dataframe like Replace a value in a dataframe by using other matching IDs of another dataframe in R.
I would prefer to acheive in dplyr if possible.
Preferably I'd like to start this with
data %>% group_by(ID1, Yearsubmitted) %>% mutate(across(c(DateR),
And I understand I could use the match function .. but I just draw a blank from this point on.
Thus, I would appreciate advice on how to:
Conditionally change the date if it's matching ID1 values, and secondly, to change all dates to the earliest value in the date column (DateR).
Thanks for your time.