# Example input
data <- data.frame(a=c("0125-510","0125-511","0125-512","4000-000","5000-000"), b=c("100 Fake Street", "100 Fake Street", "100 Fake Street", "200 Fake Street", "300 Fake Street"))
# Desired output
data <- data.frame(a=c("0125-510","0125-510","0125-510","4000-000","5000-000"), b=c("100 Fake Street", "100 Fake Street", "100 Fake Street", "200 Fake Street", "300 Fake Street"))
Input:
a b
0125-510 100 Fake Street
0125-511 100 Fake Street
0125-512 100 Fake Street
4000-000 200 Fake Street
5000-000 300 Fake Street
Output:
a b
0125-510 100 Fake Street
0125-510 100 Fake Street
0125-510 100 Fake Street
4000-000 200 Fake Street
5000-000 300 Fake Street
I have a data frame of addresses. Each address has an associated ID, stored as a character string to preserve leading zeros; rows that share an address have consecutive IDs. For every address that appears more than once (column b), I want to replace all of its values in column a with the lowest ID in that group.
The IDs are not necessarily sorted, so the lowest ID may not be the first one that appears. The input could also look like this:
Input:
a b
0125-512 100 Fake Street
0125-511 100 Fake Street
0125-510 100 Fake Street
4000-000 200 Fake Street
5000-000 300 Fake Street
Output:
a b
0125-510 100 Fake Street
0125-510 100 Fake Street
0125-510 100 Fake Street
4000-000 200 Fake Street
5000-000 300 Fake Street
I'm trying to implement this using dplyr.
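For reference, here is a minimal sketch of the kind of thing I have in mind. It assumes that comparing the IDs as strings gives the right "lowest" value, which should hold as long as every ID is zero-padded to the same width; the name `result` is just a placeholder for illustration.

```r
library(dplyr)

# Unsorted example input; stringsAsFactors = FALSE keeps the IDs as
# character strings on R versions before 4.0 as well.
data <- data.frame(
  a = c("0125-512", "0125-511", "0125-510", "4000-000", "5000-000"),
  b = c("100 Fake Street", "100 Fake Street", "100 Fake Street",
        "200 Fake Street", "300 Fake Street"),
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(b) %>%        # one group per address
  mutate(a = min(a)) %>% # lowest ID in the group, compared as strings
  ungroup()
```

Using `min()` rather than `first()` means the result does not depend on row order, which matters for the unsorted case above.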