2

I have a dataset "df" with many observations and multiple variables including some postal codes (repeated several times in some cases) and a different dataset "df2" with the coordinates of these postal codes. I want to add two new variables to my first dataset "df" with the coordinates of these postal codes but, given that huge amount of data I have, it takes too long with a loop. I would like to know if I can vectorize it in some way while maintaining the dataframe structure and not changing into matrix. I attach a simplified version of what I want to achieve.

# This dataset has my variables (removed the rest for simplicity)
df <- data.frame(pc = c("00001", "00002", "00003", 
                        "00001", "00002", "00003", 
                        "00001", "00002", "00003"))
     pc
1 00001
2 00002
3 00003
4 00001
5 00002
6 00003
7 00001
8 00002
9 00003

#This dataset holds the coordinates
df2 <- data.frame(pc = c("00001", "00002", "00003"),
                 lat = c(1, 2, 3),
                 long = c(4, 5, 6))
     pc lat long
1 00001   1    4
2 00002   2    5
3 00003   3    6

#This is the dataset I need
good.df <- data.frame(pc = c("00001", "00002", "00003", 
                             "00001", "00002", "00003", 
                             "00001", "00002", "00003"),
                      lat = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                      long = c(4, 5, 6, 4, 5, 6, 4, 5, 6))
     pc lat long
1 00001   1    4
2 00002   2    5
3 00003   3    6
4 00001   1    4
5 00002   2    5
6 00003   3    6
7 00001   1    4
8 00002   2    5
9 00003   3    6 

I have searched for the solution for quite a long time, but considering I do not know how to properly ask the question I have had no success so far. I would really appreciate some guidance here.

Thank you

  • 1
    What you are describing is called a left join. You can find many answers on how to do this by searching here for "R left join" – Allan Cameron Apr 22 '22 at 18:30

1 Answers1

2

We could use left_join from dplyr package. Joining by pc:

library(dplyr)
left_join(df, df2, by = "pc")
     pc lat long
1 00001   1    4
2 00002   2    5
3 00003   3    6
4 00001   1    4
5 00002   2    5
6 00003   3    6
7 00001   1    4
8 00002   2    5
9 00003   3    6
TarJae
  • 72,363
  • 6
  • 19
  • 66