I am trying to left_join two datasets and minimize duplicates from the join. The purpose of joining the data is to match the coordinates of a postcode in df_2 to both the buyer and the seller in df_1.
At the moment I am performing the joins one after the other, first for the buyer and then for the seller, but this just leads to duplicate rows. I have looked into a conditional join in which both postcode and suburb must match for each row of df_1, but I am not sure whether this is the right approach.
An example dataset is below:
library(dplyr)

# Transactions: one row per purchase, with a suburb and postcode for buyer and seller
df_1 <- data.frame(
  buyer           = c('Sarah', 'John', 'Sam', 'Sally'),
  Seller          = c('B corp', 'M Ltd', 'S and co', 'M Ltd'),
  purchase        = c('hat', 'shirt', 'ball', 'shoe'),
  buyer_suburb    = c('Sandy', 'Rocky', 'Leafy', 'Sandy'),
  buyer_postcode  = c('001', '002', '003', '008'),
  seller_suburb   = c('Sandy', 'Leafy', 'Ocean', 'Leafy'),
  seller_postcode = c('001', '003', '004', '009')
)

# Lookup table: the same suburb name can appear under more than one postcode
df_2 <- data.frame(
  suburbs     = c('Sandy', 'Rocky', 'Leafy', 'Ocean', 'Sandy', 'Leafy'),
  postcode    = c('001', '002', '003', '004', '008', '009'),
  coordinates = c('0.01, 2.00', '0.02, 3.00', '0.03, 4.00', '0.02, 5.00', '0.02, 8.00', '0.02, 9.00')
)

# Current attempt: join on suburb only, first for the buyer and then for the seller
join_df_1 <- left_join(df_1, df_2, by = c("buyer_suburb" = "suburbs"))
join_df <- left_join(join_df_1, df_2, by = c("seller_suburb" = "suburbs"))
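
For reference, the conditional-join idea I mentioned could be sketched roughly as follows: match on suburb and postcode together rather than on suburb alone. This is only a minimal sketch, assuming each suburb/postcode pair appears once in df_2 (which holds for the example data). The names join_both_keys, buyer_coordinates and seller_coordinates are made up for illustration and are not part of my real data.

```r
library(dplyr)

# Same example data as in the question, repeated so the sketch runs on its own
df_1 <- data.frame(
  buyer           = c("Sarah", "John", "Sam", "Sally"),
  Seller          = c("B corp", "M Ltd", "S and co", "M Ltd"),
  purchase        = c("hat", "shirt", "ball", "shoe"),
  buyer_suburb    = c("Sandy", "Rocky", "Leafy", "Sandy"),
  buyer_postcode  = c("001", "002", "003", "008"),
  seller_suburb   = c("Sandy", "Leafy", "Ocean", "Leafy"),
  seller_postcode = c("001", "003", "004", "009")
)
df_2 <- data.frame(
  suburbs     = c("Sandy", "Rocky", "Leafy", "Ocean", "Sandy", "Leafy"),
  postcode    = c("001", "002", "003", "004", "008", "009"),
  coordinates = c("0.01, 2.00", "0.02, 3.00", "0.03, 4.00",
                  "0.02, 5.00", "0.02, 8.00", "0.02, 9.00")
)

# Join twice, each time on BOTH keys, renaming coordinates first so the
# buyer and seller values end up in separate, clearly named columns
# (hypothetical column names, chosen for this sketch only)
join_both_keys <- df_1 %>%
  left_join(
    rename(df_2, buyer_coordinates = coordinates),
    by = c("buyer_suburb" = "suburbs", "buyer_postcode" = "postcode")
  ) %>%
  left_join(
    rename(df_2, seller_coordinates = coordinates),
    by = c("seller_suburb" = "suburbs", "seller_postcode" = "postcode")
  )
```

With the example data this keeps one row per row of df_1, because every suburb/postcode pair in df_2 is unique. If the real lookup table contains repeated pairs, duplicates would still appear and df_2 would need to be de-duplicated first.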