0

Below is the sample data. Trying to add a new column of data d202002 to the "old" data frame but only doing so when they match (naics 123400,123500,123700) with 123600 having values in the d202001 column but not d202002. Kind of going in circles trying to merge these two.

 library(dplyr)
 library(data.table)

 naics <- c(123400,123500,123600,123700,123400,123500,123600,123700)
 area <- c(000001,000001,000001,000001,000003,000003,000003,000003)
 areatype <- c(04,04,04,04,04,04,04,04)
 state <- c(32,32,32,32,32,32,32,32)
 d202001 <- c(16,18,20,22,24,26,28,30)

 old <- data.frame(naics,area,areatype,state,d202001)

 naicsnew <- c(123400,123500,123700,123800,123900,123400,123500,123700,123800,123900)
 areanew <- c(000001,000001,000001,000001,000001,000003,000003,000003,000003,000003)
 areatypenew <- c(04,04,04,04,04,04,04,04,04,04)
 statenew <- c(32,32,32,32,32,32,32,32,32,32)
 d202002 <- c(17,19,21,23,41,35,36,37,38,45)

 new <- data.frame(naicsnew,areanew,areatypenew,statenew,d202002)

First attempt

 new2 <- left_join(old,new, by = c("naics" = "naicsnew", "area" = "areanew", "areatype"="areatypenew","state"="statenew"))

Below is the intended result

 naics       area     areatype     state     d202001      d202002
 123400         1        4            32         16          17
 123500         1        4            32         18          19
 123700         1        4            32         22          21
 123400         3        4            32         24          35
 123500         3        4            32         26          36
 123700         3        4            32         30          38
Tim Wilcox
  • 1,275
  • 2
  • 19
  • 43
  • 1
    If you only want rows for matches in *both* data frames, that's an inner join, not a left join – camille Dec 31 '21 at 00:31
  • I am editing the question now. I have a much larger data set and I am using this exact process and it is acting like a cbind in the larger case. Trying to make this smaller manual set act like that. – Tim Wilcox Dec 31 '21 at 00:57
  • @ekoam, it would if I was using SQL. I May try that on monday. Been using R and not getting anywhere. The above example works but the larger data set that I am using is not cooperating. They are structured exactly the same and yet no success. – Tim Wilcox Dec 31 '21 at 02:09
  • That post *is* about R. It translates the SQL terminology of joins into R, which the dplyr join functions do. The [dplyr vignette](https://dplyr.tidyverse.org/articles/two-table.html) on two-table functions goes through this – camille Dec 31 '21 at 03:30

1 Answers1

0

Your intended result keeps only rows included both in old and new. For that you need inner_join(). See: https://dplyr.tidyverse.org/reference/mutate-joins.html

inner_join(): includes all rows in x and y.

new3 <- inner_join(old, new, by = c("naics" = "naicsnew", "area" = "areanew", "areatype" = "areatypenew", "state" = "statenew"))
new3
   naics area areatype state d202001 d202002
1 123400    1        4    32      16      17
2 123500    1        4    32      18      19
3 123700    1        4    32      22      21
4 123400    3        4    32      24      35
5 123500    3        4    32      26      36
6 123700    3        4    32      30      37
GaelS
  • 630
  • 6
  • 15