1

My two dataframes:

 df1
    Col1 
     A
     B
     C

 df2
    Col1 
     A  
     D  
     E  
     F  

I would like to add a 2nd column, Col2, to df1 where each value in the column is 1 if it's respective value in Col1 is also in Col1 of df2. Thus df1 would look like this:

 df1
    Col1 Col2
     A    1
     B    0
     C    0

Thanks!

3 Answers3

2

Add the col2 to df2

df2$Col2 <- 1

Perform a left-join merge:

df3 <- merge(df1, df2, all.x=T, by='Col1')

Replace the NAs with zeros

df3$Col2[which(is.na(df3$Col2))] <- 0

df3 is now

  Col1 Col2
1    A    1
2    B    0
3    C    0

Edit: @ycw has done it more concisely using as.numeric and %in%. I like his answer, but I thought I'd edit mine to include a version of his work that doesn't use dplyr:

It's as simple as df1$Col2 <- as.numeric(df1$Col1 %in% df2$Col1). Much better than mine!

HarlandMason
  • 779
  • 5
  • 17
  • Nice solution. And thanks for adding the base R solution of my approach. – www Jul 31 '17 at 19:06
  • 2
    Nice. Another verson: `df1$found = !is.na(merge(df1, transform(df2, z = 1), all.x = TRUE)$z)`. Note that `%in%` is restrictive compared to this answer (which allows for merging on more than just one column). – Frank Jul 31 '17 at 19:54
1

df3 is the final output.

library(dplyr)

df1 <- data_frame(Col1 = c("A", "B", "C"))
df2 <- data_frame(Col1 = c("A", "D", "E", "F"))

df3 <- df1 %>% mutate(Col2 = as.numeric(Col1 %in% df2$Col1)) 

Or the following approach is similar to HarlandMason's method but using dplyr and tidyr.

library(dplyr)
library(tidyr)
df3 <- df2 %>%
  mutate(Col2 = 1) %>%
  right_join(df1, by = "Col1") %>%
  replace_na(list(Col2 = 0))
www
  • 38,575
  • 12
  • 48
  • 84
1

Two options using data.table

First one uses %chin% operator :

library(data.table)
x = data.table(v = LETTERS[1:3])
y = data.table(v = c("A","D","E","F"))

x[, found:= v %chin% y$v]
x
#>    v found
#> 1: A  TRUE
#> 2: B FALSE
#> 3: C FALSE

The second one is built on merging behaviour:

library(data.table)
x = data.table(v = LETTERS[1:3])
y = data.table(v = c("A","D","E","F"))

y[, found := TRUE]
x[, found:= y[.SD, .(ifelse(is.na(found), FALSE, TRUE)), on  = .(v)]]
x
#>    v found
#> 1: A  TRUE
#> 2: B FALSE
#> 3: C FALSE

EDIT: Based on @frank comment, you could simplify with no ifelse - it is the same

x[, found:= y[.SD, !is.na(found), on  = .(v)]]
x
#>    v found
#> 1: A  TRUE
#> 2: B FALSE
#> 3: C FALSE

For understanding what happens, here is the inside code I built on:

x[, found := NULL]
y[x, on = .(v)]
#>    v found
#> 1: A  TRUE
#> 2: B    NA
#> 3: C    NA
cderv
  • 6,272
  • 1
  • 21
  • 31
  • That's a nice idea for doing it. Fyi, `ifelse` is usually discouraged because of this https://stackoverflow.com/q/16275149/ In this case, I guess you could just use `!is.na(found)` instead. – Frank Jul 31 '17 at 20:57
  • 1
    Yes you're right. I don't know I missed it. I update the answer. – cderv Aug 01 '17 at 06:03