How would I implement this countif in r?

Question

My two dataframes:

I would like to add a 2nd column, Col2, to df1 where each value in the column is 1 if it's respective value in Col1 is also in Col1 of df2. Thus df1 would look like this:

 df1
    Col1 Col2
     A    1
     B    0
     C    0

Thanks!

score 2 · Accepted Answer · answered Jul 31 '17 at 19:01

2

Add the col2 to df2

df2$Col2 <- 1

Perform a left-join merge:

df3 <- merge(df1, df2, all.x=T, by='Col1')

Replace the NAs with zeros

df3$Col2[which(is.na(df3$Col2))] <- 0

df3 is now

  Col1 Col2
1    A    1
2    B    0
3    C    0

Edit: @ycw has done it more concisely using as.numeric and %in%. I like his answer, but I thought I'd edit mine to include a version of his work that doesn't use dplyr:

It's as simple as df1$Col2 <- as.numeric(df1$Col1 %in% df2$Col1). Much better than mine!

answered Jul 31 '17 at 19:01

HarlandMason

779
5
17

Nice solution. And thanks for adding the base R solution of my approach. – www Jul 31 '17 at 19:06
2

Nice. Another verson: `df1$found = !is.na(merge(df1, transform(df2, z = 1), all.x = TRUE)$z)`. Note that `%in%` is restrictive compared to this answer (which allows for merging on more than just one column). – Frank Jul 31 '17 at 19:54

www · Answer 2 · 2017-07-31T19:08:35.217

1

df3 is the final output.

library(dplyr)

df1 <- data_frame(Col1 = c("A", "B", "C"))
df2 <- data_frame(Col1 = c("A", "D", "E", "F"))

df3 <- df1 %>% mutate(Col2 = as.numeric(Col1 %in% df2$Col1))

Or the following approach is similar to HarlandMason's method but using dplyr and tidyr.

library(dplyr)
library(tidyr)
df3 <- df2 %>%
  mutate(Col2 = 1) %>%
  right_join(df1, by = "Col1") %>%
  replace_na(list(Col2 = 0))

edited Jul 31 '17 at 19:08

answered Jul 31 '17 at 19:00

www

38,575
12
48
84

2

Alternative without dplyr: `df1$Col2 = as.numeric(df1$Col1 %in% df2$Col1)` – Florian Jul 31 '17 at 19:05
For some reason I'm getting: Error in library(dplyr) : there is no package called ‘dplyr’ Any idea what is wrong? – RaiderNAYSHUN Jul 31 '17 at 20:08
1

Have you installed dplyr as with `install.packages('dplyr')` – HarlandMason Jul 31 '17 at 20:11
Nevermind, I used @Florian's answer and used cbind() to stick that back onto df1. Thanks @Florian! – RaiderNAYSHUN Jul 31 '17 at 20:18
@RaiderNAYSHUN Perhaps you can mark HarlandMason's answer to let others know this question has been answered. – www Aug 01 '17 at 05:59

cderv · Answer 3 · 2017-08-01T06:06:34.620

Two options using data.table

First one uses %chin% operator :

library(data.table)
x = data.table(v = LETTERS[1:3])
y = data.table(v = c("A","D","E","F"))

x[, found:= v %chin% y$v]
x
#>    v found
#> 1: A  TRUE
#> 2: B FALSE
#> 3: C FALSE

The second one is built on merging behaviour:

library(data.table)
x = data.table(v = LETTERS[1:3])
y = data.table(v = c("A","D","E","F"))

y[, found := TRUE]
x[, found:= y[.SD, .(ifelse(is.na(found), FALSE, TRUE)), on  = .(v)]]
x
#>    v found
#> 1: A  TRUE
#> 2: B FALSE
#> 3: C FALSE

EDIT: Based on @frank comment, you could simplify with no ifelse - it is the same

x[, found:= y[.SD, !is.na(found), on  = .(v)]]
x
#>    v found
#> 1: A  TRUE
#> 2: B FALSE
#> 3: C FALSE

For understanding what happens, here is the inside code I built on:

x[, found := NULL]
y[x, on = .(v)]
#>    v found
#> 1: A  TRUE
#> 2: B    NA
#> 3: C    NA

That's a nice idea for doing it. Fyi, `ifelse` is usually discouraged because of this https://stackoverflow.com/q/16275149/ In this case, I guess you could just use `!is.na(found)` instead. — Frank, Jul 31 '17 at 20:57
Yes you're right. I don't know I missed it. I update the answer. — cderv, Aug 01 '17 at 06:03

How would I implement this countif in r?

3 Answers3