
I want to compare each row of df1 with the single row of df2 in a tidy way, counting how many columns match. Any hints, please?

df1 <-
  structure(
    list(
        Q1 = c("a", "a")
      , Q2 = c("b", "a")
      , Q3 = c("a", "a")
      , Q4 = c("b", "a")
      )
    , class = "data.frame"
    , row.names = c(NA, -2L)
    )

df2 <-
  structure(
    list(
        Q1 = "a"
      , Q2 = "a"
      , Q3 = "b"
      , Q4 = "c"
      )
    , class = "data.frame"
    , row.names = c(NA, -1L)
    )

library(tidyverse)


sum(df1[1, ] == df2)
[1] 1
sum(df1[2, ] == df2)
[1] 2
MYaseen208

5 Answers


In base R:

apply(df1,1, function(x) sum(x == df2))

[1] 1 2
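apply coerces df1 to a character matrix before looping, so each x above is a plain character vector. The sketch below is a variant of the same idea using vapply, which pins the result to exactly one integer per row. The names ref, m and matches are illustrative only, and the data frames are rebuilt with stringsAsFactors = FALSE so that the block runs on its own.

```r
# Example data from the question (plain character columns, no factors)
df1 <- data.frame(Q1 = c("a", "a"), Q2 = c("b", "a"),
                  Q3 = c("a", "a"), Q4 = c("b", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Q1 = "a", Q2 = "a", Q3 = "b", Q4 = "c",
                  stringsAsFactors = FALSE)

# Flatten the single reference row into a named character vector
ref <- unlist(df2)

# Same coercion apply() performs internally: data frame -> character matrix
m <- as.matrix(df1)

# One integer per row: how many cells of row i equal the reference row
matches <- vapply(seq_len(nrow(m)),
                  function(i) sum(m[i, ] == ref),
                  integer(1))
matches
```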
Daniel O

An option in base R is rowSums:

rowSums(df1 == unlist(df2)[col(df1)])
#[1] 1 2
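The trick in that one-liner is unlist(df2)[col(df1)]: col() returns the column index of every cell of df1, so indexing the flattened df2 with it repeats each reference value down its column. The sketch below spells out the intermediate steps; the names idx, expanded and counts are illustrative only.

```r
df1 <- data.frame(Q1 = c("a", "a"), Q2 = c("b", "a"),
                  Q3 = c("a", "a"), Q4 = c("b", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Q1 = "a", Q2 = "a", Q3 = "b", Q4 = "c",
                  stringsAsFactors = FALSE)

# Column index of every cell: a 2 x 4 integer matrix (1s, then 2s, 3s, 4s)
idx <- col(df1)

# Repeat each reference value once per row, in column-major order:
# "a" "a" "a" "a" "b" "b" "c" "c"
expanded <- unlist(df2)[idx]

# Cell-by-cell comparison, then count the TRUEs in each row
counts <- rowSums(df1 == expanded)
counts
```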

In the tidyverse, we can also use c_across:

library(dplyr)
df1 %>% 
    rowwise %>%
    mutate(new = sum(c_across(everything()) == df2)) 
# A tibble: 2 x 5
# Rowwise: 
#  Q1    Q2    Q3    Q4      new
#  <chr> <chr> <chr> <chr> <int>
#1 a     b     a     b         1
#2 a     a     a     a         2
akrun

Either split df1 into rows first and count the matches in each:

library(purrr)
asplit(df1,1) %>% map_dbl(~sum(.==df2))

Or just map the row numbers:

1:nrow(df1) %>% map_dbl(function(i)sum(df1[i,]==df2))
[1] 1 2
StupidWolf
  • Thanks for the nice answer. I would appreciate it if you could show how to add the output as a column in df1. – MYaseen208 Jun 18 '20 at 19:00
  • You mean to have the 1, 2 as a column in df1? `df1 %>% mutate(y = 1:nrow(.) %>% map_dbl(function(i) sum(df1[i, ] == df2)))` will work. How big is your data.frame? Calling the row number is safer... – StupidWolf Jun 18 '20 at 19:02
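For readers not using dplyr, a base R sketch of adding the counts as a column, reusing the rowSums/col() approach from the earlier answer. The column name new is arbitrary, and the right-hand side is evaluated before the column is added, so the new column does not take part in the comparison.

```r
df1 <- data.frame(Q1 = c("a", "a"), Q2 = c("b", "a"),
                  Q3 = c("a", "a"), Q4 = c("b", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Q1 = "a", Q2 = "a", Q3 = "b", Q4 = "c",
                  stringsAsFactors = FALSE)

# Count matching cells per row and store them in a new column
df1$new <- rowSums(df1 == unlist(df2)[col(df1)])
df1
```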

A base R solution.

Compare and sum by rows:

rowSums(mapply(`==`, df1, df2))
#[1] 1 2

Edit.

The code above is the updated version of this answer. The original version summed by columns; here is that code.

Map returns a list of logical vectors, one per column, and an *apply function then sums each of them.

Map(`==`, df1, df2)
#$Q1
#[1] TRUE TRUE
#
#$Q2
#[1] FALSE  TRUE
#
#$Q3
#[1] FALSE FALSE
#
#$Q4
#[1] FALSE FALSE

res <- Map(`==`, df1, df2)
sapply(res, sum)
#Q1 Q2 Q3 Q4 
# 2  1  0  0

A one-liner would be:

sapply(Map(`==`, df1, df2), sum)

Another one, which is faster:

colSums(mapply(`==`, df1, df2))
#Q1 Q2 Q3 Q4 
# 2  1  0  0
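Both versions of this answer start from the same object: mapply(`==`, df1, df2) compares the columns pairwise and simplifies the length-2 results into a 2 x 4 logical matrix, so rowSums gives the per-row counts the question asks for and colSums gives the per-column counts. The sketch below shows both from one matrix; the name hits is illustrative only.

```r
df1 <- data.frame(Q1 = c("a", "a"), Q2 = c("b", "a"),
                  Q3 = c("a", "a"), Q4 = c("b", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Q1 = "a", Q2 = "a", Q3 = "b", Q4 = "c",
                  stringsAsFactors = FALSE)

# df1$Q1 == df2$Q1, df1$Q2 == df2$Q2, ... each gives 2 logicals,
# which mapply() simplifies into a matrix: rows of df1 x columns Q1..Q4
hits <- mapply(`==`, df1, df2)

dim(hits)      # 2 4
rowSums(hits)  # matches per row of df1: 1 2
colSums(hits)  # matches per column:     Q1 2, Q2 1, Q3 0, Q4 0
```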
Rui Barradas

Using the purrr package, this checks whether each row of df1 is identical to df2:

library(purrr)

unlist_df2 <- unlist(df2)
seq_len(nrow(df1)) %>%
  map_lgl(~identical(unlist(df1[.x, ]), unlist_df2))

For the edited question (counting matches rather than testing identity), change map_lgl to map_dbl and replace identical with sum and ==:

unlist_df2 <- unlist(df2)
seq_len(nrow(df1)) %>%
  map_dbl(~sum(unlist(df1[.x,]) == unlist_df2))
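The two purrr variants are related: a row is identical to df2 exactly when its match count equals the number of columns. A base R sketch that derives both results from one count vector (the names counts and exact are illustrative only):

```r
df1 <- data.frame(Q1 = c("a", "a"), Q2 = c("b", "a"),
                  Q3 = c("a", "a"), Q4 = c("b", "a"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Q1 = "a", Q2 = "a", Q3 = "b", Q4 = "c",
                  stringsAsFactors = FALSE)

# Number of matching columns per row of df1
counts <- rowSums(mapply(`==`, df1, df2))

# Exact-match flag: every column of the row agrees with df2
exact <- counts == ncol(df2)

counts  # 1 2
exact   # FALSE FALSE
```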
det