Count pairs of non-NA observations by row in selected columns

Question

I have a dataframe:

  id    cog com emo
AUD-002 12  34  24
PAR-044 NA  28  38
BRE-019 0   NA  51
2-1-GRE NA  31  68

I am interested in counting non-NA values per row between all pairs of columns cog, com, emo

My required output is:

  id     cog com  emo cog-com cog-emo com-emo
AUD-002  12  34   24   1        1       1
PAR-044  NA  28   38   0        0       1
BRE-019  0   NA   51   0        1       0
2-1-GRE  NA  31   68   0        0       1

I found that the following question might be related: Count non-NA observations by row in selected columns but they count overall non-NA entries per row and not by pairs of columns of that row. Also, I can achieve this by using multiple statements like this:

library(dplyr)
df = df %>%
  mutate(count_cog_com = rowSums(!is.na(select(., 2:3))) - 1)

df = df %>%
  mutate(count_cog_emo = rowSums(!is.na(select(., 2,4))) - 1)

df = df %>%
  mutate(count_com_emo = rowSums(!is.na(select(., 3:4))) - 1)

But I don't want to use these on my actual data because I have several columns. Is there an easy dplyr way to achieve this functionality? Can these statements be joined somehow? Thank you fo your help!

The dput is as below:

dput(df)

structure(list(id = structure(c(2L, 4L, 3L, 1L), 
               .Label = c("2-1-GRE", "AUD-002", "BRE-019", "PAR-044"), 
               class = "factor"), 
               cog = c(12L, NA, 0L, NA), 
               com = c(34L, 28L, NA, 31L), 
               emo = c(24L, 38L, 51L, 68L)), 
           row.names = c(NA, -4L), class = "data.frame")

Rui Barradas · Accepted Answer · 2022-10-15T08:50:54.003

Here is a base R way.
Function combn returns the combinations of its 1st argument and optionally applies a function to them. In this case it computes the row sums minus 1. The column names are then assigned in a similar way.

df <-
  structure(list(
    id = structure(c(2L, 4L, 3L, 1L), 
                   .Label = c("2-1-GRE", "AUD-002", "BRE-019", "PAR-044"), 
                   class = "factor"), 
    cog = c(12L, NA, 0L, NA), 
    com = c(34L, 28L, NA, 31L), 
    emo = c(24L, 38L, 51L, 68L)), 
    row.names = c(NA, -4L), class = "data.frame")

tmp <- combn(df[-1], 2, \(x) rowSums(!is.na(x)) - 1L)
colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
df <- cbind(df, tmp)
rm(tmp)

df
#>        id cog com emo cog_com cog_emo com_emo
#> 1 AUD-002  12  34  24       1       1       1
#> 2 PAR-044  NA  28  38       0       0       1
#> 3 BRE-019   0  NA  51       0       1       0
#> 4 2-1-GRE  NA  31  68       0       0       1

^{Created on 2022-10-15 with reprex v2.0.2}

Edit

Answering to the request in comment, yes, it is possible. Have the anonymous function called by combn compute the logical && and coerce the result to integer. This will return a 0 if any of the values is NA and 1 otherwise.

The line that needs to be changed is this:

tmp <- combn(df[-1], 2, \(x) +apply(!is.na(x), 1, \(y) y[1] && y[2]))

A complete code run:

tmp <- combn(df[-1], 2, \(x) +apply(!is.na(x), 1, \(y) y[1] && y[2]))
colnames(tmp) <- combn(names(df)[-1], 2, paste, collapse = "_")
df <- cbind(df, tmp)
rm(tmp)

df
#>        id cog com emo cog_com cog_emo com_emo
#> 1 AUD-002  12  34  24       1       1       1
#> 2 PAR-044  NA  NA  38       0       0       0
#> 3 BRE-019   0  NA  51       0       1       0
#> 4 2-1-GRE  NA  31  68       0       0       1

^{Created on 2022-10-15 with reprex v2.0.2}

More readable but equivalent:

tmp <- combn(df[-1], 2, \(x) {
  not_na <- apply(!is.na(x), 1, \(y) y[1] && y[2])
  as.integer(not_na)
})

Thanks, I have a quick question. Can this code be updated so that when one or both entries are NA, the pair count is set to zero? Currently, while a single NA makes the pair count zero, the presence of two NAs makes the count -1. — Sandy, Oct 15 '22 at 06:34
@Sandy Yes, updated. To subtract `1` feels like a hack, I find the logical conjunction better. — Rui Barradas, Oct 15 '22 at 07:59
I guess you can change this ```combn(df[-1], 2, \(x) +apply(!is.na(x), 1, \(y) y[1] && y[2]))``` to ```combn(df[-1], 2, \(x) +!is.na(x[, 2] * x[, 1]))```. — M--, Nov 10 '22 at 17:05

M-- · Answer 2 · 2022-11-07T20:45:49.467

Here's a tidy solution;

library(dplyr)
library(purrr)


subset(expand.grid(x = 2:ncol(df), y = 2:ncol(df)), x < y) -> col.combn
  
map2_dfc(names(df)[col.combn[,1]], names(df)[col.combn[,2]], 
         ~transmute(df, !!paste0(.x, "_", .y) := +!is.na(!!sym(.x) * !!sym(.y)))) %>% 
bind_cols(df, .)

#>        id cog com emo cog_com cog_emo com_emo
#> 1 AUD-002  12  34  24       1       1       1
#> 2 PAR-044  NA  28  38       0       0       1
#> 3 BRE-019   0  NA  51       0       1       0
#> 4 2-1-GRE  NA  31  68       0       0       1

Thanks to @akrun, here's an updated version using combn instead of expand.grid:

combn(names(df)[-1], 2, FUN = function(nm) 
       df %>% 
         transmute(!! str_c(nm, collapse = "_") := 
                    +(if_all(all_of(nm), complete.cases))), simplify = FALSE) %>% 
         bind_cols(df, .)

or using combn with map2_dfc:

combn(names(df)[-1], 2) %>% 
map2_dfc(.x = .[1,], .y = .[2,], 
         .f = ~transmute(df, !!paste0(.x, "_", .y) := +!is.na(!!sym(.x) * !!sym(.y)))) %>% 
bind_cols(df, .)

Do you want `combn(names(df)[-1], 2, FUN = \(nm) df %>% transmute(!! str_c(nm, collapse = "_") := +(if_all(all_of(nm), complete.cases))), simplify = FALSE) %>% bind_cols(df, .)` — akrun, Nov 04 '22 at 17:11

Count pairs of non-NA observations by row in selected columns

2 Answers2

Edit

Linked