0

I have a data frame of questionnaire data in a wide format. For some questions, respondents are asked to answer whether a given scenario applies to them (Yes, No). If the scenario applies to them, respondents are asked to provide examples of the scenarios (in some cases, there might be more than one example).

My dataset looks something like the following, with Q1 referring to a scenario, while Q1a, Q1b, and Q1c are open-ended fields for them to provide examples of said scenario.

df <- data.frame(Q1 = c("Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"), 
             Q1a = c("AAA", NA, "AAA", NA, "ABC", "DDD", "EEE"),
             Q1b = c("BBB", NA, NA, NA, "BCD", NA, "AAA"),
             Q1c = c(NA, NA, NA, NA, "EFG", NA, "AAA"))

I want to create a new column such that, if a scenario applies to the respondent, it will count, for a given row, and across the 3 columns (Q1a, Q1b, Q1c), the number of examples the respondents provided.

What I have come up so far is to hardcode every permutation of non-NA values:

df$count_non_NA <- ifelse(df$Q1 == "No", 0,
                      ifelse(df$Q1 == "Yes" & !is.na(df$Q1a) & !is.na(df$Q1b) & !is.na(df$Q1c), 3,
                             ifelse(df$Q1 == "Yes" & ((!is.na(df$Q1a) & !is.na(df$Q1b)) | 
                                      (!is.na(df$Q1b) & !is.na(df$Q1c))|
                                      (!is.na(df$Q1a) & !is.na(df$Q1c))), 2, 1)))

It works, but I am going to repeat the same thing for every single scenario. Furthermore, I am curious what will happen if respondents are provided with more than 3 examples to fill in. It would be a pain to hardcode the permutations. Hence, I am seeking a more efficient solution to my problem.

Count non-NA values across columns

The screenshot above shows the result of my code. To illustrate, row 1 shows that the scenario applies to them, and two examples are provided ("AAA" in Q1a column; "BBB" in Q1b column). As Q1a and Q1b are non-NA columns, the count_non_NA column reflects a 2.

DTYK
  • 1,098
  • 1
  • 8
  • 33

2 Answers2

3
df$column_non_NA= rowSums(!is.na(df[-1]))
df
   Q1  Q1a  Q1b  Q1c column_non_NA
1 Yes  AAA  BBB <NA>             2
2  No <NA> <NA> <NA>             0
3 Yes  AAA <NA> <NA>             1
4  No <NA> <NA> <NA>             0
5 Yes  ABC  BCD  EFG             3
6 Yes  DDD <NA> <NA>             1
7 Yes  EEE  AAA  AAA             3
Onyambu
  • 67,392
  • 3
  • 24
  • 53
2

Here's a way with pmap to achieve the effect of applying a function across rows:

library(tidyverse)
df <- data.frame(
  Q1 = c("Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  Q1a = c("AAA", NA, "AAA", NA, "ABC", "DDD", "EEE"),
  Q1b = c("BBB", NA, NA, NA, "BCD", NA, "AAA"),
  Q1c = c(NA, NA, NA, NA, "EFG", NA, "AAA")
)

df %>%
  mutate(
    count_non_na = pmap(
      .l = list(Q1a, Q1b, Q1c),
      .f = function(...) sum(!rlang::are_na(c(...)))
    )
  )
#>    Q1  Q1a  Q1b  Q1c count_non_na
#> 1 Yes  AAA  BBB <NA>            2
#> 2  No <NA> <NA> <NA>            0
#> 3 Yes  AAA <NA> <NA>            1
#> 4  No <NA> <NA> <NA>            0
#> 5 Yes  ABC  BCD  EFG            3
#> 6 Yes  DDD <NA> <NA>            1
#> 7 Yes  EEE  AAA  AAA            3

Created on 2018-07-17 by the reprex package (v0.2.0).

Calum You
  • 14,687
  • 4
  • 23
  • 42