I have a data frame where each condition (in the example: hope, dream, joy) has 5 variables (in the example, coded with the suffixes x, y, z, a, b; they are the same for each condition).

df <- data.frame(matrix(1:16,5,16))
names(df) <- c('ID','hopex','hopey','hopez','hopea','hopeb','dreamx','dreamy','dreamz','dreama','dreamb','joyx','joyy','joyz','joya','joyb')
df[1,2:6] <- NA
df[3:5,c(7,10,14)] <- NA

This is what the data looks like:

  ID hopex hopey hopez hopea hopeb dreamx dreamy dreamz dreama dreamb joyx joyy joyz joya joyb
1  1    NA    NA    NA    NA    NA     15      4      9     14      3    8   13    2    7   12
2  2     7    12     1     6    11     16      5     10     15      4    9   14    3    8   13
3  3     8    13     2     7    12     NA      6     11     NA      5   10   15   NA    9   14
4  4     9    14     3     8    13     NA      7     12     NA      6   11   16   NA   10   15
5  5    10    15     4     9    14     NA      8     13     NA      7   12    1   NA   11   16

I want to create a new variable for each condition (hope, dream, joy) that codes whether all of the variables x...b for that condition are NA (0 if all are NA, 1 if any is non-NA). And I want the new variables to be stored in the data frame. Thus, the output should be this:

  ID hopex hopey hopez hopea hopeb dreamx dreamy dreamz dreama dreamb joyx joyy joyz joya joyb hope joy dream
1  1    NA    NA    NA    NA    NA     15      4      9     14      3    8   13    2    7   12    0   1     1
2  2     7    12     1     6    11     16      5     10     15      4    9   14    3    8   13    1   1     1
3  3     8    13     2     7    12     NA      6     11     NA      5   10   15   NA    9   14    1   1     1
4  4     9    14     3     8    13     NA      7     12     NA      6   11   16   NA   10   15    1   1     1
5  5    10    15     4     9    14     NA      8     13     NA      7   12    1   NA   11   16    1   1     1

The code below does it, but I'm looking for a more elegant solution (e.g., for a case where I have even more conditions). I've tried with various combinations of all(), select(), mutate(), but while they all seem useful, I cannot figure out how to combine them to get what I want. I'm stuck and would be interested in learning to code more efficiently. Thanks in advance!

df$hope <- 0
df[is.na(df$hopex) == FALSE | is.na(df$hopey) == FALSE | is.na(df$hopez) == FALSE | is.na(df$hopea) == FALSE | is.na(df$hopeb) == FALSE, "hope"] <- 1

df$dream <- 0
df[is.na(df$dreamx) == FALSE | is.na(df$dreamy) == FALSE | is.na(df$dreamz) == FALSE | is.na(df$dreama) == FALSE | is.na(df$dreamb) == FALSE, "dream"] <- 1

df$joy <- 0
df[is.na(df$joyx) == FALSE | is.na(df$joyy) == FALSE | is.na(df$joyz) == FALSE | is.na(df$joya) == FALSE | is.na(df$joyb) == FALSE, "joy"] <- 1
hps
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jan 09 '20 at 19:46
  • @MrFlick Thank you, I will make sure to do this in future! – hps Jan 09 '20 at 20:22

1 Answer

Here is an option with the tidyverse, shown for the hope columns:

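library(dplyr)
library(magrittr)  # provides is_greater_than()
df %>%
   mutate(hope = select(., starts_with('hope')) %>%
                 is.na %>%               # TRUE where a value is missing
                 `!` %>%                 # TRUE where a value is present
                 rowSums %>%             # number of present values per row
                 is_greater_than(0) %>%  # is at least one value present?
                 as.integer)             # TRUE/FALSE -> 1/0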
#   hopex hopey hopez hopea hopeb dreamx dreamy dreamz dreama dreamb joyx joyy joyz joya joyb hope
#1    NA    NA    NA    NA    NA     NA     NA     NA     NA     NA   NA   NA   NA   NA   NA    0
#2     1     1     4     3     2      3      5      4      5      2    5   NA    4    3    1    1
#3     2    NA     4     4     4      3      5     NA      5      5    4   NA    4    5    1    1
#4     4     3    NA     1     1      1      5      2     NA      5    1    2    1    1    1    1
#5     1    NA     4    NA    NA      2      1      5      1      2   NA    3    1    2    5    1

Or more compactly, calling rowSums directly:

df %>%
     mutate(hope = +(rowSums(!is.na(select(., starts_with('hope'))))!= 0))
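
To see what that one-liner does, here is a minimal sketch that unpacks it step by step in base R. The two-row data frame `x` and the intermediate variable names are made up for illustration and are not part of the question's data.

```r
# Hypothetical two-row example: row 1 is all NA, row 2 has one value
x <- data.frame(hopex = c(NA, 7), hopey = c(NA, NA))

not_na      <- !is.na(x)         # logical matrix: TRUE where a value is present
n_present   <- rowSums(not_na)   # number of non-NA values in each row
any_present <- n_present != 0    # TRUE if the row has at least one value
flag        <- +any_present      # unary plus converts TRUE/FALSE to 1/0

flag
```

Row 1 has no values, so its flag is 0; row 2 has one value, so its flag is 1.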

For multiple conditions, we can wrap this in a function:

f1 <- function(dat, colSubstr) {
   # 1 if any column whose name starts with colSubstr is non-NA in that row, else 0
   dplyr::select(dat, starts_with(colSubstr)) %>%
      is.na %>%
      `!` %>%
      rowSums %>%
      is_greater_than(0) %>%
      as.integer
}

df %>%
      mutate(hope = f1(., 'hope'),
             dream = f1(., 'dream'),
             joy = f1(., 'joy'))
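
With many conditions, listing each one by hand gets tedious, and the piped result is only kept if it is assigned back. One possible sketch, using base R only, loops over a vector of condition names and writes each flag into the data frame. It rebuilds the question's example data (including ID); the `conditions` vector and the loop are my own illustration, and it assumes no condition name is a prefix of another column's name.

```r
# Rebuild the example data from the question (with the ID column)
df <- data.frame(matrix(1:16, 5, 16))
names(df) <- c('ID', paste0(rep(c('hope', 'dream', 'joy'), each = 5),
                            c('x', 'y', 'z', 'a', 'b')))
df[1, 2:6] <- NA
df[3:5, c(7, 10, 14)] <- NA

conditions <- c('hope', 'dream', 'joy')  # extend this vector for more conditions

for (cond in conditions) {
  cols <- startsWith(names(df), cond)               # columns of this condition
  df[[cond]] <- +(rowSums(!is.na(df[cols])) != 0)   # 1 if any value is present
}

df[c('ID', conditions)]
```

Because the flags are assigned with `df[[cond]] <-`, they are stored in `df` itself, and ID is ignored since it matches none of the prefixes.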

Or using base R, grouping the columns by their names with the last character removed:

cbind(df, sapply(split.default(df, sub(".$", "", names(df))), 
             function(x) +(rowSums(!is.na(x)) != 0)))
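
The grouping key is simply each column name minus its final character. A small sketch on a handful of the question's column names (the vector `nms` is just an illustrative subset) shows what `sub()` returns, and why a column such as ID would end up in a group of its own:

```r
nms <- c('ID', 'hopex', 'hopeb', 'dreamx', 'joya')  # a few column names from the question

# "." matches any single character and "$" anchors it to the end of the string,
# so the last character of every name is dropped
groups <- sub(".$", "", nms)
groups
# "I" "hope" "hope" "dream" "joy"
```

Any column that is not part of a condition therefore produces an extra group ("I" here) unless it is left out before splitting.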

If we want to leave out columns such as ID, subset the columns first:

nm1 <- setdiff(names(df), "ID")
cbind(df, sapply(split.default(df[nm1], sub(".$", "", names(df[nm1]))),
        function(x) +(rowSums(!is.na(x)) != 0)))

Example data

set.seed(24)
df <- as.data.frame(matrix(sample(c(NA, 1:5), 5 * 15, replace = TRUE),
    ncol = 15, dimnames = list(NULL, paste0(rep(c("hope", "dream", "joy"), 
   each = 5), c('x', 'y', 'z', 'a', 'b')))))
df[1,] <- NA
akrun
  • Yes, thanks for creating the example data! I'll try to clarify. In your example data, if I set df[1,1:5] <- NA, I still get all 1's in 'hope', even though hope should be 0 in row 1. Also, I would like the new variables to be stored in a dataframe, which doesn't happen now. – hps Jan 09 '20 at 21:02
  • Thanks, this does it for the example data. I added my own example in the original message which has another variable (ID). Would you have a solution that creates new variables only for the conditions (hope, dream, joy) but not for ID? – hps Jan 09 '20 at 21:40
  • I checked it; rowSums does not work (again, all 1's) but subsetting the columns works. Thanks! – hps Jan 09 '20 at 22:15