I asked participants questions concerning their health status and they could either choose "yes" or "no". Now, I want to create a subset of participants that reported having no symptoms at all, i.e., only said "no" to EVERY symptom.

So, all in all, I am searching for "no"-entries to create a subset (in R) and examine the number of people that have no symptoms at all.

The thing is that I only assessed headache in Week 1, sickness and fatigue in Week 2, and Coughing and Diarrhea in Week 3. Therefore, I get NA for the missing values.

So far, so good. When I search for "yes" to create a subset of participants who reported at least ONE symptom, my results are fine. But when I try the reverse, it does not work, since I only want the "no" answers. As soon as a participant has a symptom, I want them to be excluded.

This is what my code looks like:

data$no_symptoms <- case_when(
        data$headache == "no" ~ "NS",
        data$sickness == "no" ~ "NS",
        data$coughing == "no" ~ "NS",
        data$fatigue == "no" ~ "NS",
        data$diarrhea == "no" ~ "NS",
        TRUE ~ as.character(data$headache, data$sickness, data$coughing, data$fatigue,
                  data$diarrhea)
)
no_symptoms <- subset(data,data$no_symptoms=="NS")

I expected a subset that would look like this when I open it: [image not included]

Instead, I get this: [image not included]

I am super grateful for every hint or piece of advice! Let me know if I can add more information. My main problem is just that I want to exclude everybody who said "yes" to any symptom. I don't care about NA; I just need the people who said "no" to every assessed symptom.

Thank you so much! :)

Gertie

nobody
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064) – MrFlick Aug 12 '22 at 14:38
  • Note that `case_when` stops at the first condition that evaluates to TRUE, not FALSE. So if it finds a "no", it returns NS and stops looking at the other columns. If it finds a "yes", it keeps looking for "no" in the rest of the columns. `case_when` does not look like a good choice in this case. – MrFlick Aug 12 '22 at 14:41
  • Check out `ifelse` – gaut Aug 12 '22 at 15:17

2 Answers

As mentioned before, it is hard to answer your question without a reproducible example, so I generated one based on the results you obtained.

The piece of code below does the trick: it removes patients with symptoms. I assumed you could have more symptom columns to filter on, so I simply excluded the Age and Gender columns from the check.

library(dplyr)
library(tibble)

# Generate a reproducible example.
# Rows 3, 4 and 5 should be filtered out because each contains a 'Yes'.
data <- tibble(Age = c(22, 54, 24, 33, 27, 66),
           Gender = c('M', 'F', 'M', 'D', 'D', 'F'),
           Headache = c('No', 'No', NA, NA, NA, 'No'),
           Sickness = c(NA, NA, 'No', 'Yes', NA, NA),
           Fatigue = c(NA, NA, 'Yes', 'No', NA, NA),
           Coughing = c(NA, NA, NA, NA, 'Yes', NA),
           Diarrhea = c(NA, NA, NA, NA, 'No', NA))

# Keep rows where every symptom column is either NA (not assessed) or 'No'
data %>% filter(if_all(.cols = -c(Age, Gender), .fns = ~ is.na(.x) | .x == 'No'))
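
The `is.na(.x) |` part matters because comparing NA with a string yields NA rather than FALSE, and `filter()` drops rows whose condition is NA. Here is a minimal sketch of that behaviour; the vector `answers` is made up for illustration and is not taken from the question's data:

```r
# Hypothetical answers for one symptom; NA means "not assessed that week"
answers <- c("No", NA, "Yes")

# A plain comparison propagates the missing value
plain <- answers == "No"

# Treating "not assessed" as acceptable turns that NA into TRUE
lenient <- is.na(answers) | answers == "No"

print(plain)
print(lenient)
```

With the plain comparison, a participant who was never asked about a symptom would be dropped; with the lenient version, only an explicit "Yes" excludes them.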

If it doesn't work, consider posting a complete example that we can work on.

Hope it helps,

Bests

Lagouz

As mentioned, case_when uses the first condition that is TRUE and ignores the rest. So, if there is "no" headache, the row is labelled right away and the conditions for sickness, coughing, etc. no longer matter.
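
To illustrate the first-match behaviour, here is a small sketch with a single made-up participant (the variables `headache` and `sickness` are hypothetical stand-ins for the question's columns):

```r
library(dplyr)

# Hypothetical participant: no headache, but reported sickness
headache <- "no"
sickness <- "yes"

label <- case_when(
  headache == "no"  ~ "NS",       # first TRUE condition wins
  sickness == "yes" ~ "symptom",  # ignored, an earlier condition already matched
  TRUE              ~ "other"
)

print(label)
```

The participant ends up labelled "NS" despite having a symptom, which is why the original approach keeps people it should exclude.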

Another approach is to use rowSums if you wish to create a new column indicating the absence of symptoms. For example, you can select the relevant columns (in this case all columns except "Age" and "Gender"), compare them with "Yes", and sum across each row while ignoring NA values. If the rowSums result is zero, then no_symptoms will be TRUE.

library(tidyverse)

# TRUE when a row contains no "Yes" in any symptom column (NA is ignored)
data %>%
  mutate(no_symptoms = rowSums(select(., -c("Age", "Gender")) == "Yes", na.rm = TRUE) == 0)
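
To make the individual steps visible, the same logic can be sketched in base R on a tiny made-up data frame (`symptoms` and the intermediate names are hypothetical, chosen only for this illustration):

```r
# Hypothetical two-participant example (not the question's data)
symptoms <- data.frame(
  Headache = c("No", NA),
  Sickness = c(NA, "Yes"),
  Fatigue  = c(NA, "No")
)

# Step 1: element-wise comparison gives a logical matrix (NA stays NA)
is_yes <- symptoms == "Yes"

# Step 2: count the TRUE values per row, skipping NA
yes_count <- rowSums(is_yes, na.rm = TRUE)

# Step 3: zero "Yes" answers means no reported symptoms
no_symptoms <- yes_count == 0

print(no_symptoms)
```

The first participant has no "Yes" and is flagged as symptom-free; the second has one "Yes" and is not.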

If you don't wish to create a new column, you can alternatively filter and keep rows where this is TRUE:

data %>%
  filter(rowSums(select(., -c("Age", "Gender")) == "Yes", na.rm = TRUE) == 0)
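
Since the goal is to examine how many people have no symptoms at all, the filtered result can be counted with nrow. A minimal sketch, assuming a small made-up data frame with the same column layout (the names `symptom_free` and `n_symptom_free` are only illustrative):

```r
library(dplyr)

# Small made-up sample; NA means the symptom was not assessed
data <- data.frame(
  Age      = c(22, 54, 24, 33),
  Gender   = c("M", "F", "M", "D"),
  Headache = c("No", "No", NA, NA),
  Sickness = c(NA, NA, "No", "Yes"),
  Fatigue  = c(NA, NA, "Yes", "No")
)

# Keep only participants without a single "Yes"
symptom_free <- data %>%
  filter(rowSums(select(., -c("Age", "Gender")) == "Yes", na.rm = TRUE) == 0)

# Number of participants without any reported symptom
n_symptom_free <- nrow(symptom_free)
print(n_symptom_free)
```

In this sample only the first two participants remain, because the third and fourth each reported one symptom.
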
Ben