Count number of rows satisfying two conditions at the same time

Question

I am working with a survey where participants answer the first question with yes or no and then a second open-ended question "if yes, why?"

I need to find the percentage of people who answered the second question after saying "yes". Alternatively, I need the number of 'NA's among the people who answered "yes".

Here is a similar-looking dataset:

#>      helpful     helpfulhow               
#> 1    n           NA
#> 2    y           Because this study cannot be put online. Thus I have to create a random wall of text    
#> 3    n           NA         
#> 4    y           This is a confidential study. Thus the data must be changed.
#> 5    n           NA   
#> 6    n           NA
#> 7    y           This is a confidential study. Thus the data must be changed every time. 
#> 8    y           NA
#> 9    y           Qualitative studies are difficult to assess. Here is a random wall of text.

> str(b)
'data.frame':   9 obs. of  2 variables:
 $ helpful   : Factor w/ 2 levels "n","y": 1 2 1 2 1 1 2 2 2
 $ helpfulhow: Factor w/ 4 levels "Because this study cannot be put online. Thus I have to create a random wall of text.",..: NA 1 NA 4 NA NA 3 NA 2

> dput(head(b))
structure(list(helpful = structure(c(1L, 2L, 1L, 2L, 1L, 1L), .Label = c("n", 
"y"), class = "factor"), helpfulhow = structure(c(NA, 1L, NA, 
4L, NA, NA), .Label = c("Because this study cannot be put online. Thus I have to create a random wall of text.", 
"Qualitative studies are difficult to assess. Here is a random wall of text.", 
"This is a confidential study. Thus the data must be changed every time.", 
"This is a confidential study. Thus the data must be changed."
), class = "factor")), row.names = c(NA, 6L), class = "data.frame")

So for example, I want to find out how many people who put 'y's under helpful also put 'NA' under helpfulhow. Thanks in advance.

welcome to stackoverflow. You need to give information related to how the data are structured. Please see https://stackoverflow.com/help/minimal-reproducible-example — greengrass62, Jul 06 '20 at 19:47
It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Jul 06 '20 at 19:47

M-- · Accepted Answer · 2020-07-06T21:23:28.163

I made the example dataset below. The code counts the rows where Q1 is "Yes" and Q2 is neither NA nor empty (trimws strips whitespace, so a blank string counts as empty), then divides by the number of rows where Q1 is "Yes" to get a fraction. percent from the scales package formats that fraction as a percentage.

#>      Name  Q1               Q2
#> 1   Jerry Yes             <NA>
#> 2    Beth  No                 
#> 3 Jessica Yes                 
#> 4   Morty Yes       Aww,Babola
#> 5  Summer  No                 
#> 6    Rick Yes Wubbalubbadubdub


## percentage of people who answered yes to Q1 and also answered Q2
scales::percent(
  nrow(with(df, df[Q1 == "Yes" & (trimws(Q2) != "" & !is.na(Q2)), ])) /
    nrow(with(df, df[Q1 == "Yes", ]))
)

#> [1] "50.0%"

Data:

df <- structure(list(Name = structure(c(2L, 1L, 3L, 4L, 6L, 5L), 
                                      .Label = c("Beth", "Jerry", "Jessica", "Morty", "Rick", "Summer"), class = "factor"), 
                     Q1 = structure(c(2L, 1L, 2L, 2L, 1L, 2L), 
                                    .Label = c("No", "Yes"), class = "factor"), 
                     Q2 = structure(c(NA, 1L, 2L, 3L, 1L, 4L), 
                                    .Label = c("", "       ", "Aww,Babola", "Wubbalubbadubdub"), class = "factor")), 
                class = "data.frame", row.names = c(NA, -6L))
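
The same calculation can also be sketched step by step with named logical vectors, which may be easier to read than the nested one-liner. This is only an illustration, not part of the original answer: it rebuilds the example data with character columns instead of factors, and the names said_yes, answered, and frac are chosen here for readability.

```r
# Example data mirroring `df` above, with character columns instead of factors
df <- data.frame(
  Name = c("Jerry", "Beth", "Jessica", "Morty", "Summer", "Rick"),
  Q1   = c("Yes", "No", "Yes", "Yes", "No", "Yes"),
  Q2   = c(NA, "", "       ", "Aww,Babola", "", "Wubbalubbadubdub"),
  stringsAsFactors = FALSE
)

# TRUE for respondents who said "Yes" to Q1
said_yes <- df$Q1 == "Yes"

# TRUE where Q2 holds real text: not NA and not blank after trimming
answered <- !is.na(df$Q2) & trimws(df$Q2) != ""

n_yes          <- sum(said_yes)             # 4
n_yes_answered <- sum(said_yes & answered)  # 2
frac           <- n_yes_answered / n_yes    # 0.5

sprintf("%.1f%%", 100 * frac)
#> [1] "50.0%"
```

Because `!is.na(...)` is FALSE for the NA row, the `&` never yields NA, so sum() needs no na.rm argument.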

For your dataset, it would be like this:

scales::percent(
  nrow(with(b, b[helpful == "y" & (trimws(helpfulhow) != "" & !is.na(helpfulhow)), ])) /
    nrow(with(b, b[helpful == "y", ]))
)

#> [1] "100%"

To make it cleaner, we can use the dplyr package:

library(dplyr)
library(scales)

percent(
  b %>% 
    filter(helpful == "y", !is.na(helpfulhow), trimws(helpfulhow) != "") %>% 
    nrow(.) / {b %>% filter(helpful == "y") %>% nrow(.)})

#> [1] "100%"

or

b %>% 
  group_by(helpful) %>% 
  summarise(percent_helpfulhow = percent(sum(trimws(helpfulhow) != "" & !is.na(helpfulhow)) / n())) %>% 
  filter(helpful == "y") %>% 
  pull(2)

#> [1] "100%"
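
Note that the "100%" results are computed on the six rows from dput(head(b)). The str(b) output in the question describes all nine rows, so the full data can be rebuilt from the integer codes and factor levels shown there. The sketch below does that and also computes the count the question asked for (the number of NAs among "y" respondents). The names b_full, is_yes, n_yes_na, and pct_answered are introduced here for illustration only.

```r
# Factor levels of helpfulhow, copied from the dput() output in the question
lev <- c(
  "Because this study cannot be put online. Thus I have to create a random wall of text.",
  "Qualitative studies are difficult to assess. Here is a random wall of text.",
  "This is a confidential study. Thus the data must be changed every time.",
  "This is a confidential study. Thus the data must be changed."
)

# Integer codes taken from the str(b) output (all nine rows)
b_full <- data.frame(
  helpful    = factor(c("n", "y")[c(1, 2, 1, 2, 1, 1, 2, 2, 2)], levels = c("n", "y")),
  helpfulhow = factor(lev[c(NA, 1, NA, 4, NA, NA, 3, NA, 2)], levels = lev)
)

is_yes <- b_full$helpful == "y"

# Number of "y" respondents who left helpfulhow as NA
n_yes_na <- sum(is_yes & is.na(b_full$helpfulhow))
n_yes_na
#> [1] 1

# Percentage of "y" respondents who gave a reason
pct_answered <- 100 * mean(!is.na(b_full$helpfulhow[is_yes]))
pct_answered
#> [1] 80
```

On the full data, row 8 is the one "y" respondent with no reason given, so four of the five "y" respondents (80%) answered the follow-up.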

Matt · Answer 2 · 2020-07-07T00:55:15.780

Here is a possible solution using the packages dplyr and janitor (df here stands for the data frame from the question, called b there):

library(dplyr)
library(janitor)

df %>% 
  mutate(na_flag = ifelse(helpful == 'y' & is.na(helpfulhow), "Y", "N")) %>% 
  tabyl(na_flag) %>% 
  adorn_pct_formatting

Which gives us:

 na_flag n percent
       N 6  100.0%

If every response to helpfulhow in this sample dataset (n = 6) were NA, this would show:

 na_flag n percent
       N 4   66.7%
       Y 2   33.3%

That is because two respondents answered y for helpful, and in that case neither would have left a response for helpfulhow.

If you just want to look at y respondents, you can do:

df %>% 
  filter(helpful == "y") %>%
  mutate(na_flag = ifelse(is.na(helpfulhow), "Y", "N")) %>% 
  tabyl(na_flag) %>% 
  adorn_pct_formatting
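
For a cross-check without extra packages, a two-way table in base R shows the NA counts for 'n' and 'y' respondents side by side. This is a rough sketch rather than part of the answer: the data below follow the nine-row pattern from the question's str(b) output, but the answer text is replaced by a placeholder string, and the names counts and shares are made up for this example.

```r
# Nine rows following the y/n and NA pattern in the question;
# "some text" is a placeholder for the real free-text answers
b <- data.frame(
  helpful    = c("n", "y", "n", "y", "n", "n", "y", "y", "y"),
  helpfulhow = c(NA, "some text", NA, "some text", NA, NA, "some text", NA, "some text"),
  stringsAsFactors = FALSE
)

# Rows: helpful (n / y); columns: whether helpfulhow is NA (FALSE / TRUE)
counts <- table(helpful = b$helpful, how_missing = is.na(b$helpfulhow))

# Row-wise proportions, so each helpful group sums to 1
shares <- prop.table(counts, margin = 1)

counts["y", "TRUE"]   # number of 'y' respondents with NA: 1
shares["y", "TRUE"]   # share of 'y' respondents with NA: 0.2
```

Reading the "y" row alone gives the same information as filtering on helpful == "y" first.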

edited Jul 07 '20 at 00:55

answered Jul 06 '20 at 21:08

Matt

This is amazing! It works like a charm in the case of finding the percentage of NAs in 'n' and 'y' together. Is there any way for me to separate them though? Like if I want to find the percentage of NAs in just the 'y's? Thanks so much! – inkyfingers Jul 06 '20 at 21:39
@inkyfingers use `filter(helpful == "y")` like what I did in my answer. – M-- Jul 06 '20 at 21:40
@M-- Thank you, do you know where to put the `filter(helpful == "y")` in this case? – inkyfingers Jul 06 '20 at 21:49
@inkyfingers `df %>% filter(helpful == "y") %>% ...` – M-- Jul 06 '20 at 21:52
@inkyfingers I edited the post to show how to select `helpful` responses equal to `y`. – Matt Jul 07 '20 at 00:55
Thanks a lot Matt! – inkyfingers Jul 07 '20 at 14:22
