
Consider the MWE below, where Amt holds amounts (from 1 to 40, with some NAs) for each Food item and Site gives the site of that food item. I want a summary with the median and a count (n()) for each Food and Site combination, counting only the rows where Amt is not NA.

MWE

mwe <- data.frame(
  # rep() has no `size` argument, so it is silently ignored here; data.frame()
  # recycles the 2 sites and 13 foods to match the 884 Amt values
  Site = sample(rep(c("Home", "Office"), size = 884)),
  Food = sample(rep(c("Banana","Apple","Egg","Berry","Tomato","Potato","Bean","Pea","Nuts","Onion","Carrot","Cabbage","Eggplant"), size=884)),
  Amt = sample(seq(1, 40, by = 0.25), size = 884, replace = TRUE)
)
random <- sample(seq(1, 884, by = 1), size = 100, replace = FALSE) # pick 100 distinct rows whose Amt will be set to NA
mwe$Amt[random] <- NA

Data frame

    Site     Food   Amt
1 Office  Cabbage 16.50
2   Home    Apple 36.00
3 Office      Egg  7.25
4   Home    Onion 16.00
5 Office Eggplant 36.50
6   Home     Nuts    NA

Summary Code

library(dplyr) # provides %>% and ungroup(), which are used without the dplyr:: prefix

dfsummary <- mwe %>%
  dplyr::group_by(Food, Site) %>%
  dplyr::summarise(Median = round(median(Amt, na.rm=TRUE), digits=2), N = n()) %>%
  ungroup()

Output

# A tibble: 6 x 4
  Food   Site   Median     N
  <fct>  <fct>   <dbl> <int>
1 Apple  Home     17      34
2 Apple  Office   22.2    34
3 Banana Home     19.5    34
4 Banana Office   19.9    34
5 Bean   Home     20      34
6 Bean   Office   18      34

Some food items have NA in Amt, yet those rows are still included in the N count. I want N to exclude the rows where Amt is NA.

doctorate

1 Answer


We can filter out the rows with NA in Amt first and then run the same summarise call unchanged:

library(dplyr)
mwe %>%
    filter(!is.na(Amt)) %>%
    dplyr::group_by(Food, Site) %>%
    dplyr::summarise(Median = round(median(Amt, na.rm=TRUE), digits=2),
         N = n()) %>%
    ungroup()

Another option is to replace n() with sum(!is.na(Amt)), which counts only the non-missing Amt values in each group:

mwe %>%
    dplyr::group_by(Food, Site) %>%
    dplyr::summarise(Median = round(median(Amt, na.rm=TRUE), digits=2), 
         N = sum(!is.na(Amt))) %>%
    ungroup()
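
The two options differ in one case: a group whose Amt values are all NA. A minimal sketch of that difference, using a hypothetical toy data frame (`toy`, `by_filter` and `by_sum` are illustrative names, not part of the question's data):

```r
library(dplyr)

# Hypothetical toy data: "Apple" has one NA out of three values, "Egg" has only NAs
toy <- data.frame(
  Food = c("Apple", "Apple", "Apple", "Egg", "Egg"),
  Amt  = c(1, 3, NA, NA, NA)
)

# Option 1: drop NA rows before grouping, so an all-NA group disappears entirely
by_filter <- toy %>%
  filter(!is.na(Amt)) %>%
  group_by(Food) %>%
  summarise(Median = median(Amt), N = n()) %>%
  ungroup()

# Option 2: keep every row and count non-missing values, so an all-NA group
# stays in the result with N = 0 and an NA median
by_sum <- toy %>%
  group_by(Food) %>%
  summarise(Median = median(Amt, na.rm = TRUE), N = sum(!is.na(Amt))) %>%
  ungroup()
```

Here by_filter contains only the Apple row (Median 2, N 2), while by_sum also keeps Egg with N 0 and Median NA. Which behaviour is preferable depends on whether empty groups should appear in the summary.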
akrun