0

I want to count the number of columns that have an NA value after using group_by.

Similar questions have been asking, but counting total NAs not columns with NA (group by counting non NA)

Data:

  Spes <- "Year Spec.1 Spec.2 Spec.3 Spec.4
    1      2016   5      NA     NA     5
    2      2016   1      NA     NA     6
    3      2016   6      NA     NA     4
    4      2018   NA     5      5      9
    5      2018   NA     4      7      3
    6      2018   NA     5      2      1
    7      2019   6      NA     NA     NA
    8      2019   4      NA     NA     NA
    9      2019   3      NA     NA     NA"

    Data <- read.table(text=spes, header = TRUE)
    Data$Year <- as.factor(Data$Year)

The desired output:

2016 2
2018 1
2019 3

I have tried a few things, this is my current best attempt. I would be keen for a dplyr solution.

> Data %>% 
   group_by(Year) %>% 
   summarise_each(colSums(is.na(Data, [2:5])))

Error: Can't create call to non-callable object

I have tried variations without much luck. Many thanks

Jim
  • 558
  • 4
  • 13
  • `no applicable method for 'filter_' applied to an object of class "c('matrix', 'logical')` will experiment with this though! thanks – Jim Jul 18 '19 at 07:33

1 Answers1

1

One option could be to group_by Year, check if there is any NA values in each column and calculate their sum for each Year.

library(dplyr)
Data %>%
   group_by(Year) %>%
   summarise_all(~any(is.na(.))) %>%
   mutate(output = rowSums(.[-1])) %>%
   select(Year, output)


# A tibble: 3 x 2
#  Year  output
#  <fct>  <dbl>
#1 2016       2
#2 2018       1
#3 2019       3

Base R translation using aggregate

rowSums(aggregate(.~Year, Data, function(x) 
             any(is.na(x)), na.action = "na.pass")[-1], na.rm = TRUE)
#[1] 2 1 3
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Works! Many thanks, will answer as soon as allowed. – Jim Jul 18 '19 at 07:34
  • _I'm being cheeky_ but is there a line I could add to return column names? will edit question if there is a quickfix – Jim Jul 18 '19 at 08:03
  • @Jim which column names do you want to return ? – Ronak Shah Jul 18 '19 at 08:18
  • this with `NA` present, example output: `2016 2 Spec.2,Spec.3` (I realise this is not correct protocol on SO, I will edit the question accordingly) – Jim Jul 18 '19 at 08:24
  • @Jim So something like `Data %>% group_by(Year) %>% summarise_all(~any(is.na(.))) %>% gather(key, value, -Year) %>% filter(value) %>% select(-value)` would help ? – Ronak Shah Jul 18 '19 at 14:38
  • Brilliant, I'll figure it out. Cheers! – Jim Jul 18 '19 at 15:12