2

I am working with the following data frame:

Name  Color   L1    L2    R3 
Joe    Red    5.4   6.2   7.7
Eric   Blue   NA    4.1   6.1
Steve  Green  NA    NA    1.2
Mike   Red    NA    NA    NA

I would like to count the number of observations across columns L1, L2, and L3, but not count NA values. So I would like the above table to become:

Name  Color   L1    L2    R3    Count 
Joe    Red    5.4   6.2   7.7    3
Eric   Blue   NA    4.1   6.1    2
Steve  Green  NA    NA    1.2    1
Mike   Red    NA    NA    NA     0

Thanks in advance!

Henrik
  • 65,555
  • 14
  • 143
  • 159
887
  • 599
  • 3
  • 15

3 Answers3

4

We can use rowSums on a logical matrix i.e. is.na(df1[3:5]) returns a logical matrix where TRUE -> NA and FALSE -> nonNA, by negating (!), TRUE -> FALSE and viceversa. Then, with rowSums, the TRUE (TRUE/FALSE -> 1/0) are added together

df1$Count <- rowSums(!is.na(df1[3:5]))

Or using tidyverse

library(dplyr)
df1 %>%
    mutate(Count = rowSums(!is.na(select(., 3:5))))

-output

#    Name Color  L1  L2  R3 Count
#1   Joe   Red 5.4 6.2 7.7     3
#2  Eric  Blue  NA 4.1 6.1     2
#3 Steve Green  NA  NA 1.2     1
#4  Mike   Red  NA  NA  NA     0

Or if we need to use column names in a range, use the select

df1 %>%
    mutate(Count = rowSums(!is.na(select(., L1:R3))))
 

NOTE: rowSums is vectorized and would be very fast

data

df1 <- structure(list(Name = c("Joe", "Eric", "Steve", "Mike"), Color = c("Red", 
"Blue", "Green", "Red"), L1 = c(5.4, NA, NA, NA), L2 = c(6.2, 
4.1, NA, NA), R3 = c(7.7, 6.1, 1.2, NA)), class = "data.frame",
row.names = c(NA, 
-4L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • would it be possible to use column names in the rowsums command rather than the location of the columns? I just figure it would be much easier in case the location of those columns changes – 887 Dec 10 '20 at 20:06
  • @rogues77 you can use a range of column names in `select`, updated the post – akrun Dec 10 '20 at 20:08
  • I might be making a mistake but i keep getting this error ```Error: Problem with `mutate()` input `Count`. x Input `Count` can't be recycled to size 826. ℹ Input `Count` is `rowSums(!is.na(select(.,L1:R3) Error: Problem with `mutate()` input `Count`. x Input `Count` can't be recycled to size 826. ℹ Input `Count` is `rowSums(!is.na(select(.,``` – 887 Dec 11 '20 at 00:38
  • @rogues77 Not sure about your case. I showed the example data as a reproducible one. – akrun Dec 11 '20 at 16:40
  • 1
    Thanks very much! I'm surprised that your second solution using `rowSums()` + dplyr is so much faster than the answer below by Duck using `rowwise()` + `c_across()`! – elarry Feb 21 '22 at 19:17
  • 1
    @elarry `rowSums/rowMeans` are some of the fastest vectorized functions in `base R` that does row wise operation. whereas `rowwise` is a slow loop – akrun Feb 21 '22 at 19:19
  • 1
    Good to know! I've been preferring tidyverse over base R for its readability, but `rowSums()` is such a clear win in this case! Many thanks, @akrun! – elarry Feb 21 '22 at 19:28
1

Try this using dplyr:

library(dplyr)
#Code
newdf <- df %>% rowwise() %>% mutate(Count=sum(!is.na(c_across(L1:R3))))

Output:

# A tibble: 4 x 6
# Rowwise: 
  Name  Color    L1    L2    R3 Count
  <chr> <chr> <dbl> <dbl> <dbl> <int>
1 Joe   Red     5.4   6.2   7.7     3
2 Eric  Blue   NA     4.1   6.1     2
3 Steve Green  NA    NA     1.2     1
4 Mike  Red    NA    NA    NA       0

Some data used:

#Data
df <- structure(list(Name = c("Joe", "Eric", "Steve", "Mike"), Color = c("Red", 
"Blue", "Green", "Red"), L1 = c(5.4, NA, NA, NA), L2 = c(6.2, 
4.1, NA, NA), R3 = c(7.7, 6.1, 1.2, NA)), class = "data.frame", row.names = c(NA, 
-4L))
Duck
  • 39,058
  • 13
  • 42
  • 84
1

Simple base R solution:

df1$Count <- apply(df1[3:5], 1, function(x) sum(!is.na(x)))
SteveM
  • 2,226
  • 3
  • 12
  • 16