1

I have a data frame and I would like to count the number of different observations per group, not counting the NA values.

Here is an example of the data:

ID <-c("A", "A", "B", "B", "B", "C")
Act1 <- c("Football", "Swim", "Football", 'Basketball', "Swim", "Tennis")
Act2 <- c("Swim", "Football", "Tennis", 'Swim', "Football", "Swim")
Act3 <- c("NA", "Tennis", "NA", 'Football', "Tennis", "NA")
df <- data.frame(ID,Act1, Act2, Act3)

df

   ID       Act1     Act2     Act3
1  A   Football     Swim       NA
2  A       Swim Football   Tennis
3  B   Football   Tennis       NA
4  B Basketball     Swim Football
5  B       Swim Football   Tennis
6  C     Tennis     Swim       NA 

The correct answer would look like this...

  ID  n
1  A  3
2  B  4
3  C  2

Because A has three different activities (e.g. football, swim, tennis), B has four (e.g. football, swim, tennis, basketball) and C has two (e.g. tennis and swim)

How could I do that?

Ric S
  • 9,073
  • 3
  • 25
  • 51
amberluc
  • 65
  • 4

1 Answers1

1

Presuming that the null values are actually NA values instead of strings "NA", you can use packages dplyr and tidyr to achieve your expected output

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-ID) %>% 
  filter(!is.na(value)) %>%   # if you have strings "NA" use   filter(value != "NA")   
  group_by(ID) %>%
  summarise(n = n_distinct(value))

# A tibble: 3 x 2
#   ID        n
#   <chr> <int>
# 1 A         3
# 2 B         4
# 3 C         2
Ric S
  • 9,073
  • 3
  • 25
  • 51
  • Thank you. I have tried this solution and I just get a one row one column table showing n = 5. any suggestions? – amberluc Jul 17 '20 at 10:09
  • I obtain that result if I comment the `group_by` and the `filter` statement, otherwise my code works perfectly on your sample data – Ric S Jul 17 '20 at 10:14
  • 1
    @amberluc Check [Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?](https://stackoverflow.com/questions/26106146/why-does-summarize-or-mutate-not-work-with-group-by-when-i-load-plyr-after-dp) – Henrik Jul 17 '20 at 10:18
  • That solved it, thank you! – amberluc Jul 17 '20 at 10:22