0

I have a data frame in R whereby a value, attributed to a group, is repeated multiple times per group.

Sample data frame below:

Name    Group
John    A
John    A
John    A
John    B
John    B
John    C
Sally   A
Sally   A
Sally   B
Mark    A
Mark    B
Mark    B
Mark    C

I'm trying to count the number of times each Name occurs, as well as an output showing within how many groups they occur. So, the desired output would be below:

       Name Observations No. of Groups
1  John            6      3
2 Sally            3      2
3  Mark            4      3

In the case of John, we can see that there are a total of 6 observations across three groups, whereas there are 3 observations of Sally across two groups.

Is there a single function or formula that can perform this to yield the desired output? Thanks in advance.

NickB
  • 103
  • 5

1 Answers1

0

You can use -

library(dplyr)

df %>%
  group_by(Name) %>%
  summarise(Observations = n(), 
            num_groups = n_distinct(Group))
  

#   Name  Observations num_groups
#  <chr>        <int>      <int>
#1 John             6          3
#2 Mark             4          3
#3 Sally            3          2

If you want to write this in base R or data.table -

#Base R
aggregate(Group~Name, df, function(x) 
          c(len = length(x), num_groups = length(unique(x))))  

#data.table
library(data.table)
setDT(df)[, .(Observations = .N, num_groups = uniqueN(Group)), Name]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you very much, the first solution worked perfectly. In my limited experience, my first instinct was to use ```aggregate``` but I struggled with the function. The ```dplyr``` solution is much cleaner! – NickB Jul 26 '21 at 10:35