Use of aggregate function in R

Question

I have data like this:

ID <- c(1001, 1001, 1001, 1002, 1002, 1002)
activity <- c(123, 123, 123, 456, 456, 789)
df <- data.frame(ID, activity)

I want to count the number of unique activity values within each ID and end up with a dataframe like this:

N <- c(1, 1, 1, 2, 2, 2)
data.frame(df, N)

So we can see that person 1001 did only one activity, while person 1002 did two.

I think this can be done with aggregate, but I am happy to use another approach.

Do the final N values have to repeat, or can there be a separate dataframe that summarizes the data? — Harry Smith, Sep 17 '22 at 00:54
Ideally they repeat, so that I can select the people who did only one activity within the same dataframe. — Log On, Sep 17 '22 at 01:05

Harry Smith · Answer 1 · 2022-09-17T02:03:52.103

dplyr option

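library(dplyr)

# Count the distinct activities per ID, then join the counts back onto
# the original rows so the count repeats for every row of that ID
sum_df <- df %>%
  group_by(ID) %>%
  summarize(count_distinct = n_distinct(activity)) %>%
  left_join(df, by = 'ID')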
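
Since the question asks for counts that repeat on every row, a base R alternative may also be worth considering. The following is only a minimal sketch, assuming the example data from the question; the column name N follows the question's desired output, and one_activity is a hypothetical name used here for illustration.

```r
# Example data from the question
ID <- c(1001, 1001, 1001, 1002, 1002, 1002)
activity <- c(123, 123, 123, 456, 456, 789)
df <- data.frame(ID, activity)

# ave() applies the function within each ID group and returns a vector
# of the same length as its input, so each group's count repeats per row
df$N <- ave(df$activity, df$ID, FUN = function(x) length(unique(x)))

# Keep only the people who did exactly one distinct activity
# (one_activity is a hypothetical name)
one_activity <- df[df$N == 1, ]
```

With dplyr, the same repeated column can be obtained without a join by using mutate(N = n_distinct(activity)) after group_by(ID), since mutate keeps every original row.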

edited Sep 17 '22 at 02:03

answered Sep 17 '22 at 01:02

Harry Smith