Frequency of a variable on a unique variable

Question

I have the below dataset with postIDs and replyIDs:

      postId      replyId
1   6074801669  759224201176
2   6074801669  465047320447
3   6074801669  690812551148
4   6074801669  465047290095
5   6560801670  465047500011
6   6560801670  869614571745
7   6560801670  869614571745
8   11446901671 100552911701
9   11446901671 759224201176
10  11446901671 100552911701
11  11446901671 759224201176
12  11446901671 465047690560
13  11446901671 759224201176

I want the frequency of replyId for each unique postId. More specifically, how many different replyIds appear on a given postId? I am not sure my description is specific enough, so this is what I want to see:

      postId      replyId       replyId.freq
1   6074801669  759224201176       4
2   6074801669  465047320447       4
3   6074801669  690812551148       4
4   6074801669  465047290095       4
5   6560801670  465047500011       2
6   6560801670  869614571745       2
7   6560801670  869614571745       2
8   11446901671 100552911701       3
9   11446901671 759224201176       3
10  11446901671 100552911701       3
11  11446901671 759224201176       3
12  11446901671 465047690560       3
13  11446901671 759224201176       3

For example, postId = 11446901671 has 3 different replyIds, even though this postId appears 6 times in the data frame.

akrun · Accepted Answer · 2018-11-27T13:42:13.293

We can group by 'postId' and create the new column by counting the unique elements of 'replyId' in each group with n_distinct:

library(dplyr)
df %>%
    group_by(postId) %>%
    mutate(replyId.freq = n_distinct(replyId))

Or with base R:

# ave() applies FUN within each postId group and repeats the result on every row
df$replyId.freq <- with(df, ave(replyId, postId, 
          FUN = function(x) length(unique(x))))
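
As a quick check, here is a minimal reproducible sketch of the base R approach run on the data from the question. It assumes both ID columns are stored as numeric (double) values, which is a guess about the original data frame; the name df is taken from the answer above.

```r
# Rebuild the question's data. The IDs exceed R's 32-bit integer range,
# so they are kept as doubles (an assumption about the original df).
df <- data.frame(
  postId  = c(rep(6074801669, 4), rep(6560801670, 3), rep(11446901671, 6)),
  replyId = c(759224201176, 465047320447, 690812551148, 465047290095,
              465047500011, 869614571745, 869614571745,
              100552911701, 759224201176, 100552911701,
              759224201176, 465047690560, 759224201176)
)

# Count the distinct replyId values within each postId and
# repeat that count on every row of the group.
df$replyId.freq <- with(df, ave(replyId, postId,
                                FUN = function(x) length(unique(x))))

print(df)
```

If replyId were stored as character instead, ave() would coerce the counts to character as well, so converting the result with as.integer() afterwards may be needed.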

edited Nov 27 '18 at 13:42

answered Nov 27 '18 at 13:39
