
I have a column of IDs as part of a big data frame. It is sorted by time, so I cannot change the order:

ID <- c("p", "fxman27", "duncane", "duncane", "dday1026", "duncane", "dday1026", "dday1026", "dday1026", "dday1026", "cesandjoel", "pali777", "ranger_2", "marymom6", "deaglekl")

Now I need a function that produces these counts for those IDs:

s <- c(1, 2, 3, 3, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9)

As you can see, the function counts the number of distinct IDs seen so far and stays constant when an ID has already appeared in a previous row. I don't want to use "for" loops, and I prefer base functions.

Note that s is not a simple frequency table; for that, I know I can use aggregate. This is not a grouping question: what I need is the running number of distinct IDs up to the current row. Thanks.
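
To make this concrete, the input and the desired output can be lined up side by side (this just restates the example above; nothing new is computed):

data.frame(ID = ID, s = s)
# 15 rows: each ID next to its running count of distinct IDs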

2 Answers


You can count non-duplicated IDs:

cumsum(!duplicated(ID))
# [1] 1 2 3 3 4 4 4 4 4 4 5 6 7 8 9
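
To see why this works: `duplicated(ID)` flags every element that has already appeared earlier in the vector, so `!duplicated(ID)` is `TRUE` exactly at each first occurrence; `cumsum` coerces the logicals to 0/1 and adds them up, so the count increases by one at every new ID and stays constant over repeats:

!duplicated(ID)
#  [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
# [13]  TRUE  TRUE  TRUE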
– Psidom

If your data set is large, you may be better off using dplyr, but this solves the example with base functions only:

apply(as.matrix(1:length(ID)), 1, function(n) length(unique(ID[1:n])))
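
As a quick sanity check (not part of the original answer), both approaches agree on the example. Note, though, that recomputing `unique(ID[1:n])` for every n does quadratic work overall, while `cumsum(!duplicated(ID))` is effectively linear:

s1 <- apply(as.matrix(1:length(ID)), 1, function(n) length(unique(ID[1:n])))
s2 <- cumsum(!duplicated(ID))
all(s1 == s2)
# [1] TRUE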
  • This is a loop. Also, how is dplyr related here? Finally, why convert a vector to a matrix instead of just using `sapply` over it? – David Arenburg Dec 25 '16 at 19:36
  • I was thinking of using `dplyr::n_distinct`. You're right, `sapply(1:length(ID), function(n) length(unique(ID[1:n])))` would probably be the best implementation of this. However, the approach by @psidom is ultimately better. – user3349904 Dec 25 '16 at 19:51