
I have a column of IDs as part of a big data frame. It is sorted by time, so I cannot change the order:

ID <- c("p", "fxman27", "duncane", "duncane", "dday1026", "duncane", "dday1026", "dday1026", "dday1026", "dday1026", "cesandjoel", "pali777", "ranger_2", "marymom6", "deaglekl")

Now I need a function that produces these counts for those IDs:

s <- c(1, 2, 3, 3, 4, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9)

As you can see, the function counts the number of distinct IDs seen so far and stays constant when an ID has already appeared in a previous row. I don't want to use "for" loops, and I prefer base functions.

Note that s is not a simple frequency table; for that, I know I can use aggregate. This is not a grouping question: what I need is the running number of distinct IDs up to the current row. Thanks.
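
To make this concrete, the input and the desired output can be lined up side by side (this just restates the example above; nothing new is computed):

data.frame(ID = ID, s = s)
# 15 rows: each ID next to its running count of distinct IDs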

2 Answers


You can count non-duplicated IDs:

cumsum(!duplicated(ID))
# [1] 1 2 3 3 4 4 4 4 4 4 5 6 7 8 9
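
To see why this works: `duplicated(ID)` flags every element that has already appeared earlier in the vector, so `!duplicated(ID)` is `TRUE` exactly at each first occurrence; `cumsum` coerces the logicals to 0/1 and adds them up, so the count increases by one at every new ID and stays constant over repeats:

!duplicated(ID)
#  [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE
# [13]  TRUE  TRUE  TRUE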
– Psidom

If your data set is large, you may be better off using dplyr, but this solves the example with base functions only:

apply(as.matrix(1:length(ID)), 1, function(n) length(unique(ID[1:n])))
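
As a quick sanity check (not part of the original answer), both approaches agree on the example. Note, though, that recomputing `unique(ID[1:n])` for every n does quadratic work overall, while `cumsum(!duplicated(ID))` is effectively linear:

s1 <- apply(as.matrix(1:length(ID)), 1, function(n) length(unique(ID[1:n])))
s2 <- cumsum(!duplicated(ID))
all(s1 == s2)
# [1] TRUE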
  • This is a loop. Also, how is dplyr related here? Finally, why convert a vector to a matrix instead of just using `sapply` over it? – David Arenburg Dec 25 '16 at 19:36
  • I was thinking of using `dplyr::n_distinct`. You're right, `sapply(1:length(ID), function(n) length(unique(ID[1:n])))` would probably be the best implementation of this. However, the approach by @psidom is ultimately better. – user3349904 Dec 25 '16 at 19:51