0

I want to cumsum unique userid according to time, here's the data

     userid       time
   1          2014-10-01
   1          2014-11-16
   2          2014-11-14
   1          2014-10-16
   2          2014-11-16
   3          2014-10-16

the number of unique id on 2014-10-16 is 2, the number of unique id up to 2014-11-14 is 3.

I want to return a matrix like this

     userid       time        count
   1              2014-10-01         1
   1              2014-11-16         3
   2              2014-11-14         3
   1              2014-10-16         2
   2              2014-11-16         3
   3              2014-10-16         2

I guess I should use cumsum to do it. Actually, I found this thread is doing similar stuff Cumulative count of unique values in R

However, I follow the first answer of this thread but it doesn't work on my case

Community
  • 1
  • 1
Lucia
  • 615
  • 1
  • 9
  • 16
  • Your input and expected 'time' is not right. I guess `df1[, count := uniqueN(userid), time]` might be what you wanted – akrun Sep 14 '16 at 13:48
  • Assuming your data is ordered by `time`, you might be looking for `dt[, count := cumsum(!duplicated(userid))][, count := max(count), by = time]` – talat Sep 14 '16 at 13:56
  • have edited the 'time' input. – Lucia Sep 14 '16 at 14:14
  • @ akrun Yours is computing total unique users per day, not by that date – Lucia Sep 14 '16 at 14:27
  • @Lucia Okay, earlier there was some problem in the time column in expected output, seems like it is fixed now – akrun Sep 14 '16 at 14:29

1 Answers1

1

Since it seems that you're using data.table (from the look of your first version of the post), here's an approach using that package:

library(data.table)
setDT(dt) # just in case it's still a data.frame
dt[order(time), count := cumsum(!duplicated(userid))][,   
                count := max(count), by = time]
dt
#   userid       time count
#1:      1 2014-10-01     1
#2:      1 2014-11-16     3
#3:      2 2014-11-14     3
#4:      1 2014-10-16     2
#5:      2 2014-11-16     3
#6:      3 2014-10-16     2

After ordering the data by time, we compute the cumulative count of unique userid's and then get the maximum count per time.

talat
  • 68,970
  • 21
  • 126
  • 157