How do I use the tidyverse packages to get a running total of unique values occurring in a column?

Question

I'm trying to use the tidyverse (whatever package is appropriate) to add a column (via mutate()) that is a running total of the unique values that have occurred in the column so far. Here is some toy data, showing the desired output.

data.frame("n"=c(1,1,1,6,7,8,8),"Unique cumsum"=c(1,1,1,2,3,4,4))

Who knows how to accomplish this in the tidyverse?

score 2 · Accepted Answer · answered May 03 '19 at 16:53


Here is an option with group_indices(), which gives each distinct value of n an integer id in sorted order:

library(dplyr)
df1 %>%
  mutate(unique_cumsum = group_indices(., n))
#   n unique_cumsum
#1 1             1
#2 1             1
#3 1             1
#4 6             2
#5 7             3
#6 8             4
#7 8             4

data

df1 <- data.frame("n"=c(1,1,1,6,7,8,8))
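As far as I know, newer dplyr releases deprecate passing columns directly to group_indices(). Here is a minimal sketch of an equivalent, assuming dplyr 1.0.0 or later, where cur_group_id() is available. The name `res` is only for illustration. Like group_indices(), it numbers groups in sorted order of n, so it matches the desired output when the column is sorted ascending.

```r
library(dplyr)

df1 <- data.frame(n = c(1, 1, 1, 6, 7, 8, 8))

# Group by n, then label every row with the id of its group.
# Group ids follow the sorted order of n, not the order of first appearance.
res <- df1 %>%
  group_by(n) %>%
  mutate(unique_cumsum = cur_group_id()) %>%
  ungroup()

res$unique_cumsum
# 1 1 1 2 3 4 4
```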


akrun


This is another good option. Thank you. – curiositasisasinbutstillcuriou May 03 '19 at 17:01

James · Answer 2 · 2019-05-03T16:59:37.973


Here's one way, using the fact that a factor assigns a sequential integer code to each unique item (in sorted order of its levels), and then extracting those underlying codes with as.numeric:

library(dplyr)
data.frame(n = c(1, 1, 1, 6, 7, 8, 8)) %>%
  mutate(unique_cumsum = as.numeric(factor(n)))
  n unique_cumsum
1 1             1
2 1             1
3 1             1
4 6             2
5 7             3
6 8             4
7 8             4

edited May 03 '19 at 16:59

answered May 03 '19 at 16:51


This is faster: `match(n, unique(n))` – markus May 03 '19 at 16:52
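To illustrate how the two expressions differ, here is a small base R sketch on unsorted input. The vector and the names `by_appearance` and `by_sorted` are made up for this example. On the sorted toy data from the question both give the same result.

```r
n <- c(8, 1, 8, 6, 1)

# unique(n) keeps values in order of first appearance: 8, 1, 6.
# match() returns each element's position in that vector.
by_appearance <- match(n, unique(n))
# 1 2 1 3 2

# factor() sorts its levels (1, 6, 8), so the codes follow sorted order.
by_sorted <- as.numeric(factor(n))
# 3 1 3 2 1
```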

score 0 · Answer 3 · answered May 03 '19 at 17:10


Another solution:

library(dplyr)
df <- data.frame(n = c(1, 1, 1, 6, 7, 8, 8))
df <- df %>% mutate(`unique cumsum` = cumsum(!duplicated(n)))

This should work even if your data is not sorted.
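A short sketch of why this holds on unsorted input. The data and the helper column `first_seen` are made up for illustration. duplicated() flags every repeat of an earlier value, so negating it marks first appearances, and cumsum() counts them.

```r
library(dplyr)

df <- data.frame(n = c(8, 1, 8, 6, 1))

res <- df %>%
  mutate(
    first_seen    = !duplicated(n),     # TRUE the first time a value appears
    unique_cumsum = cumsum(first_seen)  # running count of distinct values so far
  )

res$unique_cumsum
# 1 2 2 3 3
```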


thc
