1

I'm trying to use the tidyverse (whichever package is appropriate) to add a column with mutate() that holds a running count of the distinct values that have occurred in another column so far. Here is some toy data showing the desired output.

data.frame("n"=c(1,1,1,6,7,8,8),"Unique cumsum"=c(1,1,1,2,3,4,4))

How can I accomplish this in the tidyverse?

Jilber Urbina

3 Answers

2

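Here is an option with group_indices(), which numbers the groups by the sorted value of n, so it matches the desired output when n is already sorted: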

library(dplyr)
df1 %>%
  mutate(unique_cumsum = group_indices(., n))
#   n unique_cumsum
#1 1             1
#2 1             1
#3 1             1
#4 6             2
#5 7             3
#6 8             4
#7 8             4

data

df1 <- data.frame("n"=c(1,1,1,6,7,8,8)) 
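
In dplyr 1.0.0 and later, passing grouping columns through the ... argument of group_indices() is deprecated. A minimal sketch of an equivalent, assuming dplyr >= 1.0.0 is installed, groups first and then calls cur_group_id(); the name res is only illustrative and not part of the original answer.

```r
library(dplyr)

df1 <- data.frame(n = c(1, 1, 1, 6, 7, 8, 8))

# Group by n, label each row with its group's id, then drop the grouping
res <- df1 %>%
  group_by(n) %>%
  mutate(unique_cumsum = cur_group_id()) %>%
  ungroup()

res
```

As with group_indices(), the ids follow the sorted order of n, so this only reproduces a running count when n is non-decreasing.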
akrun
1

Here's one way, using the fact that a factor assigns a sequential integer code to each distinct value (in the sorted order of its levels), and then extracting those underlying codes with as.numeric:

library(dplyr)
data.frame(n = c(1, 1, 1, 6, 7, 8, 8)) %>%
  mutate(unique_cumsum = as.numeric(factor(n)))
  n unique_cumsum
1 1             1
2 1             1
3 1             1
4 6             2
5 7             3
6 8             4
7 8             4
James
0

Another solution:

df <- data.frame("n"=c(1,1,1,6,7,8,8))
df <- df %>% mutate(`unique cumsum` = cumsum(!duplicated(n)))

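This works even if your data is not sorted, because duplicated() flags repeats by order of appearance, so the cumulative sum increases only when a value is seen for the first time.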
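
To illustrate the point about unsorted data, here is a small base R sketch. The input vector and the names running_distinct and factor_codes are made up for this illustration. It compares the cumsum(!duplicated()) approach with factor codes, which follow the sorted levels instead of the order of appearance.

```r
# Hypothetical unsorted input, chosen only for illustration
n <- c(8, 1, 8, 6, 1)

# Running count of distinct values seen so far
running_distinct <- cumsum(!duplicated(n))

# Factor codes follow the sorted levels (1, 6, 8), not the order of appearance
factor_codes <- as.numeric(factor(n))

running_distinct  # 1 2 2 3 3
factor_codes      # 3 1 3 2 1
```

On sorted input the two approaches agree. On unsorted input only the first gives a running count of distinct values.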

thc