I think this pretty straightforward problem has a very simple solution, but I can't figure it out.
Let's say I've got a data.table with some duplicated rows (rows 3 and 4 are identical):
library(data.table)
dt <- data.table( val1 = c(1,2,3,3,4,5,6), val2 = 8 )
#    val1 val2
# 1:    1    8
# 2:    2    8
# 3:    3    8
# 4:    3    8
# 5:    4    8
# 6:    5    8
# 7:    6    8
I want to throw away the duplicated rows, keeping only the unique rows, and introduce a new column val3 that indicates how often each row occurs in the original data.
Expected output:
dt.output <- data.table( val1 = c(1,2,3,4,5,6), val2 = 8, val3 = c(1,1,2,1,1,1) )
#    val1 val2 val3
# 1:    1    8    1
# 2:    2    8    1
# 3:    3    8    2
# 4:    4    8    1
# 5:    5    8    1
# 6:    6    8    1
I've got the feeling I'm almost there using an update-join with unique(dt)[, val3 := ....], but I can't get the ... part to return what I want and it's driving me crazy.
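For reference, here is a minimal sketch of one way this kind of count is often done in data.table. It is not the update-join route described above: it groups by every column and counts the rows per group with .N. The name dt.counted is made up for illustration, and the sketch assumes a row counts as a duplicate only when all of its columns match.

```r
library(data.table)

# Sample data from the question: rows 3 and 4 are identical
dt <- data.table(val1 = c(1, 2, 3, 3, 4, 5, 6), val2 = 8)

# Group by all columns; .N is the number of rows in each group.
# dt.counted is a hypothetical name used only in this sketch.
dt.counted <- dt[, .(val3 = .N), by = names(dt)]

print(dt.counted)
```

Note that .N is an integer, so val3 comes back as an integer column, while the expected output above builds it as a double. If the exact type matters, wrapping it as as.numeric(.N) should align the two.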