Using the data.table package, we can preserve the original row order (recommended); group ids are assigned in order of first appearance:
library(data.table)
setDT(df1)[,new_col:=.GRP-1, by = c("A", "B","C")]
# If you want the column as a factor (one-liner; use it instead of the previous line)
setDT(df1)[,new_col:=.GRP-1, by = c("A", "B","C")][,new_col:=as.factor(new_col)]
Using dplyr, we can do something like the following (Rui's solution implemented in dplyr, with a minimal modification to allow for duplicate rows). This also preserves the original row order:
library(dplyr)
# Build a combined key, then number its values in order of first appearance
df1 %>% mutate(mtemp = paste0(A, B, C)) %>%
  mutate(new_col = as.integer(factor(mtemp, levels = unique(.$mtemp))) - 1) %>%
  select(-mtemp)
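To see why levels = unique(...) keeps the ids in order of first appearance, here is a minimal base R sketch on a toy key vector. The names key, id_first_seen, and id_sorted are made up for illustration; the key values mirror the A, B, C combinations in df1.

```r
# Toy keys in the same row order as df1 (hypothetical stand-in for mtemp)
key <- c("000", "100", "000", "010", "001", "110", "011", "111")

# levels = unique(key) fixes the level order to first appearance
id_first_seen <- as.integer(factor(key, levels = unique(key))) - 1

# Default factor levels are sorted, so the ids follow sorted order instead
id_sorted <- as.integer(factor(key)) - 1

id_first_seen  # 0 1 0 2 3 4 5 6
id_sorted      # 0 4 0 2 1 5 3 6
```

The first result matches the newCol column in the data, which is why this variant keeps the original row order meaningful without rearranging anything.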
Alternatively, we can sort by a dummy variable and label the groups with a running count (note that this rearranges the rows):
df1 %>% mutate(mtemp = paste0(A,B,C)) %>%
group_by(mtemp) %>% arrange(mtemp) %>% ungroup() %>%
mutate(new_col = c(0,cumsum(lead(mtemp)[-n()] != lag(mtemp)[-1]))) %>% select(-mtemp)
# # A tibble: 8 x 5
#       A     B     C newCol new_col
#   <dbl> <dbl> <dbl>  <int>   <dbl>
# 1     0     0     0      0       0
# 2     0     0     0      0       0
# 3     0     0     1      3       1
# 4     0     1     0      2       2
# 5     0     1     1      5       3
# 6     1     0     0      1       4
# 7     1     1     0      4       5
# 8     1     1     1      6       6
or in reference to this thread:
df1 %>%
mutate(group_id = group_indices(., paste0(A,B,C)))
Explanation of the dplyr solutions:
The dummy-variable (cumsum) solution pastes the three variables of interest together into mtemp; in the next step, each distinct value of mtemp gets a unique id (compare newCol to new_col). Whenever mtemp changes between two consecutive rows, the comparison (lead(mtemp)...) returns TRUE (numeric value 1), and cumsum adds it to the running id, so each combination of A, B, and C ends up with a different id. This solution relies on the data being arranged by the dummy variable and therefore does not preserve the original row order.
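The change-detection step can be reproduced in base R on a small vector. This is only a sketch: the sorted toy vector below stands in for the arranged mtemp column, and changed is a hypothetical helper name.

```r
# Toy keys, already sorted, standing in for the arranged mtemp column
mtemp <- c("000", "000", "001", "010", "011", "100", "110", "111")

# TRUE wherever a value differs from the one just before it
changed <- mtemp[-1] != mtemp[-length(mtemp)]

# The first row gets id 0; every change bumps the running id by one
new_col <- c(0, cumsum(changed))
new_col  # 0 0 1 2 3 4 5 6
```

Dropping the first and last elements does the same job as lead(mtemp)[-n()] and lag(mtemp)[-1] in the pipeline: both compare each row with its predecessor.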
For the other solution, see ?group_indices.
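As a hedged aside: in dplyr 1.0.0 and later, passing grouping expressions directly to group_indices() is, as far as I know, deprecated in favour of grouping first. A rough equivalent using cur_group_id() might look like the sketch below. The data frame df_demo is a copy of the A, B, C columns of df1, and the ids follow the sorted order of the groups rather than first appearance.

```r
library(dplyr)

# Same A, B, C values as df1 (df_demo is a name made up for this sketch)
df_demo <- data.frame(
  A = c(0, 1, 0, 0, 0, 1, 0, 1),
  B = c(0, 0, 0, 1, 0, 1, 1, 1),
  C = c(0, 0, 0, 0, 1, 0, 1, 1)
)

res <- df_demo %>%
  group_by(A, B, C) %>%
  # cur_group_id() is 1-based; subtract 1 to get 0-based ids
  mutate(group_id = cur_group_id() - 1L) %>%
  ungroup()

res$group_id  # 0 4 0 2 1 5 3 6
```

The rows stay in their original order, but the numbering is by sorted group key, so it will not match newCol.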
Data:
df1 <- structure(list(A = c(0, 1, 0, 0, 0, 1, 0, 1), B = c(0, 0, 0,
1, 0, 1, 1, 1), C = c(0, 0, 0, 0, 1, 0, 1, 1), newCol = c(0L,
1L, 0L, 2L, 3L, 4L, 5L, 6L)), class = "data.frame", row.names = c(NA,
-8L))