
The title alone is not enough to explain my question. This is my data in short:

|ID | group | 
|---|-------|
| 1 | Banana| 
| 2 | Apple | 
| 3 | Apple | 
| 4 | Apple | 
| 5 | Banana| 
| 6 | Banana| 
| 7 | Apple | 
| 8 | Apple | 

Now I want to create a variable that numbers the rows within each group, but the numbering should start from 1 again every time the group changes from one row to the next. With my attempt, it currently looks like this:

|ID | group | row_number |
|---|-------|------------|
| 1 | Banana| 1          |
| 2 | Apple | 1          |
| 3 | Apple | 2          |
| 4 | Apple | 3          | 
| 5 | Banana| 2          |
| 6 | Banana| 3          | 
| 7 | Apple | 4          |
| 8 | Apple | 5          |

But it should look like this:

|ID | group | row_number |
|---|-------|------------|
| 1 | Banana| 1          |
| 2 | Apple | 1          |
| 3 | Apple | 2          |
| 4 | Apple | 3          | 
| 5 | Banana| 1          |
| 6 | Banana| 2          | 
| 7 | Apple | 1          |
| 8 | Apple | 2          |

I should mention that I have lots of observations and many more groups than just Apple and Banana, so code in which I have to name the groups explicitly (like "Apple" and "Banana") is unfortunately not helpful. I tried to solve the problem like this:

df1 <- df1 %>%
  group_by(group) %>%
  mutate(numbering = row_number())

But the problem here is obvious: this numbers all rows of a group together instead of restarting for each consecutive run. I also tried to work around the problem, but it is very difficult. If someone has a solution I would be very thankful!

2 Answers

And another way:

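library(dplyr)

df %>% 
  # rleid() gives each run of consecutive identical values its own id
  mutate(Temp = data.table::rleid(group)) %>% 
  group_by(Temp) %>% 
  mutate(row_number = row_number()) %>%
  # ungroup first, otherwise select() keeps the grouping column Temp
  ungroup() %>%
  select(-Temp)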
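
If dplyr 1.1.0 or later is available, the same idea can probably be written without data.table, since `consecutive_id()` plays the role of `rleid()`. A minimal sketch, assuming the sample data from the question (the column name `run` and the object name `result` are only illustrative):

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# Sample data from the question
df <- data.frame(
  ID = 1:8,
  group = c("Banana", "Apple", "Apple", "Apple",
            "Banana", "Banana", "Apple", "Apple"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # new id every time the value of group changes from one row to the next
  group_by(run = consecutive_id(group)) %>%
  mutate(row_number = row_number()) %>%
  ungroup() %>%
  select(-run)

result
```

This avoids naming any group explicitly, so it should carry over to data with many groups.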
Limey

Here are 3 ways to do this -

Base R -

df <- transform(df, row_number = ave(ID, with(rle(group), 
                 rep(seq_along(values), lengths)), FUN = seq_along))
df

#  ID  group row_number
#1  1 Banana          1
#2  2  Apple          1
#3  3  Apple          2
#4  4  Apple          3
#5  5 Banana          1
#6  6 Banana          2
#7  7  Apple          1
#8  8  Apple          2
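
The base R one-liner packs several steps together. A sketch that unpacks them, assuming the same `df` as in the data section (the names `runs` and `run_id` are only illustrative):

```r
# Sample data, same as in the data section
df <- data.frame(
  ID = 1:8,
  group = c("Banana", "Apple", "Apple", "Apple",
            "Banana", "Banana", "Apple", "Apple"),
  stringsAsFactors = FALSE
)

# Step 1: run-length encode the group column
runs <- rle(df$group)
runs$values   # "Banana" "Apple" "Banana" "Apple"
runs$lengths  # 1 3 2 2

# Step 2: one id per run, repeated once for every row in that run
run_id <- rep(seq_along(runs$values), runs$lengths)
run_id        # 1 2 2 2 3 3 4 4

# Step 3: number the rows within each run
df$row_number <- ave(df$ID, run_id, FUN = seq_along)
df$row_number # 1 1 2 3 1 2 1 2

# The same numbering can also be produced directly from the run lengths
sequence(runs$lengths)
```

Because the run id changes whenever `group` changes, the second block of "Banana" rows gets a different id from the first and its numbering restarts.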

dplyr -

library(dplyr)

df %>%
  group_by(grp = cumsum(group != lag(group, default = first(group)))) %>%
  mutate(row_number = row_number()) %>%
  ungroup() %>%
  select(-grp)

data.table -

library(data.table)

setDT(df)[, row_number := seq_len(.N), by = rleid(group)]

data

df <- structure(list(ID = 1:8, group = c("Banana", "Apple", "Apple", 
"Apple", "Banana", "Banana", "Apple", "Apple")), row.names = c(NA, 
-8L), class = "data.frame")
Ronak Shah