Create multiple sequences dependent on data frame column

Question

My data has a column in which the start of each desired sequence is marked with 1 and every other row is NA. I need to fill in the NA rows so that each sequence counts up from its starting 1. Below is the starting data (first two columns) and the desired third column:

I can do this with the loop below, but what is a more idiomatic R way to do it?

for(i in 1:length(df2$col2)) {
  df2$col3[i] <- ifelse(df2$col2[i] == 1, 1, df2$col3[i - 1] + 1)
  if(is.na(df2$col2[i])) df2$col3[i] <- df2$col3[i - 1] + 1
}

Here is a 20-row data set of the first two columns:

structure(list(col1 = c(478.69, 320.45, 503.7, 609.3, 478.19, 
478.69, 320.45, 503.7, 609.3, 478.19, 419.633683050051, 552.939975773916, 
785.119385505095, 18.2542654918507, 98.6469651805237, 132.587260054424, 
697.119552921504, 512.560374778695, 916.425200179219, 14.3385051051155
), col2 = c(1, NA, 1, NA, NA, 1, NA, 1, NA, NA, NA, NA, 1, NA, 
NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-20L))

Isn't this the same question from you? [Add column to data frame with sequence depending on other column](https://stackoverflow.com/q/64858688/10488504) — GKi, Nov 16 '20 at 14:12
I had multiple messages that the post did not work, so I thought it disappeared and I repeated it. Seems this was during a SO maintenance outage. So, yeah, sorry. But your answer takes a different approach so it would be nice maybe to keep both posts? — markhogue, Nov 16 '20 at 14:21

score 1 · Answer 1 · answered Nov 16 '20 at 13:52

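Try grouping by the running count of non-NA values in col2, then counting up from the first value within each group: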

library(data.table)
df2 <- data.table(df2)
df2[, col3 := col2[1] + 1 * (1:.N - 1), by = .(cumsum(!is.na(col2)))]
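
To see why grouping by cumsum(!is.na(col2)) works, here is a small base R sketch on a made-up vector. The vector x is hypothetical and not part of df2, and sequence(tabulate(...)) is only a base R stand-in for the grouped data.table step. It assumes the first element is not NA.

```r
# Hypothetical toy vector with the same pattern as col2:
# 1 marks the start of a sequence, NA marks a row to be filled in.
x <- c(1, NA, 1, NA, NA)

# !is.na(x) is TRUE exactly where a sequence starts, so its running sum
# increases by one at each start and gives every run its own group id.
grp <- cumsum(!is.na(x))
print(grp)

# Count up within each run: tabulate() gives the run lengths and
# sequence() expands each length n into 1, 2, ..., n.
# (Assumes x[1] is not NA, so there is no group 0.)
res <- sequence(tabulate(grp))
print(res)
```

The data.table line does the same thing per group: col2[1] is the 1 that starts the group, and 1:.N - 1 adds 0, 1, 2, ... to it.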

score 1 · Answer 2 · answered Nov 16 '20 at 13:59

You can use ave with seq_along, grouping by cumsum(!is.na(df2$col2)) so that every non-NA value in col2 starts a new group.

df2$col3 <- ave(integer(nrow(df2)), cumsum(!is.na(df2$col2)), FUN=seq_along)
df2
#        col1 col2 col3
#1  478.69000    1    1
#2  320.45000   NA    2
#3  503.70000    1    1
#4  609.30000   NA    2
#5  478.19000   NA    3
#6  478.69000    1    1
#7  320.45000   NA    2
#8  503.70000    1    1
#9  609.30000   NA    2
#10 478.19000   NA    3
#11 419.63368   NA    4
#12 552.93998   NA    5
#13 785.11939    1    1
#14  18.25427   NA    2
#15  98.64697   NA    3
#16 132.58726   NA    4
#17 697.11955   NA    5
#18 512.56037   NA    6
#19 916.42520   NA    7
#20  14.33851   NA    8
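
As a quick sanity check, the sketch below compares the ave approach with a loop on the col2 values from the question. The loop is restated on a plain vector rather than copied from the question, the names loop_res and vec_res are made up for this illustration, and it assumes the first value of col2 is not NA.

```r
# col2 values from the 20-row data set in the question
col2 <- c(1, NA, 1, NA, NA, 1, NA, 1, NA, NA, NA, NA, 1, NA,
          NA, NA, NA, NA, NA, NA)

# Loop version on a plain vector: restart at 1 on a non-NA value,
# otherwise add one to the previous result.
# (Assumes col2[1] is not NA; otherwise there is no previous value.)
loop_res <- integer(length(col2))
for (i in seq_along(col2)) {
  loop_res[i] <- if (is.na(col2[i])) loop_res[i - 1] + 1L else 1L
}

# Vectorized version: number the rows within each group, where a new
# group starts at every non-NA value of col2.
vec_res <- ave(integer(length(col2)), cumsum(!is.na(col2)), FUN = seq_along)

# Both approaches should give the same integer vector
stopifnot(identical(loop_res, vec_res))
```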
