Faster way in R to increase a row counter under specific conditions

Question

My data frame looks like this.

The "rank" variable has to increase whenever the difference between the [i]th row of "start" and the [i-1]th row of "end" is greater than 14, and also whenever the "ID" changes.

I tried the code below, and it works correctly.

The problem is that it is far too slow, because I have more than 700,000 rows.

Is there a way to make it run much faster?


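df$rank <- 1

for (i in 2:nrow(df)) {
  # Keep the previous rank while the ID is unchanged and the previous diff is at most 14;
  # otherwise start a new rank.
  df[i, "rank"] <- ifelse((df[i, "ID"] == df[i - 1, "ID"]) &
                            (df[i - 1, "diff"] <= 14),
                          df[i - 1, "rank"],
                          df[i - 1, "rank"] + 1)
}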

Please do not include images of your data, but provide a [minimal reproducible sample](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Wimpel, Feb 18 '20 at 07:58
Did you try dplyr? df <- df %>% mutate(rank = ifelse((ID == lag(ID)) & (lag(diff) <= 14), rank, rank + 1)) – Annet, Feb 18 '20 at 08:05

score 2 · Accepted Answer · answered Feb 18 '20 at 08:19

You can try:

library(dplyr)
df %>% mutate(rank  = cumsum(diff > 14 | ID != lag(ID, default = TRUE)))

Same logic using base R :

df$rank <- with(df, cumsum(diff > 14 | c(TRUE, tail(ID, -1) != head(ID, -1))))

Ronak Shah

I solved the problem with a small adaptation of your solution, thanks: ```df <- df %>% mutate(rank = cumsum((start - lag(end, default = TRUE)) > 14 | ID != lag(ID, default = TRUE)))``` – yoon Feb 18 '20 at 23:09
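For readers without dplyr, the adaptation in the comment above can be sketched in base R. This is only an illustration under assumptions: the column names follow the question, but the sample values are invented, the gap is taken to be in days, and treating a missing gap as "no break" is a choice made here rather than something stated in the thread.

```r
# Hypothetical sample data: column names follow the question, values are invented.
df <- data.frame(
  ID    = c("a", "a", "a", "b", "b"),
  start = as.Date(c("2020-01-01", "2020-01-12", "2020-03-01",
                    "2020-01-05", "2020-01-10")),
  end   = as.Date(c("2020-01-10", "2020-01-20", "2020-03-05",
                    "2020-01-08", "2020-01-15")),
  stringsAsFactors = FALSE
)

n <- nrow(df)

# Gap in days between this row's start and the previous row's end (NA for row 1).
gap <- c(NA, as.numeric(difftime(df$start[-1], df$end[-n], units = "days")))

# TRUE where the ID differs from the previous row; row 1 always opens a group.
new_id <- c(TRUE, df$ID[-1] != df$ID[-n])

# A new rank starts on an ID change or when the gap exceeds 14 days.
# Assumption: a missing gap is treated as "no break".
new_rank <- new_id | (!is.na(gap) & gap > 14)

df$rank <- cumsum(new_rank)
df$rank
# 1 1 2 3 3
```

Because every step works on whole vectors, no row-by-row assignment into the data frame is needed.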

GKi · Answer 2 · 2020-02-19T09:41:04.983

You can use cumsum to increase the rank whenever the condition (df[i,"ID"]==df[i-1,"ID"]) & (df[i-1,"diff"]<=14) is not met.

df$rank <- cumsum(c(1,(df$ID != c(df$ID[-1], NA) | df$diff>14)[-nrow(df)]))
df
#   ID diff rank
#1   a    4    1
#2   a    6    1
#3   a    8    1
#4   a  870    1
#5   a   34    2
#6   a   NA    3
#7   b    4    4
#8   b    6    4
#9   b    8    4
#10  b  870    4
#11  b   34    5
#12  b   NA    6

Using your code:

df$rank <- 1
for(i in 2:nrow(df)){
  df[i,"rank"] <- ifelse((df[i,"ID"]==df[i-1,"ID"]) & (df[i-1,"diff"]<=14), 
    df[i,"rank"] <- df[i-1,"rank"], df[i,"rank"] <- df[i-1,"rank"] + 1)
}
df
#   ID diff rank
#1   a    4    1
#2   a    6    1
#3   a    8    1
#4   a  870    1
#5   a   34    2
#6   a   NA    3
#7   b    4    4
#8   b    6    4
#9   b    8    4
#10  b  870    4
#11  b   34    5
#12  b   NA    6

Data:

df  <- data.frame(ID=rep(c("a","b"), each=6), diff=c(4,6,8,870,34,NA)
                  , stringsAsFactors = FALSE)
df
#   ID diff
#1   a    4
#2   a    6
#3   a    8
#4   a  870
#5   a   34
#6   a   NA
#7   b    4
#8   b    6
#9   b    8
#10  b  870
#11  b   34
#12  b   NA
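To gain some confidence that the cumsum version reproduces the loop beyond the 12-row example, one could compare both on larger random input. The sketch below is illustrative only: the helper names rank_loop and rank_cumsum and the generated data are hypothetical, and it assumes that diff contains no NA and that rows are already sorted by ID.

```r
# Hypothetical helpers; the logic mirrors the loop and the cumsum one-liner above.
rank_loop <- function(df) {
  rank <- rep(1, nrow(df))
  for (i in seq_len(nrow(df))[-1]) {
    same <- (df$ID[i] == df$ID[i - 1]) & (df$diff[i - 1] <= 14)
    rank[i] <- if (same) rank[i - 1] else rank[i - 1] + 1
  }
  rank
}

rank_cumsum <- function(df) {
  n <- nrow(df)
  # Row j decides whether row j + 1 starts a new rank, so drop the last element.
  cumsum(c(1, (df$ID != c(df$ID[-1], NA) | df$diff > 14)[-n]))
}

# Invented test data: sorted IDs, diff values without NA.
set.seed(1)
n <- 1000
test_df <- data.frame(
  ID   = sort(sample(letters[1:5], n, replace = TRUE)),
  diff = sample(1:30, n, replace = TRUE),
  stringsAsFactors = FALSE
)

isTRUE(all.equal(rank_loop(test_df), rank_cumsum(test_df)))
# TRUE
```

Wrapping each call in system.time() is one way to compare run times on your own data.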

score 0 · Answer 3 · answered Feb 18 '20 at 08:36

Here is a base R solution using ave + ifelse

df <- within(df,rank <- ave(diff>14, diff>14,ID,FUN = function(x) ifelse(x,seq(x),+!x)))

ThomasIsCoding
