
My data frame looks like this:

(image of the data frame, not included)

The "rank" variable has to increase by one whenever the difference between "start" in row i and "end" in row i-1 is greater than 14, and also whenever a different "ID" is encountered.

I tried the code below and it gives the correct result.

The problem is that it is far too slow, because I have over 700,000 rows.

Is there any way to make it run much faster?


df$rank <- 1

for(i in 2:nrow(df)){
  # keep the previous rank while the ID is unchanged and the previous diff is <= 14
  df[i,"rank"] <- ifelse((df[i,"ID"]==df[i-1,"ID"]) &
                           (df[i-1,"diff"]<=14),
                         df[i-1,"rank"],
                         df[i-1,"rank"] + 1)
}
yoon
  • Please do not include images of your data, but provide a [minimal reproducible sample](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Wimpel Feb 18 '20 at 07:58
  • Did you try dplyr? df <- df %>% mutate( rank = ifelse( (ID == lag(ID) ) & (lag(diff) <= 14), rank, rank+1)) – Annet Feb 18 '20 at 08:05

3 Answers


You can try:

library(dplyr)
# the first row always starts a new rank, so no default is needed for lag()
df %>% mutate(rank = cumsum(diff > 14 | row_number() == 1 | ID != lag(ID)))

The same logic in base R:

df$rank <- with(df, cumsum(diff > 14 | c(TRUE, tail(ID, -1) != head(ID, -1))))
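
If diff has not been computed yet, the same idea can be sketched directly from start and end. This is only a minimal illustration: it assumes numeric (or Date) columns named start and end as described in the question, rows already sorted by ID and then start, and the toy values and the helper names gap and new_id are made up for the example.

```r
# Toy data (hypothetical values); rows sorted by ID, then start.
df <- data.frame(
  ID    = c("a", "a", "a", "b", "b"),
  start = c(1, 10, 40, 5, 12),
  end   = c(8, 20, 45, 9, 30),
  stringsAsFactors = FALSE
)

n <- nrow(df)

# Gap between this row's start and the previous row's end (NA for row 1).
gap <- c(NA, df$start[-1] - df$end[-n])

# TRUE where the ID differs from the previous row; row 1 always starts a group.
new_id <- c(TRUE, df$ID[-1] != df$ID[-n])

# A new rank starts at a new ID or after a gap of more than 14.
# The NA gap in row 1 is harmless because TRUE | NA is TRUE.
df$rank <- cumsum(new_id | gap > 14)
df
```

Here the third row starts rank 2 because its gap to the previous end is 20, and the fourth row starts rank 3 because the ID changes.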
Ronak Shah
  • I solved the problem with a small adaptation of your solution. Thanks: ```df <- df %>% mutate(rank = cumsum((start - lag(end, default = TRUE)) > 14 | ID != lag(ID, default = TRUE)))``` – yoon Feb 18 '20 at 23:09

You can use cumsum to get a rank that increases whenever the condition (df[i,"ID"]==df[i-1,"ID"]) & (df[i-1,"diff"]<=14) is not met.

df$rank <- cumsum(c(1,(df$ID != c(df$ID[-1], NA) | df$diff>14)[-nrow(df)]))
df
#   ID diff rank
#1   a    4    1
#2   a    6    1
#3   a    8    1
#4   a  870    1
#5   a   34    2
#6   a   NA    3
#7   b    4    4
#8   b    6    4
#9   b    8    4
#10  b  870    4
#11  b   34    5
#12  b   NA    6

Using your code:

df$rank <- 1
for(i in 2:nrow(df)){
  df[i,"rank"] <- ifelse((df[i,"ID"]==df[i-1,"ID"]) & (df[i-1,"diff"]<=14), 
    df[i,"rank"] <- df[i-1,"rank"], df[i,"rank"] <- df[i-1,"rank"] + 1)
}
df
#   ID diff rank
#1   a    4    1
#2   a    6    1
#3   a    8    1
#4   a  870    1
#5   a   34    2
#6   a   NA    3
#7   b    4    4
#8   b    6    4
#9   b    8    4
#10  b  870    4
#11  b   34    5
#12  b   NA    6

Data:

df  <- data.frame(ID=rep(c("a","b"), each=6), diff=c(4,6,8,870,34,NA)
                  , stringsAsFactors = FALSE)
df
#   ID diff
#1   a    4
#2   a    6
#3   a    8
#4   a  870
#5   a   34
#6   a   NA
#7   b    4
#8   b    6
#9   b    8
#10  b  870
#11  b   34
#12  b   NA
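
To gain some confidence that the vectorised form matches the loop on more than twelve rows, a check along these lines could be run. It is only a sketch: the synthetic data, the size n, and the names big, loop_rank and vec_rank are made up here, and it assumes diff contains no NA values.

```r
set.seed(1)
n <- 2000  # kept small so the reference loop finishes quickly

# Synthetic data: IDs sorted so that equal IDs are adjacent, as in the question.
big <- data.frame(
  ID   = sort(sample(letters[1:5], n, replace = TRUE)),
  diff = sample(0:30, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Reference: row-by-row loop applying the same rule as the question.
loop_rank <- numeric(n)
loop_rank[1] <- 1
for (i in 2:n) {
  same <- big$ID[i] == big$ID[i - 1] && big$diff[i - 1] <= 14
  loop_rank[i] <- loop_rank[i - 1] + if (same) 0 else 1
}

# Vectorised: a change at row i depends on ID[i] vs ID[i-1] and on diff[i-1].
changed  <- big$ID[-1] != big$ID[-n] | big$diff[-n] > 14
vec_rank <- cumsum(c(1, changed))

# Compare the two results element by element.
all(loop_rank == vec_rank)
```

Wrapping each variant in system.time() on the real data would show the difference in run time.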
GKi

Here is a base R solution using ave + ifelse:

df <- within(df,rank <- ave(diff>14, diff>14,ID,FUN = function(x) ifelse(x,seq(x),+!x)))
ThomasIsCoding