This is the code that I am trying to run and it's taking a while.

The districts data frame has 39299 rows and 16 columns, and lm_data is a data frame with 59804 rows and 16 columns. I want to create a new variable in lm_data called tentativeStartDate that takes the value of districts$firstDay[j] when two conditions are met: row i of lm_data and row j of districts have the same DISTORGID and the same gradeCode. Is there a more efficient way to do this?

# Compare every row of lm_data against every row of districts
for (i in seq_len(nrow(lm_data))) {
  for (j in seq_len(nrow(districts))) {
    if (lm_data$DISTORGID[i] == districts$DISTORGID[j] && lm_data$gradeCode[i] == districts$gradeCode[j]) {
      lm_data$tentativeStartDate[i] <- districts$firstDay[j]
    }
  }
}
user438383
  • Please provide a [mcve]. See [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269/4996248) for what this would mean in R. Also, as a new user, it wouldn't hurt for you to take the [tour](https://stackoverflow.com/tour) and read [ask]. – John Coleman Jun 25 '21 at 15:12

1 Answer

Not sure if this will work since I can't test it, but if it does work it should be much faster.

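# Build one composite key per row so each lm_data row can be looked up in districts
# (assumes "|" never occurs in DISTORGID or gradeCode)
key_lm <- paste(lm_data$DISTORGID, lm_data$gradeCode, sep = "|")
key_districts <- paste(districts$DISTORGID, districts$gradeCode, sep = "|")

# match() returns the first matching districts row for each key, or NA if none
idx <- match(key_lm, key_districts)
lm_data$tentativeStartDate <- districts$firstDay[idx]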
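
Since the real data isn't available, here is a minimal sketch on made-up toy data of another way to do the same lookup: a left join with base R's merge() on the two key columns. The toy values, the lookup table, and the row_id helper column are assumptions for illustration only; the column names DISTORGID, gradeCode, firstDay and tentativeStartDate come from the question.

```r
# Toy stand-ins for the real data frames (all values are made up)
districts <- data.frame(
  DISTORGID = c(101, 101, 202),
  gradeCode = c("K", "1", "K"),
  firstDay  = as.Date(c("2021-08-16", "2021-08-18", "2021-08-23"))
)
lm_data <- data.frame(
  DISTORGID = c(202, 101, 101, 303),
  gradeCode = c("K", "1", "K", "K")
)

# Keep only the key columns plus the value to bring over, under its new name
lookup <- districts[, c("DISTORGID", "gradeCode", "firstDay")]
names(lookup)[3] <- "tentativeStartDate"

# Hypothetical helper column: merge() may reorder rows, so remember the order
lm_data$row_id <- seq_len(nrow(lm_data))

# Left join: all.x = TRUE keeps lm_data rows that have no matching district
joined <- merge(lm_data, lookup, by = c("DISTORGID", "gradeCode"), all.x = TRUE)

# Restore the original row order and drop the helper column
joined <- joined[order(joined$row_id), ]
joined$row_id <- NULL
```

Rows of lm_data without a matching district end up with NA in tentativeStartDate. One caveat: if districts contains several rows with the same DISTORGID and gradeCode, merge() returns one output row per match, whereas the nested loop keeps only the last match, so it may be worth de-duplicating the lookup table first.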
Snipeskies