for loop over big dataframe with 2 conditions

Question

Hey wonderful R community, I have a relatively big data frame (`df`) with over 10 million rows. It covers 7 different crops that are grown on 5 different soil types. I am trying to add the fertilizer cost (psm) as an extra column, depending on the soil and crop. The psm values are stored in a separate table (`costs`), together with all the other costs.

I found one solution that works:

df$psm <- NA

for (i in 1:nrow(df)) {
  for (j in 1:nrow(costs)) {
    if (df$crop[i] == costs$crop[j] && !is.na(df$lbg_soil[i]) && df$lbg_soil[i] == costs$lbg[j] && costs$factor[j] == "PSM") {
      df$psm[i] <- costs$cost[j]  
    }
  }
}

However, that obviously takes forever, so I am looking for a faster solution. For similar problems I found a way to keep the cost factors in lists, subset the df by crop, and fill in the values with `within`. But this time I just don't know how to do it.

test <- df[sample(nrow(df), 500), ]

test$psm <- NA
cropList <- by(test, test$crop, subset)

psmww <- c("1" = 120, "2" = 103, "3" = 76, "4" = 60, "5" = 60)
psmwr <- c("1" = 130, "2" = 104, "3" = 78, "4" = 61, "5" = 60)
psmwg <- c("1" = 140, "2" = 105, "3" = 79, "4" = 60, "5" = 60)
psmwraps <- c("1" = 150, "2" = 105, "3" = 76, "4" = 60, "5" = 60)
psmpot <- c("1" = 160, "2" = 107, "3" = 71, "4" = 60, "5" = 60)
psmzr <- c("1" = 170, "2" = 108, "3" = 72, "4" = 60, "5" = 60)
psmsm <- c("1" = 120, "2" = 109, "3" = 74, "4" = 60, "5" = 60)


for (i in 1:nrow(cropList)) {
  lapply(cropList[i], function(x) {
    for (i in 1:5) {
      x <- within(x, psm[lbg==psm[??] ]<- psm??[[cropList[i]]]) 
    }} )
}

I am thankful for any suggestion!
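
For illustration, the named-vector idea from the attempt above could be finished without any loop by stacking the per-crop vectors into a matrix and indexing it with (crop, soil) pairs. This is only a minimal sketch under assumptions: the crop codes `"ww"` and `"wr"` are guessed from the vector names, the name `psm_lookup` is made up for the example, and the toy `test` data frame stands in for the real data.

```r
# Hypothetical lookup matrix: one row per crop, one column per soil type.
# The crop codes "ww" and "wr" are assumptions based on the vector names.
psm_lookup <- rbind(
  ww = c("1" = 120, "2" = 103, "3" = 76, "4" = 60, "5" = 60),
  wr = c("1" = 130, "2" = 104, "3" = 78, "4" = 61, "5" = 60)
)

# Toy stand-in for the real data frame, including a missing soil type.
test <- data.frame(
  crop = c("ww", "wr", "ww", "wr"),
  lbg_soil = c(1, 5, NA, 2)
)

# A two-column character matrix indexes a matrix by (row name, column name),
# so every data row picks exactly one cell of the lookup matrix.
idx <- cbind(test$crop, as.character(test$lbg_soil))
ok <- complete.cases(idx)  # rows with a missing crop or soil keep NA

test$psm <- NA_real_
test$psm[ok] <- psm_lookup[idx[ok, , drop = FALSE]]
```

Because the lookup is a single vectorized indexing step, it avoids both the nested loop and the per-crop subsetting. A crop or soil code that is absent from the matrix would raise a subscript error, so the real codes need to match the dimnames exactly.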

Seems like you might want to merge/join the data frames? `merge(df, costs, all.x = TRUE)`? See the [related R-FAQ here](https://stackoverflow.com/q/1299871/903061). Hard to tell without sample data... — Gregor Thomas, Feb 24 '22 at 15:51
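
A minimal sketch of the merge suggested in the comment, assuming the column names used in the loop (`crop`, `lbg_soil` in `df`; `crop`, `lbg`, `factor`, `cost` in `costs`). The toy values, the helper column `row_id`, and the names `psm_costs` and `out` are made up for illustration, and the sketch assumes each crop/soil pair has at most one PSM row in `costs`.

```r
# Toy stand-ins for the real tables (values are illustrative only).
df <- data.frame(
  crop = c("ww", "wr", "ww", "pot"),
  lbg_soil = c(1, 2, NA, 3)
)
costs <- data.frame(
  crop = c("ww", "wr", "ww", "wr"),
  lbg = c(1, 2, 1, 2),
  factor = c("PSM", "PSM", "seed", "seed"),
  cost = c(120, 104, 80, 85)
)

# Keep only the PSM rows and rename the cost column to the target name.
psm_costs <- costs[costs$factor == "PSM", c("crop", "lbg", "cost")]
names(psm_costs)[names(psm_costs) == "cost"] <- "psm"

# merge() sorts its result by the key columns, so remember the original order.
df$row_id <- seq_len(nrow(df))

# Left join: every row of df is kept; rows without a match get psm = NA.
out <- merge(
  df, psm_costs,
  by.x = c("crop", "lbg_soil"),
  by.y = c("crop", "lbg"),
  all.x = TRUE
)
out <- out[order(out$row_id), ]
```

Rows with a missing soil type, or with a crop that has no PSM entry, end up with `NA` in `psm`, which mirrors what the original loop leaves behind. If `costs` held duplicate PSM rows for the same crop and soil, the join would duplicate rows of `df`, so that is worth checking before running it on the full data.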
