Hey wonderful R community, I have a relatively big data frame (df) with over 10 million rows. It includes 7 different crops grown on 5 different soil types. I am trying to add the costs for fertilizer (psm) as an extra column, depending on the soil and the crop. The psm costs are stored in a separate table (costs) together with all the other costs.
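Since df and costs aren't shown, here is a small, hypothetical stand-in. It uses the column names from the loop below (crop and lbg_soil in df; crop, lbg, factor and cost in costs). The crop codes are borrowed from the psm vector names further down, and all cost values are random placeholders, not real data.

```r
# Hypothetical toy versions of df and costs (placeholder values only)
set.seed(1)
crops <- c("ww", "wr", "wg", "wraps", "pot", "zr", "sm")
n <- 1000

# df: one row per field, with a crop and a soil type (some soils missing)
df <- data.frame(
  crop     = sample(crops, n, replace = TRUE),
  lbg_soil = sample(c(1:5, NA), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# costs: one row per crop x soil type x cost factor
costs <- expand.grid(
  crop   = crops,
  lbg    = 1:5,
  factor = c("PSM", "other"),
  stringsAsFactors = FALSE
)
costs$cost <- round(runif(nrow(costs), 50, 200))

str(df)
str(costs)
```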
I found one solution that works:
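df$psm <- NA
# compare every row of df with every row of costs
for (i in 1:nrow(df)) {
  for (j in 1:nrow(costs)) {
    # same crop, soil type known and equal, and the cost factor is PSM
    if (df$crop[i] == costs$crop[j] && !is.na(df$lbg_soil[i]) && df$lbg_soil[i] == costs$lbg[j] && costs$factor[j] == "PSM") {
      df$psm[i] <- costs$cost[j]
    }
  }
}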
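A vectorized sketch of the same lookup, assuming the column names used in the loop: keep only the PSM rows of costs, build a crop_soil key in both tables and look all rows up at once with match() instead of row by row. One difference from the loop: if costs held several PSM rows for the same crop and soil, match() takes the first, whereas the loop keeps the last. The function name add_psm is hypothetical.

```r
# Vectorized replacement for the nested loop (sketch; column names as in the loop)
add_psm <- function(df, costs) {
  # keep only the PSM cost rows (which() drops NA comparisons)
  psm_costs <- costs[which(costs$factor == "PSM"), ]
  # one key per row, e.g. "ww_1"
  key_df    <- paste(df$crop, df$lbg_soil, sep = "_")
  key_costs <- paste(psm_costs$crop, psm_costs$lbg, sep = "_")
  # position of each df row in psm_costs (NA if there is no match)
  idx <- match(key_df, key_costs)
  # rows with a missing soil type stay NA, as in the loop
  idx[is.na(df$lbg_soil)] <- NA
  df$psm <- psm_costs$cost[idx]
  df
}

# Small usage example with made-up values
df <- data.frame(crop     = c("ww", "pot", "ww", "zr"),
                 lbg_soil = c(1, 3, NA, 2))
costs <- data.frame(crop   = c("ww", "ww", "pot", "zr", "zr"),
                    lbg    = c(1, 1, 3, 2, 2),
                    factor = c("PSM", "seed", "PSM", "PSM", "seed"),
                    cost   = c(120, 55, 71, 108, 90))
add_psm(df, costs)$psm
```

Base R merge() would also work as a join, but by default it sorts the result by the key columns; match() keeps df in its original row order.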
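However, that obviously takes forever: the nested loop checks every row of df against every row of costs, i.e. more than 10 million × nrow(costs) comparisons. So I am looking for a faster solution. For similar problems I found a way to store the cost factors in lists, subset the df by crop and do the assignment with within(). But this time I just don't know how to do it.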
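A sketch of the approach described above (cost factors in a list, split by crop, within()), assuming one named vector per crop whose names are the soil codes, and that the values in df$crop match the list names. psm_list and add_psm_by_crop are hypothetical names; only two crops are filled in here for brevity, and crops without an entry get NA.

```r
# Hypothetical per-crop lookup vectors, named by soil code
psm_list <- list(
  ww  = c("1" = 120, "2" = 103, "3" = 76, "4" = 60, "5" = 60),
  pot = c("1" = 160, "2" = 107, "3" = 71, "4" = 60, "5" = 60)
)

add_psm_by_crop <- function(df, psm_list) {
  parts <- split(df, df$crop)
  parts <- lapply(names(parts), function(cr) {
    lookup <- psm_list[[cr]]
    # crops without a vector: an empty named vector, so every lookup gives NA
    if (is.null(lookup)) lookup <- setNames(numeric(0), character(0))
    within(parts[[cr]], {
      # index the named vector by soil code; NA or unknown soils give NA
      psm <- unname(lookup[as.character(lbg_soil)])
    })
  })
  # put the pieces back in the original row order
  unsplit(parts, df$crop)
}

df <- data.frame(crop     = c("ww", "pot", "ww", "zr"),
                 lbg_soil = c(1, 3, NA, 2))
add_psm_by_crop(df, psm_list)$psm
```

unsplit() restores the original row order, so the result lines up with df. The split itself isn't needed for the lookup, though; it only mirrors the per-crop workflow.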
test <- df[sample(nrow(df), 500), ]
test$psm <- NA
cropList <- by(test, test$crop, subset)
psmww    <- c("1" = 120, "2" = 103, "3" = 76, "4" = 60, "5" = 60)
psmwr    <- c("1" = 130, "2" = 104, "3" = 78, "4" = 61, "5" = 60)
psmwg    <- c("1" = 140, "2" = 105, "3" = 79, "4" = 60, "5" = 60)
psmwraps <- c("1" = 150, "2" = 105, "3" = 76, "4" = 60, "5" = 60)
psmpot   <- c("1" = 160, "2" = 107, "3" = 71, "4" = 60, "5" = 60)
psmzr    <- c("1" = 170, "2" = 108, "3" = 72, "4" = 60, "5" = 60)
psmsm    <- c("1" = 120, "2" = 109, "3" = 74, "4" = 60, "5" = 60)
# "??" marks the part I can't figure out: picking the right vector and soil for each crop
for (i in 1:nrow(cropList)) {
  lapply(cropList[i], function(x) {
    for (i in 1:5) {
      x <- within(x, psm[lbg == psm[??]] <- psm??[[cropList[i]]])
    }
  })
}
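Without any splitting, the seven vectors above can be stacked into one crop × soil matrix and indexed with a two-column character matrix, which fetches one cell per row of test in a single step. This assumes that test$crop uses the codes ww, wr, wg, wraps, pot, zr and sm (taken from the vector names) and that the soil column is lbg_soil as in the loop; lookup_psm is a hypothetical helper.

```r
# Crop x soil matrix built from the psm vectors (row names assumed to match test$crop)
psm_matrix <- rbind(
  ww    = c("1" = 120, "2" = 103, "3" = 76, "4" = 60, "5" = 60),
  wr    = c("1" = 130, "2" = 104, "3" = 78, "4" = 61, "5" = 60),
  wg    = c("1" = 140, "2" = 105, "3" = 79, "4" = 60, "5" = 60),
  wraps = c("1" = 150, "2" = 105, "3" = 76, "4" = 60, "5" = 60),
  pot   = c("1" = 160, "2" = 107, "3" = 71, "4" = 60, "5" = 60),
  zr    = c("1" = 170, "2" = 108, "3" = 72, "4" = 60, "5" = 60),
  sm    = c("1" = 120, "2" = 109, "3" = 74, "4" = 60, "5" = 60)
)

lookup_psm <- function(crop, soil, m = psm_matrix) {
  crop <- as.character(crop)
  soil <- as.character(soil)
  # unknown names would raise "subscript out of bounds", so map them to NA
  crop[!crop %in% rownames(m)] <- NA
  soil[!soil %in% colnames(m)] <- NA
  # each row of the index matrix selects one cell; NA indices return NA
  unname(m[cbind(crop, soil)])
}

test <- data.frame(crop     = c("ww", "pot", "sm", "zr", "ww"),
                   lbg_soil = c(1, 3, 2, NA, 7))
test$psm <- lookup_psm(test$crop, test$lbg_soil)
test$psm
```

Character-matrix indexing matches each row's crop and soil against the row and column names, so the whole column is filled in one vectorized call.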
I am thankful for any suggestion!