I'm coming from a background in Python and C++, and R seems to use magic that I don't understand. I was hoping someone would be able to give me some insight into how it works.
I was tasked with applying an algorithm to each row in a tibble of about 3,400,000 data points. Coming from C++, my instinct was to iterate over the table, calculate the value for each row manually, and enter it into the tibble, like so:
library(dplyr)

add_elev <- function(all, elev) {
  row <- 1
  while (row <= nrow(all)) {
    # look up the single elevation record for the lake on the current row
    curr_id <- all[row, "lake_id"][[1]]
    adder <- filter(elev, lake_id == curr_id)
    # copy that elevation into every consecutive row with the same lake_id
    while (all[row, "lake_id"][[1]] == curr_id) {
      all[row, "elevation"] <- adder[1, "elevation"][[1]]
      row <- row + 1
      if (row > nrow(all)) {
        break
      }
    }
  }
  return(all)
}
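To show what I mean, here is a miniature version of my data (the numbers are made up, and "depth" is just a stand-in for my other measurement columns; rows with the same lake_id are consecutive, which the loop relies on):

library(tibble)

all <- tibble(
  lake_id   = c(1, 1, 2, 2, 2, 3),
  depth     = c(4.2, 3.1, 9.8, 7.5, 6.0, 2.2),
  elevation = NA_real_
)
elev <- tibble(
  lake_id   = c(1, 2, 3),
  elevation = c(250, 1200, 610)
)

all <- add_elev(all, elev)  # fills in 250, 250, 1200, 1200, 1200, 610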
The function works, but it was estimated to take about 9 hours to complete. After looking around in some reference books, I found that I could accomplish the same thing with a single line: all <- left_join(all, elevation, by = "lake_id"). This finished in less than a second, and as far as I can tell, all 3,400,000 data points came out correct. The only way I could think of to do this was through iteration, so I have no idea how that one small line of code finished so quickly. Can someone explain to me the magic of these tibbles?
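For a reproducible comparison, here is the join version on the same toy data as above (again made-up values; note there is no placeholder elevation column this time, since the join creates it):

library(dplyr)
library(tibble)

all <- tibble(
  lake_id = c(1, 1, 2, 2, 2, 3),
  depth   = c(4.2, 3.1, 9.8, 7.5, 6.0, 2.2)
)
elev <- tibble(
  lake_id   = c(1, 2, 3),
  elevation = c(250, 1200, 610)
)

# one call, no explicit loop: every row of `all` picks up the
# elevation of its matching lake_id from `elev`
all <- left_join(all, elev, by = "lake_id")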