I'm coming from a background in Python and C++, and R seems to use magic that I don't understand. I was hoping someone would be able to give me some insight into how it works.
I was tasked with applying an algorithm to each row in a tibble of about 3,400,000 data points. Coming from C++, my instinct was to iterate over the table, calculate the value for each row manually, and enter it into the tibble, like so:
library(dplyr)

add_elev <- function(all, elev) {
  row <- 1
  while (row <= nrow(all)) {
    # look up the single elevation record for the lake on the current row
    curr_id <- all[row, "lake_id"][[1]]
    adder <- filter(elev, lake_id == curr_id)
    # copy that elevation into every consecutive row with the same lake_id
    while (all[row, "lake_id"][[1]] == curr_id) {
      all[row, "elevation"] <- adder[1, "elevation"][[1]]
      row <- row + 1
      if (row > nrow(all)) {
        break
      }
    }
  }
  return(all)
}
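To show what I mean, here is a miniature version of my data (the numbers are made up, and "depth" is just a stand-in for my other measurement columns; rows with the same lake_id are consecutive, which the loop relies on):

library(tibble)

all <- tibble(
  lake_id   = c(1, 1, 2, 2, 2, 3),
  depth     = c(4.2, 3.1, 9.8, 7.5, 6.0, 2.2),
  elevation = NA_real_
)
elev <- tibble(
  lake_id   = c(1, 2, 3),
  elevation = c(250, 1200, 610)
)

all <- add_elev(all, elev)  # fills in 250, 250, 1200, 1200, 1200, 610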
The function works, but it was estimated to take about 9 hours to complete. After looking around in some reference books, I found that I could accomplish the same thing with a single line: all <- left_join(all, elevation, by = "lake_id"). This finished in less than a second, and as far as I can tell, all 3,400,000 data points came out correct. The only way I could think of to do this was through iteration, so I have no idea how that one small line of code finished so quickly. Can someone explain to me the magic of these tibbles?
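For a reproducible comparison, here is the join version on the same toy data as above (again made-up values; note there is no placeholder elevation column this time, since the join creates it):

library(dplyr)
library(tibble)

all <- tibble(
  lake_id = c(1, 1, 2, 2, 2, 3),
  depth   = c(4.2, 3.1, 9.8, 7.5, 6.0, 2.2)
)
elev <- tibble(
  lake_id   = c(1, 2, 3),
  elevation = c(250, 1200, 610)
)

# one call, no explicit loop: every row of `all` picks up the
# elevation of its matching lake_id from `elev`
all <- left_join(all, elev, by = "lake_id")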