R: Speed up for loop containing which

Question

I have two data frames - one contains a bunch of postcodes and surveys allocated to them (df1), the other contains all possible postcodes and the zones within which these are contained (df2). I essentially need to remove the postcodes, and allocate each survey to a zone (using the postcodes to cross reference). Currently I estimate my program will take 5hrs. how can I speed this up?

for (i in 1:nrows(df1)) {
    index <- which(df2$postcodes == toString(df1$postcodes[i])
    if (length(index)) {
        df1$zone <- toString(df2[index])
    } else {
        df1$zone <- 'UNMATCHED'
    }
}

Currently, I've found that running this for 100 postcodes takes about 6 seconds.

I have tried a bunch of things for the past couple of hours with little to no progress, so any help would be much appreciated!

Are you familiar with any package such as dplyr or data.table or prefer base R? Also would be great if you posted examples of `df1` and `df2` in a reproducible format (can be copied into a R session) — s_baldur, May 23 '19 at 11:19
sounds like a pretty default use of a (left) join... read: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right — Wimpel, May 23 '19 at 11:53

score 1 · Answer 1 · edited May 23 '19 at 16:06

1

This sounds like an inner join, which can be achieved with dplyr. And you can drop the postcodes column after. Here is a nice cheat sheet: http://stat545.com/bit001_dplyr-cheatsheet.html

inner_join(x, y): Return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combination of the matches are returned. This is a mutating join.

edited May 23 '19 at 16:06

marc_s

732,580
175
1,330
1,459

answered May 23 '19 at 12:01

Erik

46
3

The link is a very smooth intro to joins – Kots May 23 '19 at 12:35

R: Speed up for loop containing which

1 Answers1