0

I have two data frames - one contains a bunch of postcodes and surveys allocated to them (df1), the other contains all possible postcodes and the zones within which these are contained (df2). I essentially need to remove the postcodes, and allocate each survey to a zone (using the postcodes to cross reference). Currently I estimate my program will take 5hrs. how can I speed this up?

for (i in 1:nrows(df1)) {
    index <- which(df2$postcodes == toString(df1$postcodes[i])
    if (length(index)) {
        df1$zone <- toString(df2[index])
    } else {
        df1$zone <- 'UNMATCHED'
    }
} 

Currently, I've found that running this for 100 postcodes takes about 6 seconds.

I have tried a bunch of things for the past couple of hours with little to no progress, so any help would be much appreciated!

Ian Kemp
  • 28,293
  • 19
  • 112
  • 138
AMD
  • 1
  • 1
  • 1
    Are you familiar with any package such as dplyr or data.table or prefer base R? Also would be great if you posted examples of `df1` and `df2` in a reproducible format (can be copied into a R session) – s_baldur May 23 '19 at 11:19
  • 1
    sounds like a pretty default use of a (left) join... read: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – Wimpel May 23 '19 at 11:53

1 Answers1

1

This sounds like an inner join, which can be achieved with dplyr. And you can drop the postcodes column after. Here is a nice cheat sheet: http://stat545.com/bit001_dplyr-cheatsheet.html

inner_join(x, y): Return all rows from x where there are matching values in y, and all columns from x and y. If there are multiple matches between x and y, all combination of the matches are returned. This is a mutating join.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Erik
  • 46
  • 3