1

I have two dataframes, df1 (2000 rows) and df2 (100,000 rows). And I want to:

  • Read lat/long combination from df1
  • Match with lat/long from df2
  • Read the corresponding "m" value of df2
  • Add it to df1

How can I go about it?

df1
lat      long       val
33.29083 -109.19556 12.5000
32.96583 -109.30972 11.5000
33.04944 -109.29528 1.7500
33.06444 -109.44167 1.7500
33.61944 -110.92083 0.2500
33.98000 -111.30278 3.7500
33.79806 -110.49917 1.00008
38.37972 -119.44917 3.2500

And

df2<-structure(list(lat = c(33.29, 32.96, 48.15, 48.1, 48.18, 48.14
    ), lon = c(-109.19, -109.31, -124.69, -124.69, -124.68, -124.68
    ), m = c(0.4713, 0.8998, 0.4891, 0.8418, 0.7998, 0.5292), flagar = c(0L, 
    1L, 0L, 1L, 1L, 0L), flagk = c(0L, 0L, 0L, 0L, 0L, 0L), flagsi = c(0L, 
    0L, 0L, 0L, 0L, 0L), flags2o = c(0L, 0L, 0L, 0L, 0L, 0L), flagap = c(0L, 
    0L, 0L, 0L, 0L, 0L), flagmt = c(0L, 0L, 0L, 0L, 0L, 0L), flagcn = c(0L, 
    0L, 0L, 0L, 0L, 0L), flagkf = c(1L, 1L, 1L, 1L, 1L, 1L), flagrd = c(1L, 
    1L, 1L, 1L, 1L, 1L), flagrv = c(2L, 2L, 2L, 2L, 2L, 2L), flagpt = c(1L, 
    1L, 1L, 1L, 1L, 1L)), .Names = c("lat", "lon", "m", "flagar", 
    "flagk", "flagsi", "flags2o", "flagap", "flagmt", "flagcn", "flagkf", 
    "flagrd", "flagrv", "flagpt"), row.names = c(NA, 6L), class = "data.frame")
Jaap
  • 81,064
  • 34
  • 182
  • 193
maximusdooku
  • 5,242
  • 10
  • 54
  • 94
  • 1
    There are no identical lat/lon point in the two dataframes. How do you want to match? On exact location will give no matches with the example data. Do you want to match on shortest distance? – Jaap Oct 13 '15 at 18:10
  • Sorry, the actual datasets have matching points. These are subsets, so....I'll edit it. – maximusdooku Oct 13 '15 at 18:14
  • Added now. Matching can be upto 2 decimal points – maximusdooku Oct 13 '15 at 18:15

1 Answers1

2

Using the data.table package you could do it as follows:

library(data.table)
# convert the dataframes to datatables (which are an extended form of a dataframe)
# and round the lat & lon variables to one decimal
setDT(df1)[, `:=` (lat = round(lat,1), lon = round(lon,1))]
setDT(df2)[, `:=` (lat = round(lat,1), lon = round(lon,1))]

# joining the m column of df2 to df1 based on exact matches of the rounded lat/lon values
# m = i.m transfers the matching values to the df1 dataframe
df1[df2, m := i.m, on=c("lat","lon")]

this gives:

> df1
    lat    lon      val      m
1: 33.3 -109.2 12.50000 0.4713
2: 33.0 -109.3 11.50000 0.8998
3: 33.0 -109.3  1.75000 0.8998
4: 33.1 -109.4  1.75000     NA
5: 33.6 -110.9  0.25000     NA
6: 34.0 -111.3  3.75000     NA
7: 33.8 -110.5  1.00008     NA
8: 38.4 -119.4  3.25000     NA

If you want to match on shortest distance, see this answer for an example.

Community
  • 1
  • 1
Jaap
  • 81,064
  • 34
  • 182
  • 193