I have one data frame, df1, with 800 rows and another, df2, with 9 million rows. Both have latitude and longitude columns, and df2 has additional columns that I need to add to df1 based on the shortest distance, since lat and lon do not match exactly between the two data frames. I used geo_join from the fuzzyjoin package but get an error.

Summary of df1:

summary(df1)
          lat             lon           
 Min.   :25.39   Min.   :-124.62   
 1st Qu.:36.20   1st Qu.:-104.94    
 Median :40.63   Median : -84.15   
 Mean   :39.32   Mean   : -89.44    
 3rd Qu.:42.08   3rd Qu.: -73.97    
 Max.   :48.73   Max.   : -67.27  

Summary of df2:

summary(df2)
      lon               lat              x1                 x2                 x3
 Min.   :-124.73   Min.   :24.98   Min.   :-2230806   Min.   :-1569579   Min.   :     0.0  
 1st Qu.:-110.13   1st Qu.:34.78   1st Qu.:-1126720   1st Qu.: -508033   1st Qu.:   670.8  
 Median : -99.17   Median :39.06   Median : -263314   Median :  -15116   Median :  1507.5  
 Mean   : -99.17   Mean   :38.97   Mean   : -239487   Mean   :  -30086   Mean   :  2856.3  
 3rd Qu.: -88.94   3rd Qu.:43.25   3rd Qu.:  578810   3rd Qu.:  466600   3rd Qu.:  3354.7  
 Max.   : -66.97   Max.   :49.38   Max.   : 2122143   Max.   : 1270878   Max.   :395131.9  

Here is my code:

merged.dfs <- geo_join(df1, df2, by = NULL, method = "haversine", mode = "left", max_dist = 1) 

Here is the error I get:

Joining by: c("lat", "lon") 

Error in fuzzy_join(x, y, multi_by = by, multi_match_fun = match_fun, : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522

I appreciate your help!

Heresh
  • I think *"long vectors not supported"* is both clear and unavoidable. Break your 9M row frame into smaller frames and repeat all 800 across your fragmented-9M frame. You'll need to keep some measure of distance in the merged frame so that you can take all of the appended columns (for each original point) and determine which to keep. – r2evans May 23 '19 at 01:17
  • BTW: if you want actual coding help, you'll need to provide sample data. Giving the summary is informative statistically but not programming-wise: I cannot create sample data that matches your data based on that alone. Consider `dput(head(x))` for each. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans May 23 '19 at 01:18
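
The chunking idea from the first comment can be sketched in base R. The error presumably arises because a full cross-comparison of 800 × 9,000,000 rows means 7.2 billion candidate pairs, more than the 2^31 − 1 elements an ordinary R vector can index. The sketch below scans df2 in blocks and keeps a running nearest match per df1 row. The names `haversine_km`, `nearest_join`, `chunk_size`, and `dist_km` are hypothetical, the toy data stands in for the real frames, and the code assumes both frames have `lat` and `lon` columns with no missing values.

```r
# Haversine great-circle distance in kilometres (inputs in degrees, vectorised).
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# For each row of `small`, find the closest row of `big`, scanning `big`
# in blocks of `chunk_size` rows so no single distance matrix gets too large.
nearest_join <- function(small, big, chunk_size = 1e4) {
  n <- nrow(small)
  best_dist <- rep(Inf, n)          # running minimum distance per small row
  best_idx  <- rep(NA_integer_, n)  # row of `big` achieving that minimum
  for (s in seq(1, nrow(big), by = chunk_size)) {
    rows <- s:min(s + chunk_size - 1, nrow(big))
    # n x length(rows) matrix of distances for this block only
    d <- outer(seq_len(n), seq_along(rows), function(i, j) {
      haversine_km(small$lat[i], small$lon[i],
                   big$lat[rows[j]], big$lon[rows[j]])
    })
    j <- apply(d, 1, which.min)     # nearest column within the block
    dmin <- d[cbind(seq_len(n), j)]
    better <- dmin < best_dist
    best_dist[better] <- dmin[better]
    best_idx[better]  <- rows[j[better]]
  }
  # Append the non-coordinate columns of the matched rows plus the distance.
  extra <- setdiff(names(big), c("lat", "lon"))
  matched <- big[best_idx, extra, drop = FALSE]
  rownames(matched) <- NULL
  out <- cbind(small, matched)
  out$dist_km <- best_dist
  out
}

# Toy data (made up for illustration, not the real df1/df2).
df1 <- data.frame(lat = c(40.0, 30.0), lon = c(-100.0, -90.0))
df2 <- data.frame(
  lon = c(-90.01, -100.02, -120.0),
  lat = c(30.01, 40.01, 45.0),
  x3  = c(10, 20, 30)
)
merged <- nearest_join(df1, df2, chunk_size = 2)
```

Keeping `dist_km` in the result makes it possible to drop matches beyond a cutoff afterwards, e.g. `merged[merged$dist_km <= 1.61, ]` for roughly one mile. With 800 rows and the default block size, each distance matrix holds 8 million doubles (about 64 MB); lower `chunk_size` if memory is tight.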