
I have two data.frames of lat/long coordinates. The first, df1, contains all the values I need to keep. The second, df2, covers a wide variety of ranges. See the maps below.

I'm trying to subset df2 to only the lat/long points located within the range of df1. I'm using min() and max() to define that range, but it isn't working correctly: the two functions check latitude and longitude independently, so together they only describe a rectangular bounding box around df1, not the area its points actually cover. See the code under the maps.
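To see why a min/max filter is too loose, here is a minimal sketch on made-up toy points (the names `keep` and `cand` are hypothetical, not from the question). Three points form a right triangle, and a candidate near the empty corner of the bounding box still passes the filter even though it lies outside the triangle.

```r
# Toy "keep" region: three made-up points forming a right triangle
keep <- data.frame(lat = c(0, 0, 10), long = c(0, 10, 0))
# Candidate point near the empty corner of the bounding box (lat + long > 10)
cand <- data.frame(lat = 9, long = 9)

# The min/max filter only checks each coordinate on its own
in_box <- with(cand,
  lat  >= min(keep$lat)  & lat  <= max(keep$lat) &
  long >= min(keep$long) & long <= max(keep$long))
in_box
#> [1] TRUE
```

The point is kept even though it is far from the triangle, which is the same effect that lets the Oklahoma point through.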

These are the ranges I need to keep within the state of Texas (df1):

[map of the df1 points]

And these are all the ranges I have available (df2):

[map of the df2 points]

dput df1:

df1 <- structure(list(lat = c(29.6666666666667, 29.4166666666667, 27.9166666666667, 
31.25, 30.7083333333333, 32.9166666666667, 29.7916666666667, 
32.1666666666667, 30.5416666666667, 28.6666666666667, 31.25, 
29.875, 30.6666666666667, 31, 29.7916666666667, 33, 36.4583333333333, 
28.5833333333333, 29.375, 32.5, 32.375, 29.3333333333333, 29.1666666666667, 
30.5416666666667, 33.0833333333333, 31.9583333333333, 31.375, 
32.5416666666667, 31.25, 32.125, 29.875, 30.2083333333333, 30.125, 
35.9166666666667, 34.25, 31.5416666666667, 32.375, 31.2083333333333, 
27.2916666666667, 33.2916666666667, 33.9583333333333, 29.9166666666667, 
28.75, 34.625, 30.25, 34.2083333333333, 32.375, 34.125, 31.7916666666667, 
31.625, 32.2916666666667, 29.3333333333333, 30.7916666666667, 
30.2916666666667, 31.9583333333333, 29.2083333333333, 31.6666666666667, 
30.7916666666667, 31.25, 31.875, 32.2083333333333, 30.5416666666667, 
35.3333333333333, 29.0833333333333, 29.7916666666667, 30.125, 
34.0416666666667, 30.125, 34.0833333333333, 27.25, 32.6666666666667, 
30.0416666666667, 32.5416666666667, 31.5, 33.8333333333333, 29.5, 
25.9583333333333, 26.4583333333333, 30.5, 26.5833333333333, 31.75, 
34.9583333333333, 31.0833333333333, 30.75, 31.9166666666667, 
32.2083333333333, 30.6666666666667, 29.5833333333333, 33.5, 33.5833333333333, 
30.625, 30, 34.2916666666667, 31.875, 30.7083333333333, 33.0416666666667, 
32.625, 32.7916666666667, 29.1666666666667, 33.5), long = c(-100.208333332859, 
-96.4583333328374, -97.3749999995093, -103.95833333288, -98.8333333328509, 
-102.916666666207, -103.41666666621, -99.9999999995242, -94.4166666661591, 
-95.9999999995014, -102.74999999954, -100.583333332861, -99.9999999995242, 
-101.458333332866, -104.458333332883, -96.3333333328367, -102.333333332871, 
-99.208333332853, -94.7499999994943, -97.49999999951, -98.7083333328502, 
-100.166666666192, -99.7083333328559, -97.0833333328409, -98.4166666661819, 
-97.7499999995114, -93.99999999949, -101.583333332867, -104.374999999549, 
-97.9583333328459, -100.041666666191, -100.45833333286, -95.2083333328303, 
-101.208333332864, -99.7499999995228, -100.374999999526, -100.916666666196, 
-100.499999999527, -99.3333333328537, -95.2916666661641, -102.249999999537, 
-98.333333332848, -97.5416666661769, -102.083333332869, -102.083333332869, 
-99.1666666661861, -102.083333332869, -99.5833333328552, -96.9166666661733, 
-103.749999999546, -97.1666666661747, -95.7916666661669, -102.541666666205, 
-94.9166666661619, -100.249999999526, -95.3749999994979, -101.208333332864, 
-98.1249999995135, -94.3749999994922, -97.3749999995093, -97.7916666661783, 
-98.333333332848, -100.249999999526, -97.7499999995114, -97.49999999951, 
-101.333333332865, -98.791666666184, -95.2083333328303, -99.6249999995221, 
-97.7499999995114, -101.874999999535, -94.4166666661591, -98.1666666661804, 
-100.124999999525, -96.8333333328395, -96.3749999995036, -97.2916666661754, 
-98.9999999995185, -102.624999999539, -97.5416666661769, -104.541666666217, 
-100.666666666195, -97.8333333328452, -94.8333333328281, -99.2499999995199, 
-95.9166666661676, -94.4166666661591, -94.5416666661598, -98.1249999995135, 
-95.2916666661641, -101.458333332866, -104.166666666215, -102.541666666205, 
-95.5416666661655, -93.7083333328217, -96.2083333328359, -95.7083333328331, 
-95.2499999994971, -96.0833333328352, -100.124999999525)), .Names = c("lat", 
"long"), row.names = c(85680L, 1319359L, 1830830L, 1304489L, 
1072503L, 462516L, 678461L, 257507L, 1909316L, 1599092L, 551980L, 
948368L, 1707870L, 1507833L, 1190987L, 681396L, 1321319L, 133499L, 
1213001L, 18800L, 1060501L, 1295647L, 334268L, 399477L, 1030612L, 
1390228L, 255017L, 1652752L, 795949L, 761335L, 310677L, 985728L, 
887656L, 242521L, 1514901L, 1346114L, 962315L, 1908903L, 1911307L, 
124567L, 58313L, 1394404L, 763303L, 1843111L, 857880L, 298692L, 
1373653L, 914743L, 166059L, 1754481L, 1219252L, 312112L, 852388L, 
396677L, 906070L, 152644L, 1007020L, 1317142L, 863194L, 1141341L, 
706510L, 1467240L, 35951L, 1482008L, 979650L, 409405L, 1236400L, 
962680L, 837083L, 94376L, 533974L, 1631418L, 251492L, 1383646L, 
726181L, 1356856L, 1655225L, 1907020L, 1902953L, 786466L, 1658482L, 
1585289L, 1352146L, 1865639L, 268501L, 1628615L, 671385L, 642906L, 
1243516L, 441432L, 645239L, 253095L, 733426L, 562744L, 250656L, 
1892959L, 372300L, 374497L, 598520L, 1483079L), class = "data.frame")

dput df2:

df2 <- structure(list(lat = c(-21.28, 41.3686, 33.9107, 25.12, 30.8275, 
29.4886, 34.1586, 45.15, 16.083, 40.0478, -20.5525, 42.35, 35.615, 
39.6825, -34.7122, 64.45, 29.5853, -13.2287, 38.8097, -31.3292, 
43.1, 31.4221, -30.4291, 38.5642, 36.0617, 37.6386, 6.0833, -21.5119, 
43.0481, 46.3333, 44.9992, 47.0653, 35.5678, 40.3038, 46.8352, 
26.3559, 21.0867, -41.1964, 45.3, 34.3328, 44.6667, 32.6089, 
58.4167, 35.9755, -30.4869, -28.9667, 40.3004, -38.35, 41.4449, 
33.6012, 22.23, 35.35, 45.0592, 22.0167, 38.25, -26.6667, 42.425, 
-22.97, 35.3692, 37.2331, 43.117, -30, 40.133, 37.8747, 32.2531, 
42.1833, 46.6907, 33.1353, 59.9167, 34.6737, 58.82, 40.0732, 
40.3691, -8.12, -5.58, 35.6558, 61.7133, 49.3, -10.45, 43.0048, 
33.4139, 39.6, 58.3622, 60.68, 34.683, -9.76, 34.3648, 61.07, 
-8.82, 45.7034, 41.3262, 46.5689, 46.875, -25.7683, 45.6, 26.6211, 
51.9, 63, 40.867, 40.221), long = c(-50.35, -96.095, -78.3025, 
82.9, -100.1103, -81.2402, -78.8603, -115.3167, 120.35, -105.2672, 
144.0367, -84.3167, -87.0353, -86.258, 138.9469, 17.0797, -98.7011, 
131.1355, -90.0028, 116.0811, -82, -100.5003, 150.5298, -123.1617, 
-98.59, -84.1098, 171.7333, 144.6339, -82.9239, -61.1, -101.2314, 
-91.6761, -82.8394, -103.1114, -96.7914, -80.2238, -157.0225, 
145.9997, -74.3, -84.4703, -71.2167, -85.0756, -130.0333, -79.3095, 
118.1258, 152.8167, -74.3311, 146.2, -86.0116, -79.0143, 87.8, 
-78.0333, -83.9011, -159.45, 140.35, 146.6167, -103.7358, -47.08, 
-117.6525, -119.5047, -88.4839, 141.6167, -99.8238, -111.9731, 
-107.7531, -76.5167, -120.4949, -107.2317, -113.9333, -83.0005, 
17.3297, -99.6668, -78.4174, -35.18, -45.88, -88.0064, 6.6164, 
-99.45, -45.15, -79.2657, -84.5958, 45.5, -134.5711, 16.35, 131.783, 
-66.61, -86.2361, 18.68, -36.05, -122.6106, -82.492, -96.0886, 
-111.1633, 152.525, -107.45, -80.2021, 127.7, -156.0667, 45.15, 
-85.1036)), .Names = c("lat", "long"), row.names = c(20755L, 
82475L, 60665L, 37831L, 89680L, 52520L, 85455L, 76368L, 40639L, 
49579L, 5712L, 80268L, 67350L, 55970L, 4897L, 44788L, 67992L, 
3294L, 76550L, 1484L, 29090L, 69644L, 10326L, 73682L, 86352L, 
57349L, 40579L, 6809L, 79978L, 31106L, 88426L, 59085L, 61056L, 
51946L, 61720L, 52929L, 94259L, 16494L, 29857L, 75694L, 83389L, 
75916L, 25284L, 60523L, 2177L, 11182L, 62281L, 15180L, 55504L, 
66088L, 38023L, 85213L, 80153L, 94573L, 38225L, 9135L, 95606L, 
20939L, 74353L, 95070L, 71753L, 9330L, 46103L, 91127L, 83769L, 
84903L, 71489L, 84234L, 26978L, 66221L, 44019L, 46099L, 65381L, 
22766L, 21947L, 89020L, 40132L, 27797L, 22964L, 31892L, 75937L, 
214L, 94704L, 44385L, 38284L, 18206L, 47344L, 44434L, 22829L, 
70909L, 64053L, 80756L, 82150L, 8132L, 81874L, 52925L, 41594L, 
93980L, 186L, 56140L), class = "data.frame")

Code:

library(dplyr)
# Keep df2 rows whose lat and long each fall within df1's min/max range
db <- filter(df2, lat >= min(df1$lat) & lat <= max(df1$lat) &
                  long >= min(df1$long) & long <= max(df1$long))

Here are the results plotted. The point in Oklahoma should not be included because it lies outside the area covered by df1, but it passes the filter because its latitude and longitude each fall within df1's min/max range. I'm not sure how to approach this problem further and would appreciate any help.

My question: how can I subset df2 to only the lat/long points that fall within the area covered by df1?

[map of the df2 points kept by the min/max filter, including one in Oklahoma]

Vedda
  • Is it fair to say that `df1` will always be Texas-only? If that's the case, have you considered geo-coding the coordinates to generate a state column? If you do this you could simply filter `df2` for `state == 'TX'`. – JasonAizkalns Nov 04 '15 at 19:05
  • @JasonAizkalns Yes, I have, but I was hoping to figure out how to find the ranges, because then I can subset zip/FIPS codes and determine the ranges within those lat/long coordinates. That way I can restrict not only by state but also by lat/long ranges. – Vedda Nov 04 '15 at 19:09

1 Answer


You can use the maps package to look up which state each point is in, and then drop the points that don't match.

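library(maps)
# Look up the US state each point falls in (NA if it is not inside any state)
df2$state <- map.where("state", df2$long, df2$lat)
# Keep only the points that fall in Texas
df2[df2$state == "texas" & !is.na(df2$state), ]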

          lat      long state
89680 30.8275 -100.1103 texas
67992 29.5853  -98.7011 texas
69644 31.4221 -100.5003 texas

EDIT: If you want to restrict to the area covered by df1 rather than the whole state, we can build a spatial polygon around df1 and then check which points fall inside it. I'm using modified code from this answer.

First we create the "convex hull" of df1 (the smallest convex polygon containing all of its points) and convert it to a spatial polygon:

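library(sp)  # rgdal is not needed for this step

# Indices of the df1 points on the convex hull; repeat the first to close the ring
ch <- chull(df1$long, df1$lat)
coords <- df1[c(ch, ch[1]), ]
# coords has columns (lat, long), so the polygon uses x = lat, y = long;
# the points passed to over() below must use the same order
sp_poly <- SpatialPolygons(list(Polygons(list(Polygon(coords)), ID = "1")))
# Plot with longitude on the x axis so the map is not rotated
plot(df1$long, df1$lat)
lines(coords$long, coords$lat, col = "red")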

[plot of the df1 points with their convex hull outlined in red]
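For readers unfamiliar with chull(), here is a minimal base-R sketch on made-up toy points (not the question's data). It shows that chull() returns only the indices of the points on the hull, dropping interior ones, and why the answer repeats the first index to close the ring.

```r
# Toy points: the four corners of a square plus one interior point (made up)
x <- c(0, 4, 4, 0, 2)
y <- c(0, 0, 4, 4, 1)

# chull() returns the indices of the points on the convex hull
ch <- chull(x, y)
sort(ch)
#> [1] 1 2 3 4

# Repeat the first vertex so the ring is closed, as in c(ch, ch[1])
ring <- c(ch, ch[1])
length(ring)
#> [1] 5
```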

Then we find out which points are inside it:

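# Promote df2 to SpatialPoints using the same (lat, long) axis order as sp_poly
coordinates(df2) <- ~lat + long
# over() gives the index of the polygon containing each point, or NA if none
inside <- over(df2, sp_poly)
df2[!is.na(inside), ]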

SpatialPoints:
          lat      long
89680 30.8275 -100.1103
67992 29.5853  -98.7011
69644 31.4221 -100.5003
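If you'd rather not depend on sp, the same convex-hull test can be sketched in base R with the even-odd (ray casting) rule. This is a swapped-in technique, not what over() does internally, and `point_in_poly()` and `in_hull()` are hypothetical helper names. Like the sp version, it assumes lat/long can be treated as flat x/y coordinates, which is reasonable over an area the size of Texas.

```r
# Ray casting (even-odd rule): a point is inside a polygon if a horizontal
# ray from it crosses the polygon's edges an odd number of times.
point_in_poly <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n  # start with the edge from the last vertex back to the first
  for (i in seq_len(n)) {
    # Does edge (j -> i) straddle the horizontal line y = py?
    if ((vy[i] > py) != (vy[j] > py)) {
      # x-coordinate where that edge crosses y = py
      x_cross <- vx[j] + (py - vy[j]) * (vx[i] - vx[j]) / (vy[i] - vy[j])
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# TRUE for each candidate point lying inside the convex hull of the keep points
in_hull <- function(keep_x, keep_y, cand_x, cand_y) {
  h <- chull(keep_x, keep_y)  # hull vertex indices; no closing repeat needed
  vapply(seq_along(cand_x), function(k)
    point_in_poly(cand_x[k], cand_y[k], keep_x[h], keep_y[h]),
    logical(1))
}

# Toy check: triangle (0,0), (10,0), (0,10) plus an interior point (2,2)
res <- in_hull(keep_x = c(0, 10, 0, 2), keep_y = c(0, 0, 10, 2),
               cand_x = c(1, 9, 2),     cand_y = c(1, 9, 3))
res
#> [1]  TRUE FALSE  TRUE
```

On the question's data, `df2[in_hull(df1$long, df1$lat, df2$long, df2$lat), ]` keeps the df2 points inside df1's hull. Points lying exactly on an edge can go either way under the even-odd rule; sp's point.in.polygon() reports such boundary cases with separate codes if that matters.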
jeremycg