2

I'm working with some geographic data, looking at the boundaries of the neighborhoods of Boston, and trying to identify which neighborhoods certain building permits were given for.

So far I've:

  1. Read in the shapefile and converted it to a dataframe of latitudes and longitudes using readshapePoly from the maptools package.
  2. Associated a name with each of those neighborhood boundaries - Brighton, Chinatown, etc.
           long      lat order  hole piece id group                    name
1     -71.12593 42.27201     1 FALSE     1  0   0.1              Roslindale
2     -71.12575 42.27235     2 FALSE     1  0   0.1              Roslindale
3     -71.12566 42.27248     3 FALSE     1  0   0.1              Roslindale
4     -71.12555 42.27258     4 FALSE     1  0   0.1              Roslindale
5     -71.12573 42.27249     5 FALSE     1  0   0.1              Roslindale
6     -71.12638 42.27217     6 FALSE     1  0   0.1              Roslindale
7     -71.12652 42.27210     7 FALSE     1  0   0.1              Roslindale
8     -71.12660 42.27218     8 FALSE     1  0   0.1              Roslindale
9     -71.12666 42.27224     9 FALSE     1  0   0.1              Roslindale
10    -71.12691 42.27210    10 FALSE     1  0   0.1              Roslindale
11    -71.12726 42.27200    11 FALSE     1  0   0.1              Roslindale
12    -71.12740 42.27196    12 FALSE     1  0   0.1              Roslindale

....

  1. Generated a long list of latitudes and longitudes for all of my building permits - This was not originally a shapefile, meaning I don't know if I can overlay the two sets using "sf"
Latitude Longitude
      <dbl>     <dbl>
 1     42.3     -71.1
 2      0         0  
 3     42.4     -71.1
 4     42.3     -71.1
 5     42.4     -71.1
 6     42.4     -71.1
 7     42.4     -71.1
 8     42.4     -71.1
 9      0         0  
10     42.4     -71.1

My problem is that I have all these building permits, but they don't have the associated neighborhood, which is what I want to study. Conceptually, I know I want to do something like this:

  1. Identify which polygon each coordinate is in using my polygons from step 1 and 2
  2. Use the identification from the first step to attach the polygon "name" to the coordinate neighborhood.
OfTheAzureSky
  • 83
  • 1
  • 8
  • I had a similar problem with media markets maps once. What I did was: 1 - Generate a "center" for each geographical area. 2 - Use K-nearest neighbors to find to which "center" my target data points were. A possible drawback to this lies that some points near the boundaries were misclassified, so you might want to take into account how precise do you want to be. – Henry Cyranka Feb 24 '19 at 23:09
  • My fear is that with some of these odd shapes (islands and the like) there would be some mis-matched points. Perhaps your method actually takes care of that though... – OfTheAzureSky Feb 24 '19 at 23:21
  • 1
    1. Read the bondary shapefile using `library(sf)` as per @AndrewRoyal 's answer. 2. Convert your data.frame of lon & lat points into a `sf` object using `sf <- sf::st_as_sf( df, coords = c("lon", "lat") )` (replace "lon" & "lat" with the column names). 3. Use `sf::st_join(x, y, left = FALSE)` to do an 'inner join' to give you the points inside polygons. – SymbolixAU Feb 24 '19 at 23:54
  • 1
    I've used `over` from the `sp` package to answer a similar problem before, with one shapefile & one data frame. You can reference the steps [here](https://stackoverflow.com/a/45956978/8449629). – Z.Lin Feb 25 '19 at 02:13

1 Answers1

1

This can be done using the sf package. Assuming you have points and polygons stored as shapefiles you can do the following:

library('sf')

polygonSF <- read_sf(dsn = 'polygonShapeFile')
pointSF <- read_sf(dsn = 'pointShapeFile')

st_intersection(pointSF, polygonSF)

If they are not already shape files there are a few intermediate steps.

For example, suppose the points (the permits in your example) are stored in a dataframe (pointDF) with latitude and longitude columns. You need to transform the dataframe into a shapefile and then tell R to use the same coordinate reference system (CRS) for your points as you are using for your polygon boundaries:

pointSF <- st_as_sf(x = pointDF,                         
                    coords = c("longitude", "latitude"),
                    crs = "+init=epsg:4326")
pointSF <- st_transform(pointSF, crs = st_crs(poloygonSF))
Andrew Royal
  • 336
  • 1
  • 5
  • So, I do have a shapefile for the boundaries of the neighborhoods. I do not have a shapefile for the points that identify the building permit locations. I have a data frame with two columns of latitude and longitude of each building permit, and I did extract the shapefile to generate a series of latitude and longitude boundaries using readShapePoly from the maptools package – OfTheAzureSky Feb 24 '19 at 23:39
  • I've updated my answer to respond to your question. If this correctly responds to your question, please indicate that you've accepted my answer :) – Andrew Royal Feb 25 '19 at 13:38
  • Yes, this worked out! – OfTheAzureSky Feb 27 '19 at 16:37