Identify points from a second data frame within specified distance in R

Question

I have a list of 10 archaeological sites and a list of finds found fieldwalking, both with easting and northing coordinate data. For each of my sites, I want to select finds that were found within a 20km radius of them. I have looked for solutions to identify points within a specified distance, but they all seem to assume that you're using a single data frame- is there a way to do this? My thinking was if not, I should join the data frames and then subset this at the end to only show the artefacts?

The site data is formatted as:

SiteID    Easting   Northing
HtsOJM-IC3 464870 106560
HtsCBT-BI1 438430 139000

The finds data as:

id objecttype easting northing
1005083 BROOCH 458805 128421
1005080 BROOCH 458805 128421

At some point you will have to compare every `site` to every `find`. It sounds like your data is small enough that a full join / calculate distance / filter approach is practical - that's probably the easiest way to go. — Gregor Thomas, Oct 06 '20 at 13:39
You could also take inspiration from [Calculate Distance using Latitude and Longitude data in Different Data frames of different lengths with loop](https://stackoverflow.com/q/63349663/903061) — Gregor Thomas, Oct 06 '20 at 13:41

score 1 · Accepted Answer · answered Oct 06 '20 at 14:54

You can use a for-loop to loop through each site, adding a column to the finds dataframe with true/false (or true/NA as I've used here) for each find.

I've used a cut-off of +/- 10000 in the co-ordinates, and changed the co-ordinates in your example of the finds table so each is in range of one site so you can see how it works.

library(tidyverse)

Data:

sites <- tibble(SiteID = c("HtsOJM-IC3", "HtsCBT-BI1"), 
                Easting = c(464870, 438430), 
                Northing = c(106560, 139000))

finds <- tibble(id = c(1005083, 1005080), 
                objecttype = c("BROOCH", "BROOCH"), 
                easting = c(470870, 433430), 
                northing = c(103560, 142000))

Set cut_off to the max difference between co-ordinates you want to be marked as true

cut_off <- 10000

For each row in sites, a new column is added to finds with a value of True if both the easting and northing co-ordinates are in range of the site ± the value of cut_off.

for (i in 1:nrow(sites)) {
  site_ID <- sites[[i,"SiteID"]]
  site_easting <- sites[[i, "Easting"]]
  site_northing <- sites[[i, "Northing"]]
  
  finds <- finds %>% 
    mutate(!! site_ID := if_else((between(easting, site_easting - cut_off, site_easting + cut_off) &
                                between(northing, site_northing - cut_off, site_northing + cut_off)),
                              T, NA))
}

Output:

# A tibble: 2 x 6
       id objecttype easting northing `HtsOJM-IC3` `HtsCBT-BI1`
    <dbl> <chr>        <dbl>    <dbl> <lgl>        <lgl>       
1 1005083 BROOCH      470870   103560 TRUE         NA          
2 1005080 BROOCH      433430   142000 NA           TRUE

If you want to convert the separate sites columns into a single column, use pivot_wider():

finds %>% 
  pivot_longer(cols = starts_with("Hts"), names_to = "SiteID", values_drop_na = T) %>% 
  select(-value)

Output, note that if any find is within range of more than one site they'll now have multiple rows, 1 per site:

# A tibble: 2 x 5
       id objecttype easting northing SiteID    
    <dbl> <chr>        <dbl>    <dbl> <chr>     
1 1005083 BROOCH      470870   103560 HtsOJM-IC3
2 1005080 BROOCH      433430   142000 HtsCBT-BI1

Identify points from a second data frame within specified distance in R

1 Answers1