I think I have an answer, if I have interpreted the problem correctly.
From what I got from the question, you want to know how many stores are within 1000 and 2000 m of each school, but stores are only counted towards the school they are closest to - is this right?
Minimal code setup, by saving your sample data as an .xlsx
file in the working directory:
library(readxl)
library(tidyverse)
library(sf)
read_xlsx('Schools and Stores.xlsx', sheet = 1) %>%
st_as_sf(., coords = c("long", "lat"), crs = "epsg:4326") %>%
st_transform(crs = "ESRI:102003") %>%
{. ->> school.sf.utm}
read_xlsx('Schools and Stores.xlsx', sheet = 2) %>%
st_as_sf(., coords = c("XCoord", "YCoord"), crs = "ESRI:102696") %>%
st_transform(crs = "ESRI:102003") %>%
{. ->> store.sf.utm}
Firstly, to reduce the number of stores in the dataset we keep only stores within a 2 km buffer of all schools (This might have been what you did by using st_union()
after st_buffer()
). This reduces the number of stores from 2603 to 191.
# step 1 - keep only stores within a 2km buffer of all schools, to reduce number of stores to work with
stores.sf.utm %>%
filter(
st_intersects(stores.sf.utm, school.sf.utm %>% st_buffer(2000), sparse = FALSE)
) %>%
rename(
geometry_stores = geometry
) %>%
{. ->> stores_2000}
stores_2000
# Simple feature collection with 191 features and 0 fields
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 496820.2 ymin: 138115.8 xmax: 500484.2 ymax: 141987.8
# Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
# # A tibble: 191 x 1
# geometry_stores
# <POINT [m]>
# 1 (496820.2 139441)
# 2 (496848.1 140725.7)
# 3 (496987.8 138959.5)
# 4 (497052.2 139815.4)
# 5 (497030 140286.7)
# 6 (497122.5 138900.1)
# 7 (497033.2 140646.1)
# 8 (497099.8 140279.6)
# 9 (497199.7 138687.5)
# 10 (497154.4 139805.9)
# # ... with 181 more rows
Next, we generate all potential combinations of schools and remaining stores. I assign a store_id
so we can tell which store is which (without using it's geometry
).
# generate all schools~stores combos
stores_2000 %>%
mutate(
store_id = row_number(),
schools = list(school.sf.utm)
) %>%
unnest(cols = c('schools')) %>%
rename(
geometry_school = geometry
) %>%
{. ->> all_combos}
all_combos
# Simple feature collection with 3438 features and 2 fields
# Active geometry column: geometry_stores
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 496820.2 ymin: 138115.8 xmax: 500484.2 ymax: 141987.8
# Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
# # A tibble: 3,438 x 4
# geometry_stores store_id School geometry_school
# <POINT [m]> <int> <chr> <POINT [m]>
# 1 (496820.2 139441) 1 AcademyOf Envt Sci/math Middle School (498610.1 140067.7)
# 2 (496820.2 139441) 1 Collegiate School Of Med/bio (496797.7 140597.6)
# 3 (496820.2 139441) 1 Dewey Sch.-internat'l. Studies (499626.5 139130.3)
# 4 (496820.2 139441) 1 Eagle Fox Park (498015.9 139324.1)
# 5 (496820.2 139441) 1 Education Therap Support At Madison (476270.1 131682.7)
# 6 (496820.2 139441) 1 Hodgen Elementary School (497853.4 140290.1)
# 7 (496820.2 139441) 1 Humboldt Academy Of Higher Lrning (499410.4 138707.3)
# 8 (496820.2 139441) 1 Lafayette Preparatory Academy (498812.6 140006)
# 9 (496820.2 139441) 1 Lift For Life Academy (500025.8 139526.4)
# 10 (496820.2 139441) 1 Lift For Life Academy High School (500025.8 139526.4)
# # ... with 3,428 more rows
This means we can work out the distance from each store to each school. We then keep only combinations within 2000 m of each other (these are formed from stores and schools at opposite sides of the original 2 km buffer, which is why their distance exceeds 2 km).
# calculate distance from each store to each school
all_combos %>%
mutate(
distance = as.numeric(st_distance(geometry_stores, geometry_school, by_element = TRUE))
) %>%
filter(
distance <= 2000
) %>%
{. ->> all_combos_2}
all_combos_2
# Simple feature collection with 2231 features and 3 fields
# Active geometry column: geometry_stores
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 496820.2 ymin: 138115.8 xmax: 500484.2 ymax: 141987.8
# Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
# # A tibble: 2,231 x 5
# geometry_stores store_id School geometry_school distance
# * <POINT [m]> <int> <chr> <POINT [m]> <dbl>
# 1 (496820.2 139441) 1 AcademyOf Envt Sci/math Middle School (498610.1 140067.7) 1896.
# 2 (496820.2 139441) 1 Collegiate School Of Med/bio (496797.7 140597.6) 1157.
# 3 (496820.2 139441) 1 Eagle Fox Park (498015.9 139324.1) 1201.
# 4 (496820.2 139441) 1 Hodgen Elementary School (497853.4 140290.1) 1337.
# 5 (496820.2 139441) 1 Mckinley Class. Leadership Ac. (498355.8 139560.4) 1540.
# 6 (496820.2 139441) 1 Nahed Chapman New American Academy (496615.8 140605.6) 1182.
# 7 (496820.2 139441) 1 Shenandoah Elementary School (496821 139360.4) 80.6
# 8 (496820.2 139441) 1 Sigel Elementary Comm. Ed. Center (498603.2 139613.7) 1791.
# 9 (496820.2 139441) 1 St. Louis Christian Academy (497245.5 140196.9) 867.
# 10 (496848.1 140725.7) 2 AcademyOf Envt Sci/math Middle School (498610.1 140067.7) 1881.
# # ... with 2,221 more rows
Now if my understanding is correct, each store counts only towards the school it is closest to. So, we keep only the school each store is closest to using filter()
:
# first, keep only the closest school to each store
all_combos_2 %>%
arrange(store_id, distance) %>%
group_by(store_id) %>%
filter(
distance == min(distance)
) %>%
{. ->> all_combos_3}
# so now we have the closest school to each store
all_combos_3
# Simple feature collection with 223 features and 3 fields
# Active geometry column: geometry_stores
# Geometry type: POINT
# Dimension: XY
# Bounding box: xmin: 496820.2 ymin: 138115.8 xmax: 500484.2 ymax: 141987.8
# Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
# # A tibble: 223 x 5
# # Groups: store_id [191]
# geometry_stores store_id School geometry_school distance
# * <POINT [m]> <int> <chr> <POINT [m]> <dbl>
# 1 (496820.2 139441) 1 Shenandoah Elementary School (496821 139360.4) 80.6
# 2 (496848.1 140725.7) 2 Collegiate School Of Med/bio (496797.7 140597.6) 138.
# 3 (496987.8 138959.5) 3 Shenandoah Elementary School (496821 139360.4) 434.
# 4 (497052.2 139815.4) 4 St. Louis Christian Academy (497245.5 140196.9) 428.
# 5 (497030 140286.7) 5 St. Louis Christian Academy (497245.5 140196.9) 233.
# 6 (497122.5 138900.1) 6 Shenandoah Elementary School (496821 139360.4) 550.
# 7 (497033.2 140646.1) 7 Collegiate School Of Med/bio (496797.7 140597.6) 240.
# 8 (497099.8 140279.6) 8 St. Louis Christian Academy (497245.5 140196.9) 168.
# 9 (497199.7 138687.5) 9 Shenandoah Elementary School (496821 139360.4) 772.
# 10 (497154.4 139805.9) 10 St. Louis Christian Academy (497245.5 140196.9) 402.
# # ... with 213 more rows
Notice that we have 223 rows now. This means there are 32 duplicates (223 - 191); where there are two (or more) closest schools, and they are the same distance away from the store (in this example max duplicates = 2). However you choose to handle these is up to you. In this example I will leave them in the data, but if you only want a single school, you may choose the first alphabetically or a random choice etc.
So now, we can calculate how many of the stores are within 1000 m of the (their closest) school:
# now, how many closest stores are within 1000 m of each school
all_combos_3 %>%
filter(
distance <= 1000
) %>%
group_by(School) %>%
summarise(
Stores1000m = n()
) %>%
st_drop_geometry %>%
{. ->> combo_sum_1000}
combo_sum_1000
# # A tibble: 16 x 2
# School Stores1000m
# * <chr> <int>
# 1 AcademyOf Envt Sci/math Middle School 2
# 2 Collegiate School Of Med/bio 4
# 3 Dewey Sch.-internat'l. Studies 6
# 4 Eagle Fox Park 37
# 5 Hodgen Elementary School 17
# 6 Humboldt Academy Of Higher Lrning 10
# 7 Lafayette Preparatory Academy 1
# 8 Lift For Life Academy 8
# 9 Lift For Life Academy High School 8
# 10 Mckinley Class. Leadership Ac. 7
# 11 Peabody Elementary School 48
# 12 Shenandoah Elementary School 6
# 13 Sigel Elementary Comm. Ed. Center 7
# 14 St. Louis Christian Academy 7
# 15 St. Louis College Prep High School 14
# 16 St. Louis College Prep Middle School 14
And the same approach for stores within 2000 m:
# 2000 m
all_combos_3 %>%
filter(
distance <= 2000
) %>%
group_by(School) %>%
summarise(
Stores2000m = n()
) %>%
st_drop_geometry %>%
{. ->> combo_sum_2000}
combo_sum_2000
# # A tibble: 16 x 2
# School Stores2000m
# * <chr> <int>
# 1 AcademyOf Envt Sci/math Middle School 2
# 2 Collegiate School Of Med/bio 4
# 3 Dewey Sch.-internat'l. Studies 6
# 4 Eagle Fox Park 37
# 5 Hodgen Elementary School 18
# 6 Humboldt Academy Of Higher Lrning 10
# 7 Lafayette Preparatory Academy 1
# 8 Lift For Life Academy 8
# 9 Lift For Life Academy High School 8
# 10 Mckinley Class. Leadership Ac. 7
# 11 Peabody Elementary School 53
# 12 Shenandoah Elementary School 7
# 13 Sigel Elementary Comm. Ed. Center 7
# 14 St. Louis Christian Academy 7
# 15 St. Louis College Prep High School 24
# 16 St. Louis College Prep Middle School 24
And of course we can join these two datasets to match your desired output.
combo_sum_1000 %>%
full_join(combo_sum_2000) %>%
{. ->> combo_sum_joined}
combo_sum_joined
# # A tibble: 16 x 3
# School Stores1000m Stores2000m
# <chr> <int> <int>
# 1 AcademyOf Envt Sci/math Middle School 2 2
# 2 Collegiate School Of Med/bio 4 4
# 3 Dewey Sch.-internat'l. Studies 6 6
# 4 Eagle Fox Park 37 37
# 5 Hodgen Elementary School 17 18
# 6 Humboldt Academy Of Higher Lrning 10 10
# 7 Lafayette Preparatory Academy 1 1
# 8 Lift For Life Academy 8 8
# 9 Lift For Life Academy High School 8 8
# 10 Mckinley Class. Leadership Ac. 7 7
# 11 Peabody Elementary School 48 53
# 12 Shenandoah Elementary School 6 7
# 13 Sigel Elementary Comm. Ed. Center 7 7
# 14 St. Louis Christian Academy 7 7
# 15 St. Louis College Prep High School 14 24
# 16 St. Louis College Prep Middle School 14 24
I hope my interpretation of the problem is correct, I admit it is a little confusing as we switch between grouping by stores then schools etc. But I think this works.