
I have two projects I'm working on where I need to do something like this. I have datasets of stores with lat and long coordinates and need to know, for each one, if there are any other stores within a 2 mile radius. For one of these projects, it would be especially useful if I could see the radius on a map. I have all the points mapped using mapview and would love if I could see a circle over each one so I can see if there are any gaps (new stores cannot be built within 2 miles of an existing store). For the other project, what I would like to get is a column showing the number of other stores in that radius. This one seems easier, though I haven't figured it out yet. I am new to R btw.

I've tried geodist and followed a tutorial to get a distance matrix, but I can't read it very well and I don't think it's what I need. I've also tried points_in_circle which did tell me which stores are within the radius, but I need it to do this for every pair of coordinates (I have 80 rows) and ideally give a more useable output.

ksjm
    Have you checked out the sf package? – Just James Feb 08 '23 at 00:47
    Welcome to StackOverflow. Please take the tour: https://stackoverflow.com/tour (note that you should ask only one question per post) and also edit your question to address the points made in this post: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – John Polo Feb 08 '23 at 01:04

1 Answer


For the second question (the number of other stores within a certain radius), you can use the geodist function you mentioned, followed by a little data manipulation.

The following commands use a small mock dataset of five stores. Each row holds a store ID with its lat and long coordinates.

n.stores <- 5
set.seed(1234)
stores <- cbind(sort(sample(1:10, n.stores)),  # store IDs
                runif(n.stores, -0.1, 0.1),    # latitude
                runif(n.stores, -0.1, 0.1))    # longitude

colnames(stores) <- c("store_id", "lat", "long")
stores
#      store_id          lat        long
# [1,]        1  0.033216752  0.08468670
# [2,]        4  0.002850228 -0.04153683
# [3,]        5  0.038718258  0.06745913
# [4,]        6  0.008994967 -0.04275534
# [5,]       10 -0.043453283 -0.04663584

Now calculate the distances between each store and the others, including itself.

library(geodist)

d <- geodist(stores[, 2:3]) # A 5-by-5 matrix of pairwise distances
d
#           [,1]       [,2]      [,3]       [,4]      [,5]
# [1,]     0.000 14427.8245  2009.804 14416.5478 16899.483
# [2,] 14427.825     0.0000 12752.057   696.1801  5176.953
# [3,]  2009.804 12752.0567     0.000 12686.0604 15625.871
# [4,] 14416.548   696.1801 12686.060     0.0000  5844.660
# [5,] 16899.483  5176.9527 15625.871  5844.6603     0.000
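
If the matrix is hard to read, it may help to see what each cell holds. The following is a minimal base R sketch, not geodist's own method: it uses the haversine formula with an assumed mean Earth radius, so the values come out close to, but not identical to, the geodist output. The names `haversine_m`, `earth_radius_m`, and `d_hav` are made up for this illustration. The rows and columns are labelled with the store IDs, so the cell in row "4", column "6" is the distance in metres between stores 4 and 6.

```r
# Store IDs and coordinates copied from the printed `stores` matrix above
store_id <- c(1, 4, 5, 6, 10)
lat  <- c(0.033216752, 0.002850228, 0.038718258, 0.008994967, -0.043453283)
long <- c(0.08468670, -0.04153683, 0.06745913, -0.04275534, -0.04663584)

# Haversine great-circle distance in metres (inputs in degrees).
# earth_radius_m is an assumed mean Earth radius, not a geodist setting.
haversine_m <- function(lat1, lon1, lat2, lon2, earth_radius_m = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius_m * asin(pmin(1, sqrt(a)))
}

# outer() evaluates the function for every (row, column) pair of stores
idx <- seq_along(store_id)
d_hav <- outer(idx, idx,
               function(i, j) haversine_m(lat[i], long[i], lat[j], long[j]))

# Label rows and columns with the store IDs to make the matrix readable
dimnames(d_hav) <- list(as.character(store_id), as.character(store_id))
round(d_hav)
```
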

The distances are in metres. Convert this matrix to a data frame to facilitate the next step.

df <- as.data.frame(d)
colnames(df) <- stores[,'store_id']
df$store_A <- as.numeric(colnames(df)) # works because the store IDs are numeric
df
#           1          4         5          6        10 store_A
# 1     0.000 14427.8245  2009.804 14416.5478 16899.483       1
# 2 14427.825     0.0000 12752.057   696.1801  5176.953       4
# 3  2009.804 12752.0567     0.000 12686.0604 15625.871       5
# 4 14416.548   696.1801 12686.060     0.0000  5844.660       6
# 5 16899.483  5176.9527 15625.871  5844.6603     0.000      10

Then you can reshape to long-form and list or count the number of stores within a certain distance of each other (say 10 km).

library(tidyr)
library(dplyr)

df.long <- pivot_longer(df, -store_A, 
                        names_to="store_B", 
                        values_to="distance", 
                        names_transform=as.numeric) %>%
  filter(store_A != store_B) # omit self-references

df.long
# A tibble: 20 × 3
#   store_A store_B distance
#     <dbl>   <dbl>    <dbl>
# 1       1       4   14428.
# 2       1       5    2010.
# 3       1       6   14417.
# 4       1      10   16899.
# 5       4       1   14428.
# 6       4       5   12752.
# 7       4       6     696.
# 8       4      10    5177.
# 9       5       1    2010.
#10       5       4   12752.
#11       5       6   12686.
#12       5      10   15626.
#13       6       1   14417.
#14       6       4     696.
#15       6       5   12686.
#16       6      10    5845.
#17      10       1   16899.
#18      10       4    5177.
#19      10       5   15626.
#20      10       6    5845.

Now the group counts.

group_by(df.long, store_A) %>%
  filter(distance<10000) %>%
  print() %>%
  summarise(n=n())

# A tibble: 8 × 3
# Groups:   store_A [5]
#   store_A store_B distance
#     <dbl>   <dbl>    <dbl>
# 1       1       5    2010.
# 2       4       6     696.
# 3       4      10    5177.
# 4       5       1    2010.
# 5       6       4     696.
# 6       6      10    5845.
# 7      10       4    5177.
# 8      10       6    5845.

# A tibble: 5 × 2
#   store_A     n
#     <dbl> <int>
# 1       1     1
# 2       4     2
# 3       5     1
# 4       6     2
# 5      10     2

So stores "1" and "5" each have one other store within 10 km, and stores "4", "6" and "10" each have two.
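
For the 2-mile radius in the question, the same count can be sketched directly on the distance matrix in base R. This is only an illustration: the matrix values are copied from the geodist output printed earlier, and `radius_m` and `n_within` are names invented here. One mile is 1609.344 m, so 2 miles is about 3219 m. Unlike a filter followed by summarise, which drops any store with nothing in range, this keeps such stores with a count of 0.

```r
# Pairwise distances in metres, copied from the printed geodist output
store_id <- c(1, 4, 5, 6, 10)
d <- matrix(c(
      0.000, 14427.8245,  2009.804, 14416.5478, 16899.483,
  14427.825,     0.0000, 12752.057,   696.1801,  5176.953,
   2009.804, 12752.0567,     0.000, 12686.0604, 15625.871,
  14416.548,   696.1801, 12686.060,     0.0000,  5844.660,
  16899.483,  5176.9527, 15625.871,  5844.6603,     0.000
), nrow = 5, byrow = TRUE)

radius_m <- 2 * 1609.344  # 2 miles expressed in metres

# Each row's zero self-distance always passes the test, so subtract 1
n_within <- rowSums(d <= radius_m) - 1

data.frame(store_id, n_within)
#   store_id n_within
# 1        1        1
# 2        4        1
# 3        5        1
# 4        6        1
# 5       10        0
```

Here store 10 has no other store within 2 miles, so a new store near it would not be ruled out by the existing ones in this mock data.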

Edward
  • Thank you so much! I'm sorry to ask more of you but I'm having trouble figuring out how to use my actual data in place of your generated data. Here is the result of "dput(stores[1:5, ])": structure(c(1, 2, 3, 4, 5, 37.8235013224185, 96.0425417404622, -4.59339213557541, 54.7047243919224, 14.8625889327377, 59.3847681768239, 6.38101743534207, 19.3247522227466, -47.2227052785456, -44.0914582461119 ), dim = c(5L, 3L)) – ksjm Feb 08 '23 at 21:06
  • Particularly what I'm stuck on right now is around the colnames(df) step. When I do this, it puts all the names in the first column like: c("1000 Sycamore School Rd", "1024 Bridgewood Dr"," and so on. When I do the next step to add store_A, the column is full of "NA". – ksjm Feb 08 '23 at 23:46
    Well yes. Your data should be either a matrix with 3 columns (as in my mock example) or a data frame. I suggest you edit your post to show the output of `str(my_data)`. – Edward Feb 09 '23 at 01:49
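
One possible way around the NA problem described in the comments, sketched in base R under stated assumptions: when the store identifiers are text such as street addresses, as.numeric() turns them into NA, so the labels can be kept as characters instead. The two addresses below are the ones quoted in the comment; "Placeholder St" and all three distances are made-up illustration values, not the asker's data, and `ids`, `pairs` and `n` are hypothetical names.

```r
# Character store identifiers; the third one is a placeholder
ids <- c("1000 Sycamore School Rd", "1024 Bridgewood Dr", "Placeholder St")

# Illustrative distance matrix in metres (made-up values), with named dimnames
d <- matrix(c(    0.000,  2009.804, 14427.825,
               2009.804,     0.000, 12752.057,
              14427.825, 12752.057,     0.000),
            nrow = 3, byrow = TRUE,
            dimnames = list(store_A = ids, store_B = ids))

# as.table() + as.data.frame() reshapes the matrix to one row per store pair,
# using the dimnames names as column names and keeping the IDs as text
pairs <- as.data.frame(as.table(d), responseName = "distance",
                       stringsAsFactors = FALSE)
pairs <- pairs[pairs$store_A != pairs$store_B, ]  # omit self-references

# Count the stores within the radius for each store_A (zeros are kept)
radius_m <- 10000
n <- tapply(pairs$distance < radius_m, pairs$store_A, sum)
n
```

Because the identifiers never pass through as.numeric(), no NA values are introduced, and a store with nothing in range still appears with a count of 0.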