I have points along a street, and I'm trying to calculate the distance between the first point on the street and every other point.

Here is my sf object:

library(rgeos)
library(rgdal)
library(sf)
library(magrittr)  # for the %>% pipe used below

# I actually have multiple points on a 5 km segment
df <- data.frame(lon = c(-121.95, -121.96, -121.97, -121.98),
                 lat = c(37.35, 37.36, 37.37, 37.38))
coordinates(df) <- c("lon", "lat")         # promote the data frame to an sp points object
proj4string(df) <- CRS("+init=epsg:4326")  # WGS84 longitude/latitude
df_sf <- st_as_sf(df) %>% st_transform(3488)

I tried st_distance, but the distances are not correct. What I want is the distance from each point to the beginning of the street, so the values should run from 0 at the first point to about 5000 m at the end.

roger

2 Answers

5

If you set by_element = TRUE in st_distance(), I think you'll get the result you're looking for: a vector with one distance per point. If you leave by_element as FALSE (the default), you'll get a matrix instead.

st_distance(x = df_sf, 
            y = df_sf[1,], 
            by_element = TRUE)

Units: [m]
[1]    0.000 1420.606 2841.141 4261.604
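
For comparison, here is a minimal self-contained sketch of the default matrix output mentioned above. It builds the same four points directly with st_as_sf() instead of going through sp; the names `pts`, `d_mat` and `d_m` are chosen for this sketch and are not from the question. As far as I know, some sf releases document that by_element = TRUE requires x and y to have the same length, so taking the single column of the matrix may be the more portable route.

```r
library(sf)

# Same four example points as in the question, built directly as an sf object
pts <- st_as_sf(
  data.frame(lon = c(-121.95, -121.96, -121.97, -121.98),
             lat = c(37.35, 37.36, 37.37, 37.38)),
  coords = c("lon", "lat"), crs = 4326
)
pts <- st_transform(pts, 3488)  # project to EPSG:3488 (units: metres)

# Default by_element = FALSE: a 4 x 1 matrix, one row per point in `pts`
d_mat <- st_distance(pts, pts[1, ])
dim(d_mat)

# Drop the units class and the matrix shape: plain numeric vector in metres
d_m <- as.numeric(d_mat)
d_m
```

The resulting vector is in the row order of `pts`, so it can be attached as a column in the same way as the by_element result.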

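Note that the distances differ slightly from the other answer because df_sf is projected, so st_distance() measures planar distances in the EPSG:3488 coordinate system rather than distances over the earth's surface. If we use the unprojected df object, the distances match: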

st_distance(x = df, 
            y = df[1,], 
            by_element = TRUE)

Units: [m]
[1]    0.000 1420.098 2840.125 4260.079
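
The projected versus unprojected comparison can also be sketched with two sf objects side by side. This is only an illustration: `pts_ll` and `pts_proj` are names made up here, and the exact lon/lat figures may depend on the sf version and on whether its s2 backend is in use, so no expected values are shown.

```r
library(sf)

# Unprojected points (EPSG:4326, longitude/latitude)
pts_ll <- st_as_sf(
  data.frame(lon = c(-121.95, -121.96, -121.97, -121.98),
             lat = c(37.35, 37.36, 37.37, 37.38)),
  coords = c("lon", "lat"), crs = 4326
)

# The same points projected to EPSG:3488 (metres)
pts_proj <- st_transform(pts_ll, 3488)

# Distance from every point to the first one, as plain numbers in metres
d_ll   <- as.numeric(st_distance(pts_ll, pts_ll[1, ]))
d_proj <- as.numeric(st_distance(pts_proj, pts_proj[1, ]))

# Side-by-side comparison; the difference should be small at this scale
data.frame(lonlat_m = d_ll, projected_m = d_proj, diff_m = d_proj - d_ll)
```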

Edit in response to comment re: order

The distances will be in the order of your dataframe. They happen to be in ascending order in your example.

You could add the distances as a new column and then sort on that column using dplyr. Note that the by_element argument is necessary here, as the new column won't accept a matrix as a value.

library(dplyr)
df_sf %>% 
  mutate(dist_to_first = st_distance(x = df_sf, 
                                     y = df_sf[1,], 
                                     by_element = TRUE)) %>% 
  arrange(dist_to_first)

Simple feature collection with 4 features and 1 field
geometry type:  POINT
dimension:      XY
bbox:           xmin: -175068.7 ymin: -72303.08 xmax: -172485.1 ymax: -68913.94
epsg (SRID):    3488
proj4string:    +proj=aea +lat_1=34 +lat_2=40.5 +lat_0=0 +lon_0=-120 +x_0=0 +y_0=-4000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
  dist_to_first                    geometry
1     0.000 [m] POINT (-172485.1 -72303.08)
2  1420.606 [m] POINT (-173346.5 -71173.46)
3  2841.141 [m] POINT (-174207.7 -70043.74)
4  4261.604 [m] POINT (-175068.7 -68913.94)
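
If the rows are not in street order and you want successive distances along the street, one option is to sort by distance from the known first point and then add up the lengths of consecutive segments. This is a rough sketch, assuming you already know which row is the origin and that the street does not double back on itself (otherwise sorting by straight-line distance will not give the along-street order). The names `pts`, `origin`, `dist_to_first` and `dist_along` are made up for the example.

```r
library(sf)

pts <- st_as_sf(
  data.frame(lon = c(-121.95, -121.96, -121.97, -121.98),
             lat = c(37.35, 37.36, 37.37, 37.38)),
  coords = c("lon", "lat"), crs = 4326
)
pts <- st_transform(pts, 3488)

# Shuffle the rows to mimic unsorted data; the true first point is now row 2
pts <- pts[c(3, 1, 4, 2), ]
origin <- pts[2, ]  # assumption: the origin row is known

# Straight-line distance to the origin, then sort on it
pts$dist_to_first <- as.numeric(st_distance(pts, origin))
pts <- pts[order(pts$dist_to_first), ]

# Length of each consecutive segment (x and y have equal length here),
# accumulated into a running distance along the street
n <- nrow(pts)
step <- as.numeric(st_distance(pts[-1, ], pts[-n, ], by_element = TRUE))
pts$dist_along <- c(0, cumsum(step))
pts
```

For points that lie almost on a straight line, as in the example data, `dist_along` and `dist_to_first` will be nearly identical; they diverge when the street bends.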
Eugene Chong
  • Thanks @Eugene, is the ordering of the points automatic? If not, how can I order them to get successive distances along the street? – roger Dec 09 '19 at 14:25
  • Thanks for the edit. The only thing is that the points in my real data are not ordered, so in this case I have to order them first and then calculate using st_distance. – roger Dec 09 '19 at 14:38
  • To clarify, as long as the first row in your sf dataframe is the correct first point, you can calculate the distances using `st_distance()` and _then_ sort. No need to order them first. – Eugene Chong Dec 09 '19 at 14:41
  • Thanks @Eugene Chong. Is there a way to sort them before doing this, or to get the first point in the segment? My real sf data are not sorted. – roger Dec 09 '19 at 16:43
  • @roger, not sure I follow what you're looking for. You would like to sort the points by distance from an origin point before finding the distances themselves? Regarding finding/setting the origin point itself, I'm not sure how you should do that, as that question is particular to your project. However, if you know, for instance, that the 4th row in the dataframe rather than the 1st row is your origin, you can just change `y = df_sf[1,]` to `y = df_sf[4,]`. – Eugene Chong Dec 09 '19 at 18:07
3

The geosphere package has the distGeo function for this type of calculation:

library(geosphere)

df <- data.frame(lon = c(-121.95, -121.96, -121.97, -121.98), lat = c(37.35,37.36,37.37,37.38)) 

#calculate dist from first row
# to all rows in df
distGeo(df[1,], df)
[1]    0.000 1420.098 2840.125 4260.079

For distGeo, the first column is longitude and the second column is latitude, with the output in meters.
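
As a small illustration of the geosphere route, the sketch below computes the same distances with distGeo (geodesic on the WGS84 ellipsoid) and with distHaversine (which assumes a spherical earth). The points are passed as a two-column matrix here instead of a data frame; `pts`, `d_geo` and `d_hav` are just names for this example.

```r
library(geosphere)

# Two-column matrix: longitude first, latitude second
pts <- cbind(lon = c(-121.95, -121.96, -121.97, -121.98),
             lat = c(37.35, 37.36, 37.37, 37.38))

# Distance in metres from the first point to every point
d_geo <- distGeo(pts[1, ], pts)        # ellipsoidal (WGS84)
d_hav <- distHaversine(pts[1, ], pts)  # spherical approximation

# How far the spherical approximation is from the ellipsoidal result
round(d_hav - d_geo, 3)
```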

Dave2e
  • Thank you @Dave2e. The problem is that I have a huge quantity of data, and distGeo or distHaversine calculate the distance between pairs of points. The sf functions are much faster because the items are indexed, so they can handle thousands of points without taking too much time. – roger Dec 09 '19 at 14:20
  • @roger, you may want to check this question for some benchmarking on the faster method to use: https://stackoverflow.com/questions/57583160/how-to-use-doparallel-for-calculating-distance-between-zipcodes-in-r/57583881#57583881 – Dave2e Dec 09 '19 at 14:33
  • Thanks Dave, I will check that out. – roger Dec 09 '19 at 14:37