My question is similar to this post where the distance between each point was calculated.

In my case, I am looking to find the distance of each point to the point with the highest value. I would also like to plot this relationship with lm(), but I am struggling to achieve both tasks with spatial data objects.

My data do not need a CRS: the distances are Euclidean, because the points are in a room.

Below is a mock example of the data, where the column 'variable' is the one of interest.

> dput(dat)
structure(list(date.hour = structure(c(1551057840, 1551057840, 
1551057840, 1551057840, 1551057840, 1551057840, 1551057840), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), id = c(2, 5, 7, 8, 9, 10, 11), variable = c(456, 
27, 130, 116, 92, 141, 145), xy_coord = c("6.2 14.8", "8.2 8.9", 
"4.2 8.9", "2.2 8.9", "8.2 3.5", "6.2 3.5", "4.2 3.5")), row.names = c(NA, 
-7L), groups = structure(list(id = c(2, 5, 7, 8, 9, 10, 11), 
    date.hour = structure(c(1551057840, 1551057840, 1551057840, 
    1551057840, 1551057840, 1551057840, 1551057840), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), .rows = structure(list(1L, 2L, 3L, 4L, 5L, 6L, 
        7L), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
    "list"))), row.names = c(NA, -7L), class = c("tbl_df", "tbl", 
"data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
"tbl", "data.frame"))

> dat
# A tibble: 7 x 4
# Groups:   id, date.hour [7]
  date.hour              id variable xy_coord
  <dttm>              <dbl>    <dbl> <chr>   
1 2019-02-25 01:24:00     2      456 6.2 14.8
2 2019-02-25 01:24:00     5       27 8.2 8.9 
3 2019-02-25 01:24:00     7      130 4.2 8.9 
4 2019-02-25 01:24:00     8      116 2.2 8.9 
5 2019-02-25 01:24:00     9       92 8.2 3.5 
6 2019-02-25 01:24:00    10      141 6.2 3.5 
7 2019-02-25 01:24:00    11      145 4.2 3.5 
> 

Turning the data frame into a SpatialPointsDataFrame with the sp package:

library(sp)

# Split xy_coord into separate numeric x and y columns
xy <- strsplit(as.character(dat$xy_coord), " ")
dat$x <- as.numeric(sapply(xy, "[", 1))
dat$y <- as.numeric(sapply(xy, "[", 2))

# Promote to a SpatialPointsDataFrame
coordinates(dat) <- ~x+y

This is the point where I don't know what steps to take, but I want to know the distance of all the points to the highest value:

which.max(dat@data$variable)

And then plot this relationship with base plot().

If my question is unclear please let me know.

kpm
  • What do you mean by "plot this relationship with lm()"? Which relationship? – agila Nov 21 '21 at 10:40
  • Plot the effect of distance from the point with the highest value. – kpm Nov 21 '21 at 11:03
  • Not sure if my explanation is confusing, but I am using the highest value as proxy of the independent variable. – kpm Nov 21 '21 at 11:25
  • So you want to create a plot where the x values represent the distance from the point with the highest value of a covariate and what are the y values? the same covariate? – agila Nov 21 '21 at 13:24
  • Y values - labelled 'variable' - won't change. – kpm Nov 21 '21 at 21:34

1 Answer

I'm still not sure I understand your question but I propose the following answer.

Load packages

library(sf)
#> Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1
library(tidyr)

Load data

dat = structure(
  list(
    date.hour = structure(
      c(
        1551057840, 1551057840, 1551057840, 1551057840, 1551057840, 
        1551057840, 1551057840
      ), 
      tzone = "UTC", 
      class = c(
        "POSIXct",
        "POSIXt"
      )
    ), 
    id = c(2, 5, 7, 8, 9, 10, 11), 
    variable = c(
      456, 27, 130, 116, 92, 141, 145
    ), 
    xy_coord = c(
      "6.2 14.8", "8.2 8.9", "4.2 8.9", "2.2 8.9", "8.2 3.5", "6.2 3.5", 
      "4.2 3.5"
    )
  ), 
  row.names = c(NA,-7L), 
  groups = structure(
    list(
      id = c(2, 5, 7, 8, 9, 10, 11),
      date.hour = structure(
        c(
          1551057840, 1551057840, 1551057840, 1551057840, 1551057840, 
          1551057840, 1551057840
        ), 
        tzone = "UTC", 
        class = c(
          "POSIXct",
          "POSIXt"
        )
      ), 
      .rows = structure(
        list(1L, 2L, 3L, 4L, 5L, 6L, 7L), 
        ptype = integer(0), 
        class = c(
          "vctrs_list_of", "vctrs_vctr", "list"
        )
      )
    ), 
    row.names = c(NA, -7L), 
    class = c("tbl_df", "tbl", "data.frame"), 
    .drop = TRUE
  ), 
  class = c("grouped_df", "tbl_df", "tbl", "data.frame")
)

Separate the xy_coord column, convert columns to numeric and create an sf object

dat_sf <- st_as_sf(
  separate(dat, xy_coord, c("x", "y"), sep = " ", convert = TRUE), 
  coords = c("x", "y")
)

Find the row with the maximum of variable

which.max(dat_sf[["variable"]])
#> [1] 1

Compute all distances

dat_sf[["distances"]] <- st_distance(dat_sf, dat_sf[1, ])
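Since the coordinates are plain room coordinates with no CRS, the same distances can be cross-checked in base R without any spatial package. The following is only a minimal sketch: the names `xy`, `ref` and `d` are introduced here for illustration, and the coordinates are retyped from the question's xy_coord column. It also selects the reference row with which.max() instead of hard-coding row 1.

```r
# Coordinates and values retyped from the question's example data
xy <- cbind(
  x = c(6.2, 8.2, 4.2, 2.2, 8.2, 6.2, 4.2),
  y = c(14.8, 8.9, 8.9, 8.9, 3.5, 3.5, 3.5)
)
variable <- c(456, 27, 130, 116, 92, 141, 145)

# Row holding the largest value of `variable` (the reference point)
ref <- which.max(variable)

# Euclidean distance from every point to the reference point
d <- sqrt((xy[, "x"] - xy[ref, "x"])^2 + (xy[, "y"] - xy[ref, "y"])^2)
round(d, 2)
```

The reference point itself gets a distance of 0, which matches what st_distance() reports for a point compared with itself.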

Plot

plot(variable ~ distances, data = dat_sf)
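The question also mentions lm(). One possible way to add a fitted line to this kind of scatter plot is sketched below in base R, assuming a simple linear model of variable on distance is what is wanted. The data frame `pts` and the object `fit` are hypothetical names used only for this illustration, and the distances are recomputed from the retyped coordinates so that the block runs on its own.

```r
# Example data retyped from the question
x <- c(6.2, 8.2, 4.2, 2.2, 8.2, 6.2, 4.2)
y <- c(14.8, 8.9, 8.9, 8.9, 3.5, 3.5, 3.5)
variable <- c(456, 27, 130, 116, 92, 141, 145)

# Euclidean distance to the point with the highest value
ref <- which.max(variable)
pts <- data.frame(
  variable = variable,
  distance = sqrt((x - x[ref])^2 + (y - y[ref])^2)
)

# Fit a linear model and overlay it on the scatter plot
fit <- lm(variable ~ distance, data = pts)
plot(variable ~ distance, data = pts)
abline(fit)

# Intercept and slope of the fitted line
coef(fit)
```

With only seven points, one of which defines both the maximum and distance 0, the fitted line is strongly driven by that single observation, so it should be read as a description rather than as evidence of an effect.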

Created on 2021-11-22 by the reprex package (v2.0.1)

You can also remove the first point (with distance = 0).
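A rough base-R sketch of that last suggestion follows, again with hypothetical names (`pts`, `rest`, `fit_rest`) and with the data retyped from the question. It drops the reference point by filtering on distance > 0 rather than by row number, so it does not depend on the maximum being in row 1.

```r
# Example data retyped from the question
x <- c(6.2, 8.2, 4.2, 2.2, 8.2, 6.2, 4.2)
y <- c(14.8, 8.9, 8.9, 8.9, 3.5, 3.5, 3.5)
variable <- c(456, 27, 130, 116, 92, 141, 145)

ref <- which.max(variable)
pts <- data.frame(
  variable = variable,
  distance = sqrt((x - x[ref])^2 + (y - y[ref])^2)
)

# Keep only the points other than the reference point (distance > 0)
rest <- pts[pts$distance > 0, ]

# Refit and plot without the reference point
fit_rest <- lm(variable ~ distance, data = rest)
plot(variable ~ distance, data = rest)
abline(fit_rest)
```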

agila