2

I have a data table as follows, which I need to convert into an sf object:

library(data.table)
DT <- data.table(
    ID = c("A", "B", "C", "D", "E"),
    Value = 11:15
)

The geometry information for each ID is a character vector with differing number of coordinates making up the linestring. ID "B" has 4 coordinates; ID "C" has 5.

geo <- c("c(-112.116492521272, -112.116492811159, -112.116492812107, -112.116491781569, -112.116482854256, -112.116482819195, -112.116476331207, -112.116476325101, -112.11647589954, 33.3777109072744, 33.377733456163, 33.377733512504, 33.377817189599, 33.3785425053239, 33.3785454379367, 33.3790725760563, 33.3790731291841, 33.3791076444333)", 
         "c(-112.282916223332, -112.282955145531, -112.282977080374, -112.282986066594, 33.499285198973, 33.4994146786288, 33.4995335119373, 33.4998030580162)", 
         "c(-112.281058674957, -112.281058522318, -112.281057917087, -112.281057356648, -112.281055594103, -112.281047371356, -112.281048086137, -112.28104821173, 33.4937123457776, 33.4937301348982, 33.4938008007847, 33.4938659107566, 33.4940708243904, 33.4950232493953, 33.4951159682343, 33.4951322463168)", 
         "c(-112.282978024041, -112.282977000088, -112.282975472281, -112.282975387447, -112.282974470679, -112.282974464144, -112.282974284899, -112.28297410899, -112.282974107453, 33.5011764123633, 33.5013710145493, 33.5016617311961, 33.501678000948, 33.5018530730796, 33.5018546369058, 33.5018887965849, 33.5019223852857, 33.5019226044706)", 
         "c(-112.282986066594, -112.282985540911, -112.282984156895, -112.282983004093, -112.282982201845, 33.4998030580162, 33.4998965204233, 33.5001425170464, 33.5003478058912, 33.5004906801949)"
)

Adding the geometry information to DT:

DT$geometry <- geo

Now, I need to convert DT into an sf object with geometry specified as sfc_LINESTRING. I tried using st_cast to first convert the character-based geometry variable into linestring, but it yielded an error.

DT_sf <- st_cast(DT$geometry, "LINESTRING")
Error in UseMethod("st_cast") : 
  no applicable method for 'st_cast' applied to an object of class "character"

This conversion needs to be done for nearly 20,000 rows. So, I am looking for a computationally efficient way to achieve the required result.

user438383
  • 5,716
  • 8
  • 28
  • 43
  • Your geometry format - a text formatted somewhat like a R vector, with no clear separation of X and Y coordinates - is somewhat unusual. Is this the raw data? Also, in what coordinate reference system should the numbers be interpreted? I assume decimal degrees, but it is not quite obvious.... – Jindra Lacko May 09 '22 at 06:30
  • Actually, this reprex table is from a shapefile that was at one point loaded as a data frame and imported to a database as a dataset with all variables (including sfc_LINESTRING-coded geometry variable) in character format. So, this is the raw data when I read it from the database. The crs is 4326. – Pramesh Pudasaini May 09 '22 at 06:42
  • The format is indeed unusual, but, assuming this is given, I suggest you first find a way to parse the `geo` data into something that `sf` can work with - e.g., convert it to a list of matrices, where each matrix corresponds to one of the character vectors, and describe the points of the linestring. From there it should be easier to continue. – amitr May 09 '22 at 06:42

3 Answers3

3

You can use eval to "evaluate an expression given as a string"

(This can be dangerious if your string isn't sanitised (think of SQL injection))

So with your geo object you get

lst <- lapply(geo, function(x) { eval(parse(text = x)) })

str( lst )
List of 5
# $ : num [1:18] -112 -112 -112 -112 -112 ...
# $ : num [1:8] -112.3 -112.3 -112.3 -112.3 33.5 ...
# $ : num [1:16] -112 -112 -112 -112 -112 ...
# $ : num [1:18] -112 -112 -112 -112 -112 ...
# $ : num [1:10] -112 -112 -112 -112 -112 ...

Since we're evaluating each vector in geo one at a time (inside the lapply), we can also make it an sfg object at the same time

lst <- lapply(geo, function(x) {
  v <- eval(parse(text = x))
  m <- matrix(v, ncol = 2)
  sf::st_linestring(m)
})

Then all that's needed is to add the correct class attributes

DT$geo <- lst

DT$geo <- sf::st_as_sfc( DT$geo )
DT <- sf::st_as_sf( DT )
sf::st_crs( DT ) <- 4326

DT
# Simple feature collection with 5 features and 2 fields
# Geometry type: LINESTRING
# Dimension:     XY
# Bounding box:  xmin: -112.283 ymin: 33.37771 xmax: -112.1165 ymax: 33.50192
# Geodetic CRS:  WGS 84
# ID Value                            geo
# 1  A    11 LINESTRING (-112.1165 33.37...
# 2  B    12 LINESTRING (-112.2829 33.49...
# 3  C    13 LINESTRING (-112.2811 33.49...
# 4  D    14 LINESTRING (-112.283 33.501...
# 5  E    15 LINESTRING (-112.283 33.499...
SymbolixAU
  • 25,502
  • 4
  • 67
  • 139
1

You can remove the c() notation and then split the coordinates into a list. Then you can create a data frame and use e.g the upper half rows and latitudes. This can be converted in a matrix which is a format st_linestring understands:

library(sf)
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
library(tidyverse)
library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
#> The following object is masked from 'package:purrr':
#> 
#>     transpose

geo <- c(
  "c(-112.116492521272, -112.116492811159, -112.116492812107, -112.116491781569, -112.116482854256, -112.116482819195, -112.116476331207, -112.116476325101, -112.11647589954, 33.3777109072744, 33.377733456163, 33.377733512504, 33.377817189599, 33.3785425053239, 33.3785454379367, 33.3790725760563, 33.3790731291841, 33.3791076444333)",
  "c(-112.282916223332, -112.282955145531, -112.282977080374, -112.282986066594, 33.499285198973, 33.4994146786288, 33.4995335119373, 33.4998030580162)",
  "c(-112.281058674957, -112.281058522318, -112.281057917087, -112.281057356648, -112.281055594103, -112.281047371356, -112.281048086137, -112.28104821173, 33.4937123457776, 33.4937301348982, 33.4938008007847, 33.4938659107566, 33.4940708243904, 33.4950232493953, 33.4951159682343, 33.4951322463168)",
  "c(-112.282978024041, -112.282977000088, -112.282975472281, -112.282975387447, -112.282974470679, -112.282974464144, -112.282974284899, -112.28297410899, -112.282974107453, 33.5011764123633, 33.5013710145493, 33.5016617311961, 33.501678000948, 33.5018530730796, 33.5018546369058, 33.5018887965849, 33.5019223852857, 33.5019226044706)",
  "c(-112.282986066594, -112.282985540911, -112.282984156895, -112.282983004093, -112.282982201845, 33.4998030580162, 33.4998965204233, 33.5001425170464, 33.5003478058912, 33.5004906801949)"
)

parsed_geo <-
  geo %>%
  map(~ {
    .x %>%
      str_remove_all("c[(]|[)]$") %>%
      str_split(",") %>%
      first() %>%
      map_chr(str_trim) %>%
      as_tibble() %>%
      mutate(lon_lat = row_number() <= n() / 2) %>%
      pivot_wider(names_from = lon_lat, values_fn = list) %>%
      unnest() %>%
      type_convert() %>%
      as.matrix() %>%
      st_linestring()
  })
#> Warning: `cols` is now required when using unnest().
#> Please use `cols = c(`TRUE`, `FALSE`)`
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   `TRUE` = col_double(),
#>   `FALSE` = col_double()
#> )
#> Warning: `cols` is now required when using unnest().
#> Please use `cols = c(`TRUE`, `FALSE`)`
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   `TRUE` = col_double(),
#>   `FALSE` = col_double()
#> )
#> Warning: `cols` is now required when using unnest().
#> Please use `cols = c(`TRUE`, `FALSE`)`
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   `TRUE` = col_double(),
#>   `FALSE` = col_double()
#> )
#> Warning: `cols` is now required when using unnest().
#> Please use `cols = c(`TRUE`, `FALSE`)`
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   `TRUE` = col_double(),
#>   `FALSE` = col_double()
#> )
#> Warning: `cols` is now required when using unnest().
#> Please use `cols = c(`TRUE`, `FALSE`)`
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   `TRUE` = col_double(),
#>   `FALSE` = col_double()
#> )

DT <- data.table(
  ID = c("A", "B", "C", "D", "E"),
  Value = 11:15
)
DT$geometry <- parsed_geo
DT
#>    ID Value                           geometry
#> 1:  A    11 LINESTRING (-112.1165 33.37...,...
#> 2:  B    12 LINESTRING (-112.2829 33.49...,...
#> 3:  C    13 LINESTRING (-112.2811 33.49...,...
#> 4:  D    14 LINESTRING (-112.283 33.501...,...
#> 5:  E    15 LINESTRING (-112.283 33.499...,...

Created on 2022-05-09 by the reprex package (v2.0.0)

danlooo
  • 10,067
  • 2
  • 8
  • 22
1

This works:

library(sfheaders)

DT <- data.table(
      ID = c("A", "B", "C", "D", "E"),
      Value = 11:15)

geo <- c("c(-112.116492521272, -112.116492811159, -112.116492812107, -112.116491781569, -112.116482854256, -112.116482819195, -112.116476331207, -112.116476325101, -112.11647589954, 33.3777109072744, 33.377733456163, 33.377733512504, 33.377817189599, 33.3785425053239, 33.3785454379367, 33.3790725760563, 33.3790731291841, 33.3791076444333)", 
         "c(-112.282916223332, -112.282955145531, -112.282977080374, -112.282986066594, 33.499285198973, 33.4994146786288, 33.4995335119373, 33.4998030580162)", 
         "c(-112.281058674957, -112.281058522318, -112.281057917087, -112.281057356648, -112.281055594103, -112.281047371356, -112.281048086137, -112.28104821173, 33.4937123457776, 33.4937301348982, 33.4938008007847, 33.4938659107566, 33.4940708243904, 33.4950232493953, 33.4951159682343, 33.4951322463168)", 
         "c(-112.282978024041, -112.282977000088, -112.282975472281, -112.282975387447, -112.282974470679, -112.282974464144, -112.282974284899, -112.28297410899, -112.282974107453, 33.5011764123633, 33.5013710145493, 33.5016617311961, 33.501678000948, 33.5018530730796, 33.5018546369058, 33.5018887965849, 33.5019223852857, 33.5019226044706)", 
         "c(-112.282986066594, -112.282985540911, -112.282984156895, -112.282983004093, -112.282982201845, 33.4998030580162, 33.4998965204233, 33.5001425170464, 33.5003478058912, 33.5004906801949)")

geo <- geo %>%  
  sub('c\\(', '', x = .) %>% 
  sub('\\)', '', x = .) %>% 
  lapply(function(x) x %>% 
           str_split_fixed(",", str_count(string = ., ',')+1) %>% 
           matrix(ncol=2) %>% 
           data.frame)

geom <- list()
for (i in seq(length(geo))){
  geom <- c(geom, sfheaders::sf_linestring(matrix(as.numeric(as.matrix(geo[[i]])), 
                                                  ncol=2)))
}

sequ <- seq.default(from = 2, to = length(geom), by = 2)

DT$geometry <- geom %>% 
  do.call(rbind, .[sequ]) %>% 
  st_sfc()

   ID Value                           geometry
1:  A    11 LINESTRING (-112.1165 33.37...,...
2:  B    12 LINESTRING (-112.2829 33.49...,...
3:  C    13 LINESTRING (-112.2811 33.49...,...
4:  D    14 LINESTRING (-112.283 33.501...,...
5:  E    15 LINESTRING (-112.283 33.499...,...
D.J
  • 1,180
  • 1
  • 8
  • 17