I'm trying to crack an R workflow for parsing SVG paths, using this file on this webpage. I'm encountering artifacts in the positioning of resulting polygons:
Some of the countries do not align with their neighbours - e.g. US/Canada, US/Mexico, Russia/Asian neighbours. Since the effect hits the countries with more complex polygons it seems likely to be a problem to do with cumulative summing, but I'm unclear where the problem lies in my workflow, which is:
- parse raw SVG as XML, and extract all the SVG path strings
- parse individual path strings with
nodejs
's svg-path-parser module - process the resulting data.frames (which combine absolute and relative coordinates) into all absolute coordinates
I reproduce the full workflow here using R (for US/Canada), with an external call to nodejs:
require(dplyr)
require(purrr)
require(stringr)
require(tidyr)
require(ggplot2)
require(rvest)
require(xml2)
require(jsonlite)
# Get and parse the SVG
doc = read_xml('https://visionscarto.net/public/fonds-de-cartes-en/visionscarto-bertin1953.svg')
countries = doc %>% html_nodes('.country')
names(countries) = html_attr(countries, 'id')
cdi = str_which(names(countries), 'CIV') # unicode in Cote d'Ivoire breaks the code
countries = countries[-cdi]
# Extract SVG paths and parse with node's svg-path-parser module.
# If you don't have node you can use this instead (note this step might be the problem):
# d = read_csv('https://gist.githubusercontent.com/geotheory/b7353a7a8a480209b31418c806cb1c9e/raw/6d3ba2a62f6e8667eef15e29a5893d9d795e8bb1/bertin_svg.csv')
d = imap_dfr(countries, ~{
message(.y)
svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
"'; console.log(JSON.stringify(parseSVG(d)));\"")
system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame()
# some initial processing
d1 = d %>% filter(country %in% c('USA United States','CAN Canada')) %>%
mutate(x = replace_na(x, 0), y = replace_na(y, 0), # NAs need replacing
relative = replace_na(relative, FALSE),
grp = (command == 'closepath') %>% cumsum) # polygon grouping variable
# new object to loop through
d2 = d1 %>% mutate(x_adj = x, y_adj = y) %>% filter(command != 'closepath')
# loop through and change relative coords to absolute
for(i in 2:nrow(d2)){
if(d2$relative[i]){ # cumulative sum where coords are relative
d2$x_adj[i] = d2$x_adj[i-1] + d2$x_adj[i]
d2$y_adj[i] = d2$y_adj[i-1] + d2$y_adj[i]
} else{ # code M/L require no alteration
if(d2$code[i] == 'V') d2$x_adj[i] = d2$x_adj[i-1] # absolute vertical transform inherits previous x
if(d2$code[i] == 'H') d2$y_adj[i] = d2$y_adj[i-1] # absolute holrizontal transform etc
}
}
# plot result
d2 %>% ggplot(aes(x_adj, -y_adj, group = paste(country, grp))) +
geom_polygon(fill='white', col='black', size=.3) +
coord_equal() + guides(fill=F)
Any assistance appreciated. The SVG path syntax is specified at w3 and summarised more concisely here.
Edit (response to @ccprog)
Here is data returned from svg-path-parser
for the H
command sequence:
code command x y relative country
<chr> <chr> <dbl> <dbl> <lgl> <chr>
1 l lineto -0.91 -0.6 TRUE CAN Canada
2 l lineto -0.92 -0.59 TRUE CAN Canada
3 H horizontal lineto 189. NA NA CAN Canada
4 l lineto -1.03 0.02 TRUE CAN Canada
5 l lineto -0.74 -0.07 TRUE CAN Canada
Here is what d2
looks like for same sequence after the loop:
code command x y relative country grp x_adj y_adj
<chr> <chr> <dbl> <dbl> <lgl> <chr> <int> <dbl> <dbl>
1 l lineto -0.91 -0.6 TRUE CAN Canada 20 199. 143.
2 l lineto -0.92 -0.59 TRUE CAN Canada 20 198. 143.
3 H horizontal lineto 189. 0 FALSE CAN Canada 20 189. 143.
4 l lineto -1.03 0.02 TRUE CAN Canada 20 188. 143.
5 l lineto -0.74 -0.07 TRUE CAN Canada 20 187. 143.
Does this not look ok?. When I look at raw values for y_adj for H
and previous rows they are identical 142.56
.
Edit 2: working solution, thanks to @ccprog
d = imap_dfr(countries, ~{
message(.y)
svg_path = xml_find_all(.x, paste0("//*[@id='", .y, "']/d1:path")) %>% html_attr('d')
node_call = paste0("node -e \"var parseSVG = require('svg-path-parser'); var d='", svg_path,
"'; console.log(JSON.stringify(parseSVG.makeAbsolute(parseSVG(d))));\"")
system(node_call, intern = T) %>% fromJSON %>% mutate(country = .y)
}) %>% as_data_frame() %>%
mutate(grp = (command == 'moveto') %>% cumsum)
d %>% ggplot(aes(x, -y, group = grp, fill=country)) +
geom_polygon(col='black', size=.3, alpha=.5) +
coord_equal() + guides(fill=F)