I am scraping coordinate data for some locations from Wikipedia. I am following the steps outlined here (note: I changed the example from the hyperlink to match my work):

library(plyr)
library(dplyr)
library(xml2)
library(rvest)
library(magrittr)
library(geosphere)

location <- "Mendizorrotza"

# read the HTML of the Wikipedia article
webpage <- read_html(paste0("https://en.wikipedia.org/wiki/", location))

# extract the infobox (table.vcard) without treating the first row as a header
tables <- webpage %>%
  html_nodes("table.vcard") %>%
  html_table(header = FALSE)

# keep the first matching table as a data frame
dict <- as.data.frame(tables[[1]])

The coordinates row gives me the same location in three formats:

42°50′13.60″N 2°41′16.96″W ;

42.8371111°N 2.6880444°W ;

42.8371111; -2.6880444

all on one line. I would like to find the distance between two such locations. Which format should I use for that, and how should I extract it? I have never worked with coordinates before, so which equation should I use to compute the distance?
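For illustration, here is a minimal sketch of one way this might be done. It assumes that the signed decimal pair (the last of the three formats) is the easiest one to parse, and that a great-circle (haversine) distance on a sphere is accurate enough. The helper names `parse_decimal_coords` and `haversine_m`, the mean Earth radius of 6371008.8 m, and the second point are assumptions made for the example, not part of the original code.

```r
# Hypothetical helper: extract the signed "lat; lon" decimal pair from the
# scraped coordinates string.
parse_decimal_coords <- function(x) {
  # matches e.g. "42.8371111; -2.6880444"
  pattern <- "-?[0-9]+\\.?[0-9]*;[[:space:]]*-?[0-9]+\\.?[0-9]*"
  m <- regmatches(x, regexpr(pattern, x))
  if (length(m) == 0) return(c(lat = NA_real_, lon = NA_real_))
  parts <- as.numeric(strsplit(m, ";[[:space:]]*")[[1]])
  c(lat = parts[1], lon = parts[2])
}

# Hypothetical helper: haversine (great-circle) distance in metres between
# two points given in decimal degrees. r is an assumed mean Earth radius.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# The string as it appears in the coordinates row of the question
raw <- "42°50′13.60″N 2°41′16.96″W ; 42.8371111°N 2.6880444°W ; 42.8371111; -2.6880444"
p1 <- parse_decimal_coords(raw)

# A made-up second point, for illustration only
p2 <- c(lat = 43.0, lon = -3.0)

haversine_m(p1[["lat"]], p1[["lon"]], p2[["lat"]], p2[["lon"]])
```

The already loaded `geosphere` package offers `distHaversine()` and `distGeo()` for the same purpose; note that they expect each point as `c(longitude, latitude)`, i.e. in the opposite order to the string above.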

Jack Armstrong
  • Side note: there's an API function for obtaining the coordinates from a Wikipedia article: https://stackoverflow.com/questions/40098656/how-to-get-coordinates-from-a-wikipedia-page-through-api – Where's my towel Jul 09 '19 at 15:02
  • How would I implement that with `read_html()`? – Jack Armstrong Jul 09 '19 at 15:05
  • I suspect (!) it would be easier to get the data in JSON format, e.g. https://en.wikipedia.org/w/api.php?action=query&prop=coordinates&titles=Kinkaku-ji&format=json – Where's my towel Jul 09 '19 at 15:10
  • Okay, but what exactly am I reading or saving as a variable from the link? – Jack Armstrong Jul 09 '19 at 15:12
  • These mixed coordinate formats look quite messy. You may want to read this: https://stackoverflow.com/q/14404596/6574038 and have a look into `geosphere::distGeo`. – jay.sf Jul 09 '19 at 15:18
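The API route suggested in the comments could be sketched roughly as follows. It assumes the `jsonlite` package is installed and that the JSON response nests the values under `query$pages$<page id>$coordinates`, which is how the URL above appears to respond. `extract_coords` and `fetch_coords` are hypothetical helper names, not functions from any package.

```r
# Hypothetical helper: pull lat/lon out of a parsed API response (a nested
# list, as returned by jsonlite::fromJSON(..., simplifyVector = FALSE)).
extract_coords <- function(resp) {
  page <- resp$query$pages[[1]]  # pages are keyed by page id; take the first
  if (is.null(page$coordinates)) {
    return(c(lat = NA_real_, lon = NA_real_))  # article has no coordinates
  }
  co <- page$coordinates[[1]]
  c(lat = co$lat, lon = co$lon)
}

# Hypothetical helper: build the API URL for an article title and fetch it.
# Needs network access and jsonlite, so it is defined here but not called.
fetch_coords <- function(title) {
  url <- paste0(
    "https://en.wikipedia.org/w/api.php?action=query&prop=coordinates",
    "&titles=", utils::URLencode(title, reserved = TRUE),
    "&format=json"
  )
  extract_coords(jsonlite::fromJSON(url, simplifyVector = FALSE))
}

# Usage (not run here): fetch_coords("Mendizorrotza")
```

The value to save is then just the named numeric vector with `lat` and `lon`, which avoids parsing the mixed-format infobox string altogether.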
