I would like to extract the latitude and longitude of five stadiums in Dallas. Each stadium appears as a hyperlink in the "Venue" column of a Wikipedia table on this page: https://en.wikipedia.org/wiki/Sports_in_Dallas.

My goal is to write code that follows each venue hyperlink and then opens the link under "Coordinates" on that page to get the latitude and longitude in decimal form.

I need a new column called "latitude" and another one called "longitude" in my "data" table.

I am trying to do this in a loop, but I do not know how to fill in the loop body.

Please, I need your help.

library(rvest)
library(dplyr)

url = "https://en.wikipedia.org/wiki/Sports_in_Dallas"

data <- read_html(url) %>%
  html_element(".wikitable")%>%
  html_table()%>%
  select(Team, Sport, Venue)

for(i in 1:nrow(data)){
  # Placeholder: this is the part I do not know how to write
  data$latitude[i] <- NA_real_

  data$longitude[i] <- NA_real_
}

bretauv
econ221
  • Rather than scraping the lat-long yourself, it'd probably be easier to just collect the addresses of the stadium and use [`tidygeocoder`](https://jessecambon.github.io/tidygeocoder/) (or another package) to find the coordinates. – bretauv Mar 09 '23 at 08:24
  • To find the solution yourself, try scraping the coordinates of a single stadium first, then package your code in a function and apply it in a loop/`map()`/`apply()` call. – dufei Mar 09 '23 at 09:48
  • Do you mean the venues with a name that includes the wording "Stadium" in the first table? – Mikael Poul Johannesson Mar 09 '23 at 10:46
  • If you're after the coordinates and aren't tied to scraping with R, then you'd really be better off querying Wikidata instead; see e.g. https://stackoverflow.com/questions/71380188/retrieve-latitude-and-longitude-of-a-sample-of-coordinates-from-wikidata-using-s – smartse Mar 09 '23 at 17:48
  • @smartse, there are a number of Wikidata packages for R too. – margusl Mar 09 '23 at 18:30
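
Following dufei's suggestion above (handle one stadium first, then wrap it in a function and loop), here is a minimal rvest-only sketch. It assumes that a Wikipedia article with coordinates contains a hidden `<span class="geo">` holding the decimal pair as "lat; lon"; that markup is an assumption about the page, and the helper names `get_coords` and `venue_coords` are made up for illustration. The sample value in `demo` is illustrative, not scraped.

```r
library(rvest)

# Hypothetical helper: read decimal coordinates from one parsed page.
# Assumes a <span class="geo"> whose text looks like "32.7475; -97.0928".
get_coords <- function(page) {
  geo <- html_text(html_element(page, "span.geo"), trim = TRUE)
  parts <- as.numeric(trimws(strsplit(geo, ";")[[1]]))
  # Returns NA for both values when the span is absent
  c(latitude = parts[1], longitude = parts[2])
}

# Offline check on a hand-written snippet (illustrative value)
demo <- minimal_html('<span class="geo">32.7475; -97.0928</span>')
get_coords(demo)

# Hypothetical wrapper: visit every link in the third column of the first
# .wikitable and collect one row per link. Assumes the venue links are
# relative ("/wiki/...") and that the third cell of each row is the venue.
venue_coords <- function(table_url) {
  tbl   <- html_element(read_html(table_url), ".wikitable")
  links <- html_elements(tbl, "td:nth-child(3) a")
  hrefs <- paste0("https://en.wikipedia.org", html_attr(links, "href"))
  coords <- do.call(rbind, lapply(hrefs, function(h) get_coords(read_html(h))))
  data.frame(Venue = html_text(links, trim = TRUE), coords, row.names = NULL)
}
```

Calling `venue_coords(url)` would hit the live site once per link, so it is only defined here, not run. The result could then be joined to `data` with `dplyr::left_join(data, ..., by = "Venue")`, provided each Venue cell contains a single link whose text matches the cell.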

1 Answer

I was able to extract the latitude and longitude of the stadiums, as degrees-minutes-seconds strings, with the following code:

library(RSelenium)
library(stringr)

# shell() exists only on Windows; on Linux/macOS use system() instead
shell('docker run -d -p 4446:4444 selenium/standalone-firefox')
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
remDr$open()
remDr$navigate("https://en.wikipedia.org/wiki/Sports_in_Dallas")

list_Text <- list()
list_Url <- list()
counter <- 0

for(i in 1 : 10)
{
  for(j in 1 : 10)
  {
    print(paste0("i ", i, " j ", j))
    xpath <- paste0('//*[@id="mw-content-text"]/div[1]/table[1]/tbody/tr[', i, ']/td[3]/a[', j, ']')
    # Return NULL when the XPath matches nothing, so the check below is a plain NULL test
    web_Obj <- tryCatch(remDr$findElement("xpath", xpath), error = function(e) NULL)
    
    if(!is.null(web_Obj))
    {
      counter <- counter + 1
      list_Text[[counter]] <- web_Obj$getElementText()[[1]]
      list_Url[[counter]] <- web_Obj$getElementAttribute("href")[[1]]
    }
  }
}

vec_Text <- unlist(list_Text)
vec_Url <- unlist(list_Url)
bool_Stadium <- stringr::str_detect(vec_Text, "Stadium")
vec_Text_Stadium <- vec_Text[bool_Stadium]
vec_Text_Url <- vec_Url[bool_Stadium]
nb_Url <- length(vec_Text_Url)

list_Latitude <- list()
list_Longitude <- list()

for(i in 1 : nb_Url)
{
  print(i)
  remDr$navigate(vec_Text_Url[i])
  web_Obj_Latitude <- remDr$findElement("class name", "latitude")
  web_Obj_Longitude <- remDr$findElement("class name", "longitude")
  list_Latitude[[i]] <- web_Obj_Latitude$getElementText()
  list_Longitude[[i]] <-  web_Obj_Longitude$getElementText()
}

> list_Latitude
[[1]]
[[1]][[1]]
[1] "32°45′23″N"


[[2]]
[[2]][[1]]
[1] "32°50′23″N"


[[3]]
[[3]][[1]]
[1] "32°44′52″N"


[[4]]
[[4]][[1]]
[1] "32°55′46″N"


[[5]]
[[5]][[1]]
[1] "33°9′16″N"


> list_Longitude
[[1]]
[[1]][[1]]
[1] "97°5′5″W"


[[2]]
[[2]][[1]]
[1] "96°54′39″W"


[[3]]
[[3]][[1]]
[1] "97°5′34″W"


[[4]]
[[4]][[1]]
[1] "97°6′43″W"


[[5]]
[[5]][[1]]
[1] "96°50′7″W"
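
The question asks for decimal values, while the strings above are in degrees-minutes-seconds form. A minimal base-R sketch of the conversion follows; `dms_to_decimal` is a hypothetical helper name, not part of any package, and the input vectors are simply the strings printed above, typed in so the block runs on its own. It assumes each string has the shape degrees, minutes, seconds, then a hemisphere letter.

```r
# Hypothetical helper: convert strings such as "32°45′23″N" to decimal degrees.
dms_to_decimal <- function(dms) {
  # Extract every number in each string: degrees, minutes, seconds
  parts <- regmatches(dms, gregexpr("[0-9]+(\\.[0-9]+)?", dms))
  vapply(seq_along(dms), function(k) {
    # Pad with zeros in case minutes or seconds are missing
    nums <- c(as.numeric(parts[[k]]), 0, 0)[1:3]
    value <- nums[1] + nums[2] / 60 + nums[3] / 3600
    # Southern and western hemispheres are negative
    if (grepl("[SW]\\s*$", dms[k])) -value else value
  }, numeric(1))
}

# The strings printed above; with the scraped lists you would pass
# unlist(list_Latitude) and unlist(list_Longitude) instead.
lat_dms <- c("32°45′23″N", "32°50′23″N", "32°44′52″N", "32°55′46″N", "33°9′16″N")
lon_dms <- c("97°5′5″W", "96°54′39″W", "97°5′34″W", "97°6′43″W", "96°50′7″W")

latitude  <- dms_to_decimal(lat_dms)
longitude <- dms_to_decimal(lon_dms)

coords <- data.frame(latitude = latitude, longitude = longitude)
coords
```

Since `vec_Text_Stadium` holds the venue names in the same order, adding it to `coords` as a `Venue` column should allow a join back onto the `data` table from the question.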

Emmanuel Hamel