
http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V

How do I read this as an XML document? I'm trying to parse it in R.

Kashif
    It *is* an XML document, it is correctly structured, and `read_xml` reads it without problems. What have you tried that suggests it does not work? – r2evans Apr 20 '17 at 02:22

3 Answers


You can use xml2 to read and parse:

library(xml2)
library(tidyverse)

xml <- read_xml('https://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V')

bart <- xml %>% xml_find_all('//station') %>%    # select all station nodes
    map_df(as_list) %>%    # coerce each node to list, collect to data.frame
    unnest(cols = everything())    # unnest list columns of data.frame (tidyr >= 1.0 expects cols)

bart
#> # A tibble: 46 × 9
#>                            name  abbr gtfs_latitude gtfs_longitude
#>                           <chr> <chr>         <chr>          <chr>
#> 1  12th St. Oakland City Center  12TH     37.803768    -122.271450
#> 2              16th St. Mission  16TH     37.765062    -122.419694
#> 3              19th St. Oakland  19TH     37.808350    -122.268602
#> 4              24th St. Mission  24TH     37.752470    -122.418143
#> 5                         Ashby  ASHB     37.852803    -122.270062
#> 6                   Balboa Park  BALB     37.721585    -122.447506
#> 7                      Bay Fair  BAYF     37.696924    -122.126514
#> 8                 Castro Valley  CAST     37.690746    -122.075602
#> 9         Civic Center/UN Plaza  CIVC     37.779732    -122.414123
#> 10                     Coliseum  COLS     37.753661    -122.196869
#> # ... with 36 more rows, and 5 more variables: address <chr>, city <chr>,
#> #   county <chr>, state <chr>, zipcode <chr>
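If you'd rather avoid the tidyverse and the `unnest()` step, here is a minimal sketch with plain xml2 and base R. It parses a small inline `sample_xml` string (a hypothetical, trimmed stand-in for the API response: two stations and four of the nine fields, values copied from the output above) so it runs offline, and it assumes every `<station>` has the same child elements in the same order.

```r
library(xml2)

# Hypothetical trimmed sample mirroring the BART response structure
# (two stations, four of the nine fields, values taken from the output above)
sample_xml <- '<root><stations>
  <station><name>12th St. Oakland City Center</name><abbr>12TH</abbr>
    <gtfs_latitude>37.803768</gtfs_latitude><gtfs_longitude>-122.271450</gtfs_longitude></station>
  <station><name>16th St. Mission</name><abbr>16TH</abbr>
    <gtfs_latitude>37.765062</gtfs_latitude><gtfs_longitude>-122.419694</gtfs_longitude></station>
</stations></root>'

doc <- read_xml(sample_xml)
stations <- xml_find_all(doc, "//station")

# For each <station>, turn its child elements into a named character vector
rows <- lapply(stations, function(st) {
  kids <- xml_children(st)
  setNames(xml_text(kids), xml_name(kids))
})

# rbind matches by position, so this assumes identical child order per station
bart_df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# The coordinates arrive as text; convert them to numbers
bart_df$gtfs_latitude  <- as.numeric(bart_df$gtfs_latitude)
bart_df$gtfs_longitude <- as.numeric(bart_df$gtfs_longitude)
```

Against the live API, pass the URL from the question to `read_xml()` instead of `sample_xml`; the two coordinate columns then end up numeric rather than `<chr>`.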
alistaire

Using the rvest library: the basic idea is to find the nodes of interest (`xml_nodes`) with XPath selectors, then extract their values with `xml_text`.

library(rvest)  # xml_nodes() and the %>% pipe
library(xml2)   # read_xml() and xml_text()

doc <- read_xml("http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V")
names <- doc %>% 
  xml_nodes(xpath = "/root/stations/station/name") %>%
  xml_text()

names[1:5]

# [1] "12th St. Oakland City Center" "16th St. Mission"             "19th St. Oakland"             "24th St. Mission"            
# [5] "Ashby"                       
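The absolute XPath above returns one flat vector per field, and those vectors only line up across fields if every station has every element. Here is a minimal sketch of a per-station alternative using xml2's `xml_find_first()` with relative paths. `sample_xml` is a hypothetical inline sample, and the last station's `<abbr>` is deliberately left out to show that the gap becomes `NA` instead of shifting the column.

```r
library(xml2)

# Hypothetical inline sample; the last station intentionally lacks <abbr>
sample_xml <- '<root><stations>
  <station><name>12th St. Oakland City Center</name><abbr>12TH</abbr></station>
  <station><name>16th St. Mission</name><abbr>16TH</abbr></station>
  <station><name>Ashby</name></station>
</stations></root>'

doc <- read_xml(sample_xml)
stations <- xml_find_all(doc, "/root/stations/station")

# Evaluate a relative XPath once per station: xml_find_first() yields a
# missing node when the child is absent, and xml_text() turns that into NA,
# so the columns stay aligned station by station
station_df <- data.frame(
  name = xml_text(xml_find_first(stations, "./name")),
  abbr = xml_text(xml_find_first(stations, "./abbr")),
  stringsAsFactors = FALSE
)
```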
Andrew Lavers

I had some problems passing the URL to read_html directly, so I used readLines first. Then I find all the <station> nodes, convert them to a list, and feed that into data.table::rbindlist. The idea of using rbindlist came from here.

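library(xml2)
library(data.table)
library(magrittr)   # for %>%

url <- "http://api.bart.gov/api/stn.aspx?cmd=stns&key=MW9S-E7SL-26DU-VV8V"
# collapse the lines returned by readLines() into a single string before parsing
nodesets <- read_html(paste(readLines(url), collapse = "\n")) %>% 
    xml_find_all(".//station")
data.table::rbindlist(as_list(nodesets))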
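A minimal sketch of the same `rbindlist` idea, working on plain character values instead of `as_list()` output: each `<station>` becomes a named list of its children's text, and `rbindlist(use.names = TRUE, fill = TRUE)` matches columns by name and pads missing ones with `NA`. The inline `sample_xml` string is a hypothetical stand-in for the API response so the example runs offline, and the second station hypothetically lacks `<gtfs_latitude>`.

```r
library(xml2)
library(data.table)

# Hypothetical inline sample; the second station has no <gtfs_latitude>
sample_xml <- '<root><stations>
  <station><name>12th St. Oakland City Center</name><abbr>12TH</abbr>
    <gtfs_latitude>37.803768</gtfs_latitude></station>
  <station><name>16th St. Mission</name><abbr>16TH</abbr></station>
</stations></root>'

doc <- read_xml(sample_xml)
stations <- xml_find_all(doc, ".//station")

# One named list of character values per station
rows <- lapply(stations, function(st) {
  kids <- xml_children(st)
  as.list(setNames(xml_text(kids), xml_name(kids)))
})

# Match columns by name; fields a station lacks are filled with NA
dt <- rbindlist(rows, use.names = TRUE, fill = TRUE)
```

Because columns are matched by name, this does not depend on every station listing its fields in the same order.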
chinsoon12