How can I scrape a table from a PHP website using R?

Question

Looking to import data into R from a table on this page:

https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10

I've tried multiple methods using XML and httr with no luck. I have already looked at past posts, including:

Scraping html tables into R data frames using the XML package

Wondering if maybe I'm not using the correct table ID from the source or if the table is not in the proper format given the tools I'm currently using?

Any and all help is much appreciated! Thanks in advance!

Abb, we can't comment on using the correct id or anything else about your process without seeing *your code*. — r2evans, Dec 15 '19 at 00:04
r2evans, I honestly don't even think the code matters much at this point because I can't find the correct id in the source. If you look at the source, `` highlights the table I'm after but again, no actual ID. If you drop down a little further in the source you'll start seeing the actual data I'm after... for example, `` refers to the 2.6 in the Tampa Bay Rays row under the column labeled "D1". Thanks again for your time and help! — Abb, Dec 15 '19 at 01:11
I think I understand, but it seems that you are treating SO like a free-code service by asking us to provide scraping code with zero demonstrated effort on your part. Side note, are you certain that they permit harvesting data? The [terms of service](https://legacy.baseballprospectus.com/tos/) suggest otherwise. — r2evans, Dec 15 '19 at 01:33

score 1 · Accepted Answer · answered Dec 15 '19 at 01:33

This won't give you exactly what you want, but it might help get you started:

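library(XML)
# Save the page locally so it can be re-parsed without downloading it again
fname <- "standings20190910.html"
download.file("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-10", destfile=fname)
# Parse the saved HTML and select every table whose id attribute is "content"
doc0 <- htmlParse(file=fname, encoding="UTF-8")
doc1 <- xmlRoot(doc0)
doc2 <- getNodeSet(doc1, "//table[@id='content']")
# Convert the first matching node to a data frame, skipping the table's first row
standings <- readHTMLTable(doc2[[1]], header=TRUE, skip.rows=1, stringsAsFactors=FALSE)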

You can look at the HTML source code of the table you're trying to scrape, and then try to figure out how to create a useful R object. Look carefully at the documentation for getNodeSet and readHTMLTable in the manual of the XML package (https://cran.r-project.org/web/packages/XML/XML.pdf).
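If it helps to experiment without hitting the site, here is a minimal sketch of how getNodeSet and readHTMLTable fit together. It runs on a made-up inline page: the markup, the table ids "nav" and "content", and all values are assumptions for illustration, not the real Baseball Prospectus source.

```r
library(XML)

# Toy page standing in for the real one (made-up markup and values).
html <- '
<html><body>
  <table id="nav"><tr><td>Home</td><td>Standings</td></tr></table>
  <table id="content">
    <tr><th>Team</th><th>W</th><th>L</th></tr>
    <tr><td>Team A</td><td>10</td><td>5</td></tr>
    <tr><td>Team B</td><td>7</td><td>8</td></tr>
  </table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# List the id attribute of every table to see which XPath targets exist.
ids <- xpathSApply(doc, "//table", xmlGetAttr, "id", default = NA_character_)

# getNodeSet() returns a list of matching nodes; take the first match.
nodes <- getNodeSet(doc, "//table[@id='content']")
standings <- readHTMLTable(nodes[[1]], header = TRUE, stringsAsFactors = FALSE)

# Cells arrive as character strings, so convert numeric columns explicitly.
standings$W <- as.integer(standings$W)
standings$L <- as.integer(standings$L)
```

Printing `ids` first is a quick way to check whether the table you want has an id at all. If it doesn't, the same getNodeSet call should work with a different XPath, for example one that matches on a class attribute or on position.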

Montgomery Clift, thank you so much! This is extremely helpful. I will look further into the documentation you suggested and go from there! Thanks again. — Abb, Dec 15 '19 at 01:41
