Null results from readHTMLTable in R

Question

I'm trying to scrape data from a website in R using the XML package, but the result is NULL. The problem is in the first line of my code below: readHTMLTable isn't finding any tables on the page.

url <- "http://www.machinerytrader.com/list/list.aspx?pg=1&ETID=5&catid=1015&SO=26&mdlx=contains&bcatid=4&Pref=0&Thumbs=1&scf=false&units=imperial"

Code:

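library(XML)

# Parse every <table> on the page into a list of data frames
tables <- readHTMLTable(url, stringsAsFactors = FALSE)
# Stack tables 8, 10, ..., 56 into one data frame
data <- do.call("rbind", tables[seq(from = 8, to = 56, by = 2)])
# From tables 9, 11, ..., 57, take the first value of column 2 as an extra column
data <- cbind(data, sapply(lapply(tables[seq(from = 9, to = 57, by = 2)], '[[', i = 2), '[', 1))
rownames(data) <- NULL
names(data) <- c("year.man.model", "s.n", "price", "location", "auction")
head(data)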

Any help would be greatly appreciated!

Don

Yeah, that's definitely where the issue stems from, but I can't figure out why. I'll edit the original question to make that clear. — Don S, Mar 06 '14 at 00:54
It seems the table is generated by JavaScript, which makes this a bit more challenging. Have a search, though, and you might find some useful code. — Ben, Mar 06 '14 at 01:18

score 0 · Answer 1 · edited May 23 '17 at 11:57

It looks like the problem is the site itself, which is wretchedly built. Fetching the page "manually":

library(RCurl)
library(XML)

url <- "http://www.machinerytrader.com/list/list.aspx?pg=1&ETID=5&catid=1015&SO=26&mdlx=contains&bcatid=4&Pref=0&Thumbs=1&scf=false&units=imperial"
pg <- getURL(url)
conn <- textConnection(pg)
pg <- readLines(conn)
close(conn)

gives the following at element 33 of pg (for this particular request):

pg[33]
[1] "<noscript>Please enable JavaScript to view the page content.</noscript>"
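A quick check along these lines can flag the problem before any table parsing is attempted. This is only a minimal sketch in base R: `needs_javascript` is a hypothetical helper name, and `sample_pg` is a made-up stand-in for the `pg` vector fetched above, so the snippet runs without a network connection.

```r
# Hypothetical helper: TRUE if any line of the fetched HTML carries a
# <noscript> notice asking the visitor to enable JavaScript.
needs_javascript <- function(lines) {
  any(grepl("<noscript>.*enable javascript", lines, ignore.case = TRUE))
}

# Stand-in for the lines returned by getURL() + readLines() above
sample_pg <- c(
  "<html>",
  "<noscript>Please enable JavaScript to view the page content.</noscript>",
  "</html>"
)

if (needs_javascript(sample_pg)) {
  message("Page content is rendered by JavaScript; readHTMLTable will likely find no tables.")
}
```

With the real data, `needs_javascript(pg)` would be the call to make. The pattern matches only this kind of notice, so other sites may word it differently.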

I usually do a quick debug in Google Spreadsheets via the IMPORTHTML function (I actually prefer letting Google handle the data import and transformation in general), and even that couldn't scrape the page.

I tried it with both command-line curl and wget and (unsurprisingly) got the same result.

To get what you need, you may have to go the route described in "Scraping websites with Javascript enabled?". I might be missing something obvious, though.

score 0 · Answer 2 · edited May 23 '17 at 10:26

Got an answer on a different thread. Basically, you need to use the relenium package in R.

Solution: Scraping javascript website
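For illustration, the general approach (drive a real browser so the JavaScript runs, then parse the rendered page) might look roughly like the sketch below. It does not use relenium: it assumes the RSelenium package instead, plus a Selenium server already running on localhost:4444 with Firefox available. `fetch_rendered_tables` is a hypothetical helper name, and none of this was tested against the site in question.

```r
# Sketch only: RSelenium stands in for relenium here (an assumption).
# fetch_rendered_tables is a hypothetical helper name.
fetch_rendered_tables <- function(url, host = "localhost", port = 4444L) {
  # Connect to a Selenium server that is assumed to be running already
  remDr <- RSelenium::remoteDriver(remoteServerAddr = host, port = port,
                                   browserName = "firefox")
  remDr$open(silent = TRUE)
  on.exit(remDr$close(), add = TRUE)  # close the browser session on exit

  # Load the page in the browser so its JavaScript can build the tables
  remDr$navigate(url)
  src <- remDr$getPageSource()[[1]]

  # Parse the rendered HTML instead of the raw response
  doc <- XML::htmlParse(src, asText = TRUE)
  XML::readHTMLTable(doc, stringsAsFactors = FALSE)
}
```

A call such as `tables <- fetch_rendered_tables(url)` would then take the place of the `readHTMLTable(url, ...)` line in the question. The page may need a short pause (for example `Sys.sleep`) after `navigate` before the tables appear, and the table indices used in the question may not carry over unchanged.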

answered Mar 06 '14 at 17:31

Don S
