
I am trying to use the XML and RCurl packages to read some HTML tables from the following URL: http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#

Here is the code I am using:

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
url <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#"
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE) 
docName(doc) <- url
tmp <- readHTMLTable(doc)
## Required tables 
tmp[[13]]
tmp[[14]]

If you look at the tables, you will see that the values have not been parsed from the webpage. I guess this is due to some JavaScript evaluation happening on the fly. However, if I use the "save page as" option in Google Chrome (it does not work in Mozilla), save the page, and then run the above code on the saved file, I am able to read in the values.

But is there a workaround so that I can read the table on the fly? It would be great if you could help.

Regards,

sayan dasgupta
  • http://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package duplicate? – Brandon Bertelsen May 06 '11 at 17:19
  • Hi Brandon, I guess it is not; if you run the code I wrote, you will see I am getting the required table but not the values associated with the fields, due to what I guess is some JavaScript issue – sayan dasgupta May 06 '11 at 17:35
  • Yes, I've been playing with it, I couldn't find anything that downloads the page in the way that's necessary. The only recommendation that I can make is to set up a cron job to download the page with something like wget and then have R target the downloaded local file. – Brandon Bertelsen May 19 '11 at 05:00
  • Although, that might not work either and you may have to implement some type of web scraping software prior to moving it into R. – Brandon Bertelsen May 19 '11 at 05:10
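If the scheduled-download route suggested in the comments works for the asker, the R side of it is straightforward. This is a minimal sketch; the local path and the wget command are placeholders, and the table indices are taken from the question's own code:

```r
library(XML)

# Assumes a cron job has already fetched the page, e.g. with:
#   wget -O /tmp/cmquote.html "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ"
# /tmp/cmquote.html is a placeholder path.
doc <- htmlParse("/tmp/cmquote.html")
tmp <- readHTMLTable(doc)

# The required tables, per the question's indices
tmp[[13]]
tmp[[14]]
```

Note this only helps if the saved copy contains the rendered values, as it does when Chrome's "save page as" is used.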

1 Answer


Looks like they're building the page with JavaScript by accessing http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ and parsing out a string. Maybe you could grab that data and parse it yourself instead of scraping the rendered page.

Looks like you'll have to build a request with the proper Referer header using cURL, though. As you can see, you can't just hit that ajaxGetQuote page with a bare request.

You can probably read the appropriate headers to put in by using the Web Inspector in Chrome or Safari, or by using Firebug in Firefox.
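Putting that together in R, a request with a Referer header might look like the sketch below. The header set and the field delimiter are assumptions; inspect the actual request in the Web Inspector and the raw response to confirm both:

```r
library(RCurl)

# AJAX endpoint the page itself calls
ajax_url <- "http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ"

# Pretend the request came from the quote page itself.
# Which headers the server actually checks is an assumption.
referer <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ"

resp <- getURL(ajax_url,
               httpheader = c(
                 Referer      = referer,
                 "User-Agent" = "R"
               ))

# The endpoint appears to return a delimited string rather than HTML;
# "|" as the separator is a guess -- print resp first and adjust.
fields <- strsplit(resp, "\\|")[[1]]
head(fields)
```

If the response turns out to be HTML or JSON instead of a delimited string, swap the `strsplit` step for `htmlParse`/`readHTMLTable` or a JSON parser accordingly.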

Tim Snowhite