
I am trying to use the XML and RCurl packages to read some HTML tables from the following URL: http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#

Here is the code I am using:

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R"))
url <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ#"
wp <- getURLContent(url)
doc <- htmlParse(wp, asText = TRUE) 
docName(doc) <- url
tmp <- readHTMLTable(doc)
## Required tables 
tmp[[13]]
tmp[[14]]

If you look at the tables, you will see that the values have not been parsed from the webpage. I guess this is due to some JavaScript evaluation happening on the fly. However, if I use the "save page as" option in Google Chrome (it does not work in Mozilla), save the page, and then run the above code on the saved file, I am able to read in the values.

But is there a workaround so that I can read the table on the fly? It would be great if you could help.

Regards,

sayan dasgupta
  • http://stackoverflow.com/questions/1395528/scraping-html-tables-into-r-data-frames-using-the-xml-package duplicate? – Brandon Bertelsen May 06 '11 at 17:19
  • Hi Brandon, I guess it is not; if you run the code I wrote, you will see I am getting the required table but not the values associated with the fields, due to what I guess is some JavaScript issue – sayan dasgupta May 06 '11 at 17:35
  • Yes, I've been playing with it, I couldn't find anything that downloads the page in the way that's necessary. The only recommendation that I can make is to set up a cron job to download the page with something like wget and then have R target the downloaded local file. – Brandon Bertelsen May 19 '11 at 05:00
  • Although, that might not work either and you may have to implement some type of web scraping software prior to moving it into R. – Brandon Bertelsen May 19 '11 at 05:10
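If the scheduled-download route suggested in the comments works for the asker, the R side of it is straightforward. This is a minimal sketch; the local path and the wget command are placeholders, and the table indices are taken from the question's own code:

```r
library(XML)

# Assumes a cron job has already fetched the page, e.g. with:
#   wget -O /tmp/cmquote.html "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ"
# /tmp/cmquote.html is a placeholder path.
doc <- htmlParse("/tmp/cmquote.html")
tmp <- readHTMLTable(doc)

# The required tables, per the question's indices
tmp[[13]]
tmp[[14]]
```

Note this only helps if the saved copy contains the rendered values, as it does when Chrome's "save page as" is used.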

1 Answer


Looks like they're building the page with JavaScript by accessing http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ and parsing out a string. Maybe you could grab that data and parse it yourself instead of scraping the rendered page.

Looks like you'll have to build a request with the proper Referer header using cURL, though. As you can see, you can't just hit that ajaxGetQuote page with a bare request.

You can probably read the appropriate headers to put in by using the Web Inspector in Chrome or Safari, or by using Firebug in Firefox.
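Putting that together in R, a request with a Referer header might look like the sketch below. The header set and the field delimiter are assumptions; inspect the actual request in the Web Inspector and the raw response to confirm both:

```r
library(RCurl)

# AJAX endpoint the page itself calls
ajax_url <- "http://www.nse-india.com/marketinfo/equities/ajaxGetQuote.jsp?symbol=SBIN&series=EQ"

# Pretend the request came from the quote page itself.
# Which headers the server actually checks is an assumption.
referer <- "http://www.nse-india.com/marketinfo/equities/cmquote.jsp?key=SBINEQN&symbol=SBIN&flag=0&series=EQ"

resp <- getURL(ajax_url,
               httpheader = c(
                 Referer      = referer,
                 "User-Agent" = "R"
               ))

# The endpoint appears to return a delimited string rather than HTML;
# "|" as the separator is a guess -- print resp first and adjust.
fields <- strsplit(resp, "\\|")[[1]]
head(fields)
```

If the response turns out to be HTML or JSON instead of a delimited string, swap the `strsplit` step for `htmlParse`/`readHTMLTable` or a JSON parser accordingly.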

Tim Snowhite