-1

I have been trying to scrape live quotes from this website but am running into an error. The code I used is given below:

library(XML)
library(RCurl)  # provides getURL()
webpage <- 'http://quotes.freerealtime.com/dl/frt/M?SA=Percent+Gainers&IM=stats&stat=3'

# parse url
url_parsed <- htmlParse(getURL(webpage), asText = TRUE)

# select table nodes of interest
tableNodes <- getNodeSet(url_parsed, '/html/body/table[2]/tbody/tr/td[4]/table[2]/tbody/tr[2]/td/table')

However, tableNodes turns out to be NULL. Can anyone help me figure this out?

Ronak Shah

2 Answers

1

I think the following link explains your main problem:

Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?

So the following, with the tbody steps dropped from the XPath, does return values:

tableNodes <- getNodeSet(url_parsed, '/html/body/table[2]/tr/td[4]/table[2]/tr[2]/td')

However, since the actual table is rendered by JavaScript, it is not part of the HTML you download, so you will not be able to access it this way.
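
To illustrate the tbody point in isolation, here is a minimal sketch using the same XML package. The markup is a made-up stand-in (not the real page), written without a <tbody> element the way many pages are. Browsers insert <tbody> into the DOM they show in developer tools, but htmlParse does not, so an XPath copied from the browser should come back empty while the same path without tbody should match.

```r
library(XML)

# Hypothetical stand-in markup: the source has no <tbody>, as is common.
html <- "<html><body><table><tr><td>AAA</td><td>1.5</td></tr></table></body></html>"
doc <- htmlParse(html, asText = TRUE)

# XPath as copied from a browser inspector (includes tbody).
with_tbody <- getNodeSet(doc, "/html/body/table/tbody/tr/td")

# Same path with the tbody step removed.
without_tbody <- getNodeSet(doc, "/html/body/table/tr/td")

# Compare how many nodes each query matched.
n_with <- length(with_tbody)
n_without <- length(without_tbody)
cell_text <- sapply(without_tbody, xmlValue)
```

If the first count is zero and the second is not, the tbody steps are the reason the original query found nothing.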

I would suggest looking at the following:

https://www.datacamp.com/community/tutorials/scraping-javascript-generated-data-with-r

Carlos Santillan
0

The webpage uses AJAX. Open the webpage in Chrome, press F12 to open Developer Tools, and go to the Network tab. Refresh the page and examine the logged XHRs.

The table on the webpage is split into 4 parts, so you will find 4 logged requests containing the necessary data, with URLs like http://app.quotemedia.com/quotetools/scalingMarketStats.go?webmasterId=100804&toolWidth=620&statExchange=NSD&stat=pg&statTop=15&targetsym=symbol&detailURL=http://quotes.freerealtime.com/dl/frt/M%3fIM=quotes%26type=Quote%26SA=quotes%26symbol=symbol&sid=0. Each HTML response contains two <table> tags. Extract the second (nested) table, which contains the necessary data.
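
As a rough sketch of pulling the nested table out of one such response, assuming the response really is an outer <table> wrapping an inner one as described above: the helper name nested_table_rows and the sample markup are invented for illustration, and the real response will have more rows and columns.

```r
library(XML)

# Hypothetical helper: return the cell text of each row of the first
# <table> that is nested inside another <table>.
nested_table_rows <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  # (//table//table)[1] selects the first table that has a table ancestor.
  rows <- getNodeSet(doc, "(//table//table)[1]//tr")
  lapply(rows, function(tr) xpathSApply(tr, "./td | ./th", xmlValue))
}

# Made-up stand-in for one response: an outer layout table around a data table.
sample_html <- paste0(
  "<html><body><table><tr><td>",
  "<table><tr><td>ABCD</td><td>12.5</td></tr>",
  "<tr><td>WXYZ</td><td>9.1</td></tr></table>",
  "</td></tr></table></body></html>"
)

rows <- nested_table_rows(sample_html)
```

Each element of rows is a character vector with one entry per cell. readHTMLTable could be applied to the nested table node instead if a data frame is preferred.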

The scalingMarketStats.go URLs can be found in the HTML content returned by the first logged request, the one to http://quotes.freerealtime.com/dl/frt/M?SA=Percent+Gainers&IM=stats&stat=3.

Try the following steps to scrape live quotes from the website:

  1. Make a request to http://quotes.freerealtime.com/dl/frt/M?SA=Percent+Gainers&IM=stats&stat=3
  2. Extract all URLs containing app.quotemedia.com/quotetools/scalingMarketStats.go from the response.
  3. Make a request to each extracted URL.
  4. Extract the nested table from each response.
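
The steps above could be sketched in R roughly as follows. This is untested against the live site and rests on assumptions: that the URLs appear literally in the page source (possibly with & escaped as &amp;), that a plain getURL call is enough to fetch them, and that each response has the nested-table layout described earlier. The function names extract_stat_urls and scrape_quotes are hypothetical.

```r
library(XML)
library(RCurl)

# Step 2: pull every scalingMarketStats.go URL out of the page source.
extract_stat_urls <- function(page_html) {
  pattern <- "https?://app\\.quotemedia\\.com/quotetools/scalingMarketStats\\.go[^\"'<>[:space:]]*"
  urls <- regmatches(page_html, gregexpr(pattern, page_html))[[1]]
  # Assumption: ampersands may be HTML-escaped inside attributes.
  unique(gsub("&amp;", "&", urls, fixed = TRUE))
}

# Steps 1 to 4 combined; returns one data frame (or NULL) per extracted URL.
scrape_quotes <- function(page_url) {
  page_html <- getURL(page_url)                 # step 1
  stat_urls <- extract_stat_urls(page_html)     # step 2
  lapply(stat_urls, function(u) {
    doc <- htmlParse(getURL(u), asText = TRUE)  # step 3
    nested <- getNodeSet(doc, "//table//table") # step 4
    if (length(nested) == 0) return(NULL)
    readHTMLTable(nested[[1]], stringsAsFactors = FALSE)
  })
}

# Offline check of step 2 on a made-up page fragment.
sample_page <- paste0(
  "<script src=\"http://app.quotemedia.com/quotetools/",
  "scalingMarketStats.go?webmasterId=100804&amp;stat=pg&amp;sid=0\"></script>",
  "<a href=\"http://example.com/other\">other</a>"
)
found <- extract_stat_urls(sample_page)
```

Calling scrape_quotes with the URL from step 1 would then give a list of tables, one per part, which could be combined with rbind if their columns match.
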
omegastripes