
I am using the following code:

url = "http://finance.yahoo.com/q/op?s=DIA&m=2013-07"

library(XML)
tabs = readHTMLTable(url, stringsAsFactors = F)

I get the following error:

Error: failed to load external entity "http://finance.yahoo.com/q/op?s=DIA&m=2013-07"

When I open the URL in a browser, it works fine. So what am I doing wrong here?

Thanks

Zanam
  • Your code works fine for me. – Thomas Jun 11 '13 at 13:45
  • It works for me too. Based on http://stackoverflow.com/questions/14629026/r-readhtmltable-error-failed-to-load-external-entity, it sounds like this might be an issue with your internet connection. Are you able to load the page in a browser? – SchaunW Jun 11 '13 at 13:49
  • Yes, I can load the page fine in a browser, so I assume my internet connection is fine. – Zanam Jun 11 '13 at 13:52
  • Can you run `library(RCurl); tabs = getURL(url)` without triggering an error? – SchaunW Jun 11 '13 at 14:05
  • Try the proxy-setting methods described here: http://stackoverflow.com/questions/6467277/proxy-setting-for-r ; they may help you. – user2982707 Dec 23 '13 at 09:59

2 Answers


It's difficult to know for sure since I can't replicate your error, but according to the package's author (see http://comments.gmane.org/gmane.comp.lang.r.mac/2284), XML's methods for fetching web content are pretty minimalistic. A workaround is to use RCurl to download the content and XML to parse it:

library(XML)
library(RCurl)

url <- "http://finance.yahoo.com/q/op?s=DIA&m=2013-07"

html <- getURL(url)                                    # download the page with RCurl
tabs <- readHTMLTable(html, stringsAsFactors = FALSE)  # parse the HTML text with XML
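The workaround works because readHTMLTable only needs HTML text; it does not have to fetch the page itself. A minimal sketch to check the parsing half on its own, with no network access: the HTML snippet below is made up for illustration and merely stands in for what getURL() would return. Passing asText = TRUE to htmlParse() makes it explicit that the string is markup rather than a file name or URL.

```r
library(XML)

# Made-up HTML standing in for the text a download step would return.
html <- paste0(
  "<html><body><table>",
  "<tr><th>Strike</th><th>Last</th></tr>",
  "<tr><td>150</td><td>2.50</td></tr>",
  "<tr><td>155</td><td>1.10</td></tr>",
  "</table></body></html>"
)

# asText = TRUE: treat the string as markup, so no external entity is loaded.
doc <- htmlParse(html, asText = TRUE)

# header = TRUE: use the first row (the <th> cells) as column names.
tabs <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)

str(tabs[[1]])
```

If this runs but reading the real URL fails, the problem lies in the download step (connection, proxy, VPN) rather than in the parsing.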

Or, if RCurl still throws an error, try the httr package:

library(httr)

resp <- GET(url)  # download the page with httr
tabs <- readHTMLTable(content(resp, as = "text"), stringsAsFactors = FALSE)
SchaunW

I just got the same "failed to load external entity" error as above when using:

url <- "http://www.cisco.com/c/en/us/products/a-to-z-series-index.html"
doc <- htmlTreeParse(url, useInternalNodes = TRUE)

I found this post and another one on the topic, but neither solved my problem. The code had worked before, and I then realized that I was on the corporate VPN. I disconnected from the VPN, tried again, and it worked. So being on a VPN is another possible cause of this error, and disconnecting from it fixed the problem for me.
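If disconnecting from the VPN is not an option and your corporate network instead expects traffic to go through a proxy server (as the proxy-setting link in the comments under the question suggests), a sketch like the following routes the request through that proxy with httr's use_proxy() and then parses the result with XML. The proxy host "proxy.example.com", the port 8080, and the helper name read_tables_via_proxy are all hypothetical placeholders; substitute the values your network administrator or browser proxy settings give you.

```r
library(XML)
library(httr)

# Hypothetical proxy host and port: replace with your network's real values.
proxy_cfg <- use_proxy("proxy.example.com", port = 8080)

# Hypothetical helper: download through the proxy, then parse with XML.
read_tables_via_proxy <- function(url, proxy = proxy_cfg) {
  resp <- GET(url, proxy)
  stop_for_status(resp)  # fail loudly on HTTP errors instead of parsing an error page
  html <- content(resp, as = "text", encoding = "UTF-8")
  readHTMLTable(htmlParse(html, asText = TRUE), stringsAsFactors = FALSE)
}

# Usage (needs network access through the proxy):
# tabs <- read_tables_via_proxy("http://finance.yahoo.com/q/op?s=DIA&m=2013-07")
```

Keeping the download and parse steps separate makes it easier to tell whether a failure comes from the network path or from the HTML itself.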

Raj Ayala