
I am trying to extract an XML file from Air Canada's website that contains weather data from their radar system. The URL of the XML file is given in the code below.

I am stuck right at the start: I thought it would be as simple as passing the URL to the xmlParse function from the XML package.

library(XML)

url = "https://www.aircanada.com/content/dam/aircanada/portal/data/weather/AirCanada.xml"
xmlParse(url)

However, I get the following error:

Error: XML content does not seem to be XML

It is clearly an XML file, so I am not sure why I am getting this error. Any help/direction would be much appreciated.

zx485
SteveM
    ... xmlParse doesn't retrieve information from a url. You're asking it to parse the string "https://www.aircanada.com/content/dam/aircanada/portal/data/weather/AirCanada.xml", not the page. You have to add an argument `isURL=TRUE` – Jean Feb 22 '17 at 03:51
    _"you will not…access or use…the Website through any…automatic, electronic or technical device, including but not limited to automated scripts, robots, crawls, screen scrapers, web "bots", …, spiders, –, macro programs, or any other…program, software, system, algorithm, methodology or technology…that performs the same or a similar function, in order to, without limitation: "data mine"; "screen scrape"; data process; access, extract, copy, distribute, aggregate or acquire information;…input or store information;…or manipulate or monitor any portion or content of the Website;"_ – hrbrmstr Feb 22 '17 at 04:04

1 Answer


Checking the XML file at this URL shows that it contains some invalid characters.
This is the error log of xsltproc:

encoding error : input conversion failed due to input error, bytes 0x8F 0x6E 0x65 0x73
encoding error : input conversion failed due to input error, bytes 0x8F 0x6E 0x65 0x73
I/O error : encoder error
AirCanada.xml:1059: parser error : AttValue: ' expected
AirCanada.xml:1059: parser error : attributes construct error
AirCanada.xml:1059: parser error : Couldn't find end of Start Tag SITE line 1059
AirCanada.xml:1059: parser error : Premature end of data in tag DATAFILE line 50
unable to parse AirCanada.xml

Sanitizing the AirCanada.xml file with the solution from this SO answer makes the data usable. The -c option silently discards every character that cannot be converted, so some data is probably lost.

iconv -f utf-8 -t utf-8 -c AirCanada.xml > AirCanadaSanitized.xml

Then you can process AirCanadaSanitized.xml with an XSLT processor.
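
To see what the iconv step does without touching the real file, here is a minimal sketch on a made-up sample. The file names sample.xml and sample_sanitized.xml and the element content are assumptions for illustration only; the single stray 0x8F byte mimics the one reported in the error log above.

```shell
# Build a tiny stand-in for AirCanada.xml (hypothetical content).
# \217 is the octal form of the stray byte 0x8F from the error log.
printf "<DATAFILE><SITE name='A\217nes'/></DATAFILE>\n" > sample.xml

# Strict conversion: iconv stops with an error at the first invalid sequence.
if iconv -f utf-8 -t utf-8 sample.xml > /dev/null 2>&1; then
  echo "sample.xml is valid UTF-8"
else
  echo "sample.xml contains invalid UTF-8"
fi

# -c discards the bytes that cannot be converted instead of failing.
# Some iconv builds still exit non-zero after discarding, hence "|| true".
iconv -f utf-8 -t utf-8 -c sample.xml > sample_sanitized.xml || true

# Inspect the result: the stray byte is gone, the rest is untouched.
cat sample_sanitized.xml
```

Comparing the two files (for example with cmp or diff) shows exactly which bytes were dropped, which is a quick way to judge how much data the sanitizing step costs on the real file.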

Community
zx485