I'm trying to import an XML data file into R as a data frame. I'm using the 'XML' package and calling xmlToDataFrame("test.xml"), which gives me the following error: xmlParseCharRef: invalid xmlChar value 26.

From my research online, the problem is probably something in the XML file itself. I've tried replacing all the characters that need escaping, e.g. & with &amp;, and I even replaced Ó with O (it shouldn't make a difference, but just to be sure). It didn't work. The XML data file has over 2 million rows, so it is impossible to go through it line by line.

Does anyone have any idea what other character could be causing the problem?

I should also mention that the encoding declaration in the file was <?xml version="1.0" encoding="UTF-8"?>, but I've also tried <?xml version=''1.0'' encoding=''iso-8859-1''?> and <?xml version="1.0" encoding="ascii"?>. I have no idea what these declarations mean, but people were suggesting them online. Any help would be greatly appreciated!

Example of xml data:

<?xml version="1.0" encoding="UTF-8"?>
<data>
<new_buildings>
<new_building>
<new_building_shipyard_name value="189 (189 COMPANY)"/>
<new_building_bv_number value="29"/>
<new_building_ship_type value="boat"/>
<new_building_commercial_owner_name value="SHIPYARDS"/>
<new_building_registered_owner_code value="18"/>
<new_building_keel_laying_date value="2013-08-14"/>
<new_building_confidentiality_indicator value="N"/>
</new_building>
<new_building>
NinaS
  • Hard to say without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), such as a valid snippet of xml – camille Aug 20 '19 at 14:47
  • @camille I can tell you exactly what I do in R and how I import the file; however, the actual xml file is 2 million rows and at first glance there is nothing wrong with it. I'm asking whether there are any characters you're aware of that cannot be read in xml, or whether there is a process to validate the xml, as I didn't write it. – NinaS Aug 20 '19 at 14:55
  • Have you read it into a different format that might not be as strict as what's needed to construct a data frame? Without seeing a sample, we're just guessing as to how it's structured and what the problem could be – camille Aug 20 '19 at 15:01
  • These characters have to be escaped in xml: ``<``, ``>``, ``&``, ``"``, ``'``. Have you done all of them? And yes, the ``Ó`` should not be a problem here. – Gainz Aug 20 '19 at 15:17
  • @camille added an example of what the xml data looks like – NinaS Aug 20 '19 at 15:22
  • @Gainz yes, I edited all of them – NinaS Aug 20 '19 at 15:22
  • Those are the 5 characters (if we don't count the unprintable characters) that need to be escaped though. As far as I know, you should not have problems with other characters in ``xml``. Also, does the data come from SQL? – Gainz Aug 20 '19 at 15:28
  • @Gainz I don't know where the data comes from, I'm afraid. I just received it from our data partner and I thought that by loading it into R I could easily get a table view of it. But apparently.... – NinaS Aug 21 '19 at 10:28
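
One thing worth checking, going by the error text alone: ``xmlParseCharRef`` is the libxml2 routine that parses numeric character references such as ``&#26;`` or ``&#x1A;``, and code point 26 is a control character that XML 1.0 does not allow even when written as a reference. So the culprit may be a literal ``&#26;`` somewhere in the file rather than an unescaped ``&``, ``<`` or ``>``. Below is a minimal base-R sketch for locating such references. The helper names (``is_valid_xml_char``, ``ref_to_codepoint``, ``find_bad_char_refs``) and the sample lines are made up for illustration; they are not part of the 'XML' package.

```r
# Hypothetical helper: TRUE for code points permitted by the XML 1.0 Char rule
# (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF).
is_valid_xml_char <- function(cp) {
  cp %in% c(9L, 10L, 13L) |
    (cp >= 32L & cp <= 55295L) |
    (cp >= 57344L & cp <= 65533L) |
    (cp >= 65536L & cp <= 1114111L)
}

# Hypothetical helper: turn "&#26;" or "&#x1A;" into its integer code point.
ref_to_codepoint <- function(ref) {
  body <- sub("^&#", "", sub(";$", "", ref))
  is_hex <- startsWith(body, "x")
  ifelse(is_hex, strtoi(substring(body, 2), 16L), strtoi(body, 10L))
}

# Hypothetical helper: report every numeric character reference in `lines`
# whose code point is not a legal XML 1.0 character (or cannot be parsed).
find_bad_char_refs <- function(lines) {
  matches <- regmatches(lines, gregexpr("&#(x[0-9A-Fa-f]+|[0-9]+);", lines))
  refs <- as.character(unlist(matches))
  line_no <- rep(seq_along(lines), lengths(matches))
  cp <- ref_to_codepoint(refs)
  bad <- is.na(cp) | !is_valid_xml_char(cp)
  data.frame(line = line_no[bad], ref = refs[bad], stringsAsFactors = FALSE)
}

# Made-up sample lines in the style of the example above.
sample_lines <- c(
  '<new_building_shipyard_name value="A &amp; B"/>',
  '<new_building_ship_type value="boat&#26;"/>',
  '<new_building_bv_number value="29&#x9;"/>'
)
print(find_bad_char_refs(sample_lines))
```

For the real file, ``find_bad_char_refs(readLines("test.xml", warn = FALSE))`` should list the affected line numbers, assuming the file fits in memory. Whether to delete or replace those references is a decision about the data, so it is probably best to inspect a few of the flagged lines before changing anything.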

0 Answers