I'm trying to read an XML file that is an export of a website. When I run the following:

result <- xmlParse(file = "~/Desktop/export.xml")

I get:

PCDATA invalid Char value 8
PCDATA invalid Char value 1
PCDATA invalid Char value 8
PCDATA invalid Char value 1
PCDATA invalid Char value 8
PCDATA invalid Char value 1
PCDATA invalid Char value 8
PCDATA invalid Char value 1
PCDATA invalid Char value 8
PCDATA invalid Char value 1
PCDATA invalid Char value 8
PCDATA invalid Char value 1
Error: 1: PCDATA invalid Char value 8

Is there any way I can skip these invalid characters and read it anyway? Or do I have to somehow remove them? I simply want to parse the XML to find URLs within it containing a specific string.

tmnsnmt
  • 1
    You've not posted your XML, but those characters are not allowed in any XML. Fix your data to actually be XML before trying to parse it. – kjhughes Oct 28 '17 at 13:35
  • 1
    How do you do what? Post a big file? You don't. You post a [mcve], and it's your job to make it *minimal*. (But, really, here your problem is clearly invalid characters, so don't bother.) Fix your data? Fix it at the source, or see the duplicate link. – kjhughes Oct 28 '17 at 16:50
  • No need to see rudeness where none exists. – kjhughes Oct 28 '17 at 17:25
  • See duplicate link for how to fix bad XML, including invalid characters. – kjhughes Oct 28 '17 at 17:26
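
Following the comments' suggestion to remove the invalid characters before parsing, here is a minimal sketch in base R. It assumes the offending characters are literal control characters (such as the values 1 and 8 in the error output) and that deleting them is acceptable for the goal of finding URLs. The helper names `strip_invalid_xml_chars` and `parse_dirty_xml` are hypothetical, not part of the XML package.

```r
# Remove control characters that XML 1.0 does not allow.
# Tab (0x09), newline (0x0A), and carriage return (0x0D) are legal and kept.
strip_invalid_xml_chars <- function(x) {
  gsub("[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]", "", x, perl = TRUE)
}

# Hypothetical helper: read the export as plain text, clean it, then parse.
# Requires the XML package to be installed.
parse_dirty_xml <- function(path) {
  raw_text <- paste(readLines(path, warn = FALSE), collapse = "\n")
  XML::xmlParse(strip_invalid_xml_chars(raw_text), asText = TRUE)
}

# Small in-memory demonstration with char values 1 and 8 embedded.
dirty <- "<a href='http://example.com/x'>bad\x01 text\x08</a>"
clean <- strip_invalid_xml_chars(dirty)
print(clean)
```

With the file from the question, this would be called as `result <- parse_dirty_xml("~/Desktop/export.xml")`. Deleting characters changes the data, so fixing the export at the source remains the cleaner option where that is possible.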

0 Answers