
I have an XML document with an unescaped & in it, so I am getting an error:

   [Error: Invalid character in entity name
   Line: 155
   Column: 63
   Char:  ]

I wrote a function to escape illegal xml characters:

const escapeIllegalCharacters = (xml) => {
  xml = xml
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;')
    .replace(/>/g, '&gt;')
    .replace(/</g, '&lt;');
  return (xml);
}

And put it into a valueProcessor:

return parse.parseString(xml, {valueProcessors: [escapeIllegalCharacters]});

But I'm still getting the same error. Is this the wrong way to escape characters using the xml2js module?

Dan

1 Answer


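You need to escape the ampersands before calling parseString. The valueProcessors only run on values the parser has already extracted, so they never see input that fails to parse.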

You can use the regular expression from this answer to escape ampersands that are not themselves part of an escape sequence:

return parse.parseString(
  xml.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;')
);
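To see what that regular expression does and does not touch, here is a minimal sketch that wraps it in a helper. The name escapeBareAmpersands is made up for illustration; only the regular expression comes from the answer. Note that the pattern leaves any ampersand followed by # alone, so it assumes such sequences are numeric character references.

```javascript
// Hypothetical helper; the pattern is the one used in the answer above.
// It replaces an "&" only when it is NOT followed by one of the five
// predefined entity names plus ";" or by "#" (numeric reference).
function escapeBareAmpersands(xml) {
  return xml.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');
}

// A bare ampersand is escaped.
console.log(escapeBareAmpersands('<a>Tom & Jerry</a>'));
// Existing entities and numeric references are left as they are.
console.log(escapeBareAmpersands('<a>1 &lt; 2 &amp; 3 &#38; 4</a>'));
```

The result of the helper is what would then be passed to parseString.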

Whether or not this will solve your problem will depend upon the content of your XML. For example, this simplistic mechanism will also escape any ampersands in CDATA sections, but ampersands in those sections should be left unescaped.
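If CDATA sections are a concern, one possible workaround is to split the document on CDATA sections and escape only the text between them. This is a rough sketch, not a complete solution: the function name escapeAmpersandsOutsideCdata is hypothetical, and it assumes CDATA markers never appear inside comments or processing instructions, which it does not handle.

```javascript
// Hypothetical helper: escapes bare ampersands everywhere except inside
// <![CDATA[ ... ]]> sections. Assumption: CDATA markers only occur as
// real CDATA sections (not inside comments or processing instructions).
function escapeAmpersandsOutsideCdata(xml) {
  const bareAmpersand = /&(?!(?:apos|quot|[gl]t|amp);|#)/g;
  return xml
    // The capture group keeps each CDATA section in the result array,
    // so odd indices are CDATA sections and even indices are other text.
    .split(/(<!\[CDATA\[[\s\S]*?\]\]>)/)
    .map((part, i) => (i % 2 === 1 ? part : part.replace(bareAmpersand, '&amp;')))
    .join('');
}

console.log(escapeAmpersandsOutsideCdata('<a>x & y<![CDATA[p & q]]>z & w</a>'));
```

This still cannot repair unescaped < or > characters, since those are ambiguous with markup.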

cartant
  • Thanks for the reply! So I actually tried to run my escape function before calling parseString, but then I ran into the problem you mentioned where I was escaping the opening and closing tags of the xml characters themselves. I think your above solution would work for ampersands but is there a general solution for escaping all illegal characters only between tags? I thought the valueProcessors solution would be it... – Dan Jan 27 '17 at 21:46
  • The bottom line is that your XML is invalid and that parser is not going to accept it. Perhaps there is another that is more forgiving? If you have unescaped `<` and `>` characters in there, too, you have a real problem. Any parser would need a way of determining whether or not something is a tag, etc. So there would always be ambiguities and no general solution. – cartant Jan 27 '17 at 21:52
  • Yeah fair point, I checked your answer and it works for ampersands, I'm going to mark it correct. Thanks! – Dan Jan 27 '17 at 21:53