4

I have the following string in JavaScript and need to remove the <?xml ... ?> and <!DOCTYPE ... ]> declarations. I can't parse it into a DOM because the unclosed <br> tags raise errors, and I'm not able to edit the actual content.

  <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html [<!ENTITY amp "&#38;#38;">]><div>Blah<br> Blah</div>

I'm trying to do it with .replace() but can't quite get there:

    text.replace(/\<\?xml.+\?\>/g, '');
Jared Farrish
Louis W

3 Answers

8

Your replace() works for the <?xml ... ?> part.

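To remove the <!DOCTYPE ... ]> part as well, add it to the pattern as a second alternative. Note that replace() returns a new string and leaves text unchanged, so assign the result:

    text = text.replace(/\<\?xml.+\?\>|\<\!DOCTYPE.+]\>/g, '');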

As you can see here: http://jsfiddle.net/darkajax/9fKnd/1/
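
One caveat with the pattern above: `.+` is greedy and does not match newlines, so it can over-match if a later `?>` or `]>` appears on the same line, and it can miss a declaration that spans several lines. A minimal sketch of a lazy variant follows. The helper name `stripProlog` is hypothetical, and the sketch assumes the DOCTYPE always carries an internal subset ending in `]>`, as in the question's input.

```javascript
// Hypothetical helper; the name is illustrative and not from the answer above.
function stripProlog(text) {
  // [\s\S]*? is lazy and also matches newlines, so each alternative stops
  // at the first "?>" (or "]>") it finds instead of the last one.
  // Assumption: the DOCTYPE has an internal subset that ends in "]>".
  return text.replace(/<\?xml[\s\S]*?\?>|<!DOCTYPE[\s\S]*?\]>/g, '');
}

const input =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<!DOCTYPE html [<!ENTITY amp "&#38;#38;">]>' +
  '<div>Blah<br> Blah</div>';

console.log(stripProlog(input)); // <div>Blah<br> Blah</div>
```

For the single-line string in the question, both versions produce the same result; the lazy form only matters when the input is less predictable.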

DarkAjax
5

You can use this regex:

text.replace(/\<(\?xml|(\!DOCTYPE[^\>\[]+(\[[^\]]+)?))+[^>]+\>/g, '');

It works with each of the following inputs:

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html [<!ENTITY amp "&#38;#38;">]><div>Blah<br> Blah</div>

<?xml version="1.0" encoding="UTF-8"?><div>Blah<br> Blah</div>

<!DOCTYPE html [<!ENTITY amp "&#38;#38;">]><div>Blah<br> Blah</div>
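
A quick way to check that claim is to run the pattern over the three inputs. This is only a sketch: the pattern is copied from above, while the variable names `pattern`, `inputs`, and `results` are illustrative.

```javascript
// Pattern copied from the answer above; everything else is illustrative.
const pattern = /\<(\?xml|(\!DOCTYPE[^\>\[]+(\[[^\]]+)?))+[^>]+\>/g;

const inputs = [
  '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html [<!ENTITY amp "&#38;#38;">]><div>Blah<br> Blah</div>',
  '<?xml version="1.0" encoding="UTF-8"?><div>Blah<br> Blah</div>',
  '<!DOCTYPE html [<!ENTITY amp "&#38;#38;">]><div>Blah<br> Blah</div>',
];

// replace() returns a new string; the input strings are not modified.
const results = inputs.map((input) => input.replace(pattern, ''));

// Each of the three lines printed is: <div>Blah<br> Blah</div>
results.forEach((result) => console.log(result));
```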
Thomas Durieux
0

The accepted answer has unnecessary escaping (the extra backslashes make an ugly regex uglier). This works too:

const text = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html [<!ENTITY amp "&#38;#38;">]><div>Blah<br> Blah</div>'

console.log(text)

const afterReplace = text.replace(/<\?xml.+\?>|<!DOCTYPE.+]>/g, '')

console.log(afterReplace)
MattG