
There are so many posts for this question. I have gone through all of them but I didn't get the solution as I expected. I need to convert the special characters in an XML string into entities.

I tried,

<?php
$xml = "<test>This is a xml file which has special characters < > & in it</test>";
echo htmlspecialchars($xml, ENT_XML1, 'UTF-8'); // this replaces the XML tags too
?>

Expected output XML string:

<test>This is a xml file which has special characters &lt; &gt; &amp; in it</test>
hakre
vidhya

1 Answer


There are so many posts for this question. I have gone through all of them but I didn't get the solution as I expected.

Yes, the topic you are asking about is well defined and already well covered on this website. However, that does not protect anyone from making mistakes, as happens to the best of us day after day.

You write in your code example:

This is a xml file which has special characters

And you give the following string:

<test>This is a xml file which has special characters < > & in it</test>

But what you write is wrong. This is not XML because it is not well-formed: a literal < or & must not appear unescaped in character data.

So this is the first mistake that happens.
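The claim can be checked with a parser. This is a minimal sketch, assuming PHP's DOM extension (libxml) is available; the variable names are illustrative and not taken from the question:

```php
<?php
// The string from the question; the bare "<" and "&" make it ill-formed.
$xml = '<test>This is a xml file which has special characters < > & in it</test>';

// Collect parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$isWellFormed = $doc->loadXML($xml);   // false when the parser rejects the input
$errors = libxml_get_errors();
libxml_clear_errors();

var_dump($isWellFormed);
foreach ($errors as $error) {
    echo trim($error->message), "\n";
}
```

A parser that rejects the string also shows why it cannot be treated as an XML file yet.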

The next mistake in your question is that you apply a conversion function to the whole string, although you want to apply it only to small fractions of that string, namely these three characters:

  1. < at offset 54
  2. > at offset 56 (technically this does not need to become &gt;)
  3. & at offset 58

So instead you would need to apply the function to these parts only. The following code is purely for demonstration; you should not want it as a "solution":

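// Walk the offsets from right to left so that a replacement
// does not shift the offsets that are still to be processed.
foreach ([58, 56, 54] as $offset)
{
    $encoded = htmlspecialchars($xml[$offset], ENT_XML1, 'UTF-8');
    $xml = substr_replace($xml, $encoded, $offset, 1);
}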

As this example shows, the encoding function you selected was not all wrong; it correctly encodes the characters you asked for:

<test>This is a xml file which has special characters &lt; &gt; &amp; in it</test>
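Hard-coded offsets only fit this one string. The same idea (encode the text, not the tags) can be sketched a little more generally, under the loud assumption that the input is a single, known element (<test>) whose content is text only. The pattern and the variable $fixed are illustrative, and this is still a demonstration rather than a general repair tool:

```php
<?php
$xml = '<test>This is a xml file which has special characters < > & in it</test>';

// Assumption: exactly one <test> element with text-only content.
// Only the captured content is passed to htmlspecialchars(); the tags are left alone.
$fixed = preg_replace_callback(
    '~(<test>)(.*)(</test>)~s',
    function (array $m): string {
        return $m[1] . htmlspecialchars($m[2], ENT_XML1, 'UTF-8') . $m[3];
    },
    $xml
);

echo $fixed, "\n";
// <test>This is a xml file which has special characters &lt; &gt; &amp; in it</test>
```

As soon as the content may contain nested elements or already-encoded entities, a pattern like this is no longer enough.');
</test>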

(Other ways are imaginable, for example a CDATA section, <test><![CDATA[This is a xml file which has special characters < > & in it]]></test>, but that's not the point here.)
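For completeness, a CDATA section does not have to be written by hand. This is a minimal sketch, assuming the raw text is available separately from the markup and that the DOM extension is loaded; $text and $cdataXml are illustrative names:

```php
<?php
// Assumption: the unescaped text is known on its own, without the surrounding tags.
$text = 'This is a xml file which has special characters < > & in it';

$doc  = new DOMDocument('1.0', 'UTF-8');
$test = $doc->createElement('test');
$test->appendChild($doc->createCDATASection($text)); // wraps the text in <![CDATA[ ... ]]>
$doc->appendChild($test);

// Serialize only the document element, without the XML declaration.
$cdataXml = $doc->saveXML($doc->documentElement);
echo $cdataXml, "\n";
// <test><![CDATA[This is a xml file which has special characters < > & in it]]></test>
```

Building the document through the DOM API means the serializer, not the caller, decides how the text is protected.');
</test>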

However, these mistakes and the confusion they create do not mean that Stack Overflow lacks existing Q&A material that clearly addresses the topic.

With the confusion cleared away by identifying the mistakes made, there is a repertoire of reference material available from which you can pick your weapons of choice:

As you can see, there is a larger list of questions and answers. Depending on a first analysis of what's wrong with your XML (which is not XML yet, but could become XML because it's visually close to it), you should be able to find the one method you like most for fixing it.

I'm personally a fan of the Tidy extension in PHP which can do the job you're looking for in your case:

tidy_repair_string($xml, ['input-xml' => 1, 'output-xml' => 1, 'wrap' => 0]);
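A slightly fuller usage sketch of that call follows. It assumes the Tidy extension may or may not be installed and therefore checks for it first; the follow-up parse with DOMDocument is an added sanity check and not part of the call above, and the variable names are illustrative:

```php
<?php
$xml = '<test>This is a xml file which has special characters < > & in it</test>';

// The Tidy extension is optional in PHP, so check for it before calling.
$repaired = null;
if (extension_loaded('tidy')) {
    $repaired = tidy_repair_string(
        $xml,
        ['input-xml' => 1, 'output-xml' => 1, 'wrap' => 0]
    );
}

if (is_string($repaired)) {
    // Sanity check (an addition): does the repaired string parse as XML?
    libxml_use_internal_errors(true);
    $check = new DOMDocument();
    $ok = $check->loadXML($repaired);
    libxml_clear_errors();
    echo $ok ? $repaired : "Tidy output is still not well-formed\n";
} else {
    echo "Tidy is not available or could not repair the input\n";
}
```

Parsing the result afterwards is cheap and catches cases where the input was too broken for Tidy to repair.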

You might want to do it differently. See the linked questions above as a start for better search terms and to see what others have asked and answered about this topic.

hakre
  • Actually, I called that string XML just as an example. My question is how to convert special characters in an XML file. I will go through the links you suggested. – vidhya Jan 02 '15 at 05:07
  • I tried `tidy_repair_string($xml, ['input-xml' => 1, 'output-xml' => 1, 'wrap' => 0]);` and it works as I expected. Thank you so much – vidhya Jan 02 '15 at 08:48