4

I am working with an XML feed that has, as one of its nodes, a URL string similar to the following:

http://aflite.co.uk/track/?aid=13414&mid=32532&dl=http://www.google.com/&aref=chris

I understand that ampersands cause a lot of problems in XML and should be escaped by using &amp; instead of a naked &. I therefore changed the PHP to read as follows:

<node><?php echo ('http://aflite.co.uk/track/?aid=13414&amp;mid=32532&amp;dl=http://www.google.com/&amp;aref=chris'); ?></node>

However when this generates the XML feed, the string appears with the full &amp; and so the actual URL does not work. Apologies if this is a very basic misunderstanding but some guidance would be great.

I've also tried using %26 instead of &amp; but I still get the same problem.

skaffman
Chris
  • `&amp;` is correct. If the parser isn't decoding it when it converts the XML node value to a string, then the parser is broken and you need to fix that and not the XML. – Quentin Jun 29 '11 at 10:57
  • possible duplicate of [Remove &amp from string when writing to xml in PHP](http://stackoverflow.com/questions/6379283/remove-amp-from-string-when-writing-to-xml-in-php) – Gordon Jun 29 '11 at 11:13
  • You should not create XML with plain `echo` statements. Instead, use either SimpleXML or XMLWriter: either will take care of the dirty details for you. – Álvaro González Jun 29 '11 at 11:18
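
As a rough illustration of the last comment, here is a minimal sketch that writes the question's `<node>` element with XMLWriter rather than `echo`. The variable names are only illustrative, and the sketch assumes the URL is held as a plain, unescaped string.

```php
<?php
// The URL as a plain string, with naked ampersands (not pre-escaped).
$url = 'http://aflite.co.uk/track/?aid=13414&mid=32532&dl=http://www.google.com/&aref=chris';

$writer = new XMLWriter();
$writer->openMemory();          // build the XML in memory instead of a file
$writer->startElement('node');
$writer->text($url);            // text() escapes & as &amp; for us
$writer->endElement();
$xml = $writer->outputMemory();

echo $xml, "\n";

// Reading it back with SimpleXML yields the original URL again.
$roundTrip = (string) simplexml_load_string($xml);
echo $roundTrip, "\n";
```

The serialized XML contains `&amp;`, while the value read back by a parser contains plain `&`, which is what a consumer of the feed should see.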

2 Answers

8

If you are inserting something into XML/HTML, you should always use the htmlspecialchars function. This will escape your strings into correct XML syntax.

But you are running into a second problem: you have appended a second URL to the first one. That second URL also needs to be escaped into URL syntax, and for this you need to use urlencode.

<node><?php echo htmlspecialchars('http://aflite.co.uk/track/?aid=13414&mid=32532&aref=chris&dl='.urlencode('http://www.google.com/')); ?></node>
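
To make the two layers of escaping visible, the following sketch builds the same string step by step and then parses the query string the way a server would. The variable names are hypothetical, and the `ENT_XML1` flag assumes PHP 5.4 or later.

```php
<?php
// Layer 1: URL-encode the nested target so its ':' and '/' cannot be
// confused with the syntax of the outer tracking URL.
$target = 'http://www.google.com/';
$url = 'http://aflite.co.uk/track/?aid=13414&mid=32532&aref=chris&dl=' . urlencode($target);

// Layer 2: XML-escape the whole URL so each & becomes &amp; in the feed.
$xml = '<node>' . htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8') . '</node>';
echo $xml, "\n";

// Parsing the (unescaped) URL's query string recovers the nested URL intact.
parse_str(parse_url($url, PHP_URL_QUERY), $params);
echo $params['dl'], "\n";
```

The order matters: `urlencode()` is applied to the parameter value only, and `htmlspecialchars()` is applied once to the finished URL, which avoids the double-escaping discussed in the comments below.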
coding Bott
  • if he uses `htmlspecialchars()` on a string which has already got `&amp;` escaping, he'll end up with a double-escaped string. – Spudley Jun 30 '11 at 09:54
  • yes, his sample code doesn't work. i corrected my sample when i saw the second appended url. – coding Bott Jun 30 '11 at 09:57
  • you've updated the answer, so that's good; no more double-escaping. But interestingly, I note the `urlencode()` part that you've added: there's nothing in the original question which specifies that `aref=chris` is part of the google URL rather than another argument of the main URL. If it is, then your answer is good and also explains why it wasn't working for him despite escaping. However, there's still a problem, in that if `aref=chris` is part of the google URL then it needs to be preceded by a question mark, rather than an ampersand, as well as being run through `urlencode()`. – Spudley Jun 30 '11 at 10:04
  • yes, you are right. `&aref` isn't part of the google url. the google url can contain any character he needs within it, including a `?`. – coding Bott Jun 30 '11 at 10:11
  • +1. I'll vote you up anyway though because there are some good points raised here which I didn't have in my answer. :) – Spudley Jun 30 '11 at 10:13
5

&amp; is correct for escaping ampersands in an XML document. The example you've given should work.

You state that it doesn't work, but you haven't stated what application you're using, or in what way it doesn't work. What exactly happens when you click the link? Do the &amp; strings end up in the browser's URL field? If that's the case, it sounds like a fault with the software you're viewing the XML with. Have you tried looking at the XML in another application to see if the problem is consistent?
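
One way to check whether the feed itself is at fault is to load it with a conforming parser and compare the decoded value with the serialized markup. The sketch below uses PHP's DOMDocument on the XML from the question; the variable names are only illustrative.

```php
<?php
// The XML exactly as the question's PHP produces it.
$xml = '<node>http://aflite.co.uk/track/?aid=13414&amp;mid=32532&amp;dl=http://www.google.com/&amp;aref=chris</node>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// The node's value as a parser hands it to the application: plain & characters.
$value = $doc->documentElement->textContent;
echo $value, "\n";

// The same node serialized back to markup: & is written as &amp; again.
$serialized = $doc->saveXML($doc->documentElement);
echo $serialized, "\n";
```

If an application shows `&amp;` inside the URL it extracted, it is most likely displaying or using the raw markup instead of the decoded node value.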

To answer the final part of your question: %26 would definitely not work for you -- this would be what you'd use if your URL parameters needed to contain ampersands. Say for example in aref=chris, if the name chris were to contain an ampersand (let's say the username was chris&bob), then that ampersand would need to be escaped using %26 so that the URL parser didn't see it as starting a new URL parameter.
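
The `chris&bob` case can be sketched with `http_build_query()`, which percent-encodes parameter values while leaving the separating ampersands alone. The parameter values are the ones from the question, with the username changed as described above.

```php
<?php
// Parameter values are plain strings; 'chris&bob' contains a literal ampersand.
$params = ['aid' => 13414, 'mid' => 32532, 'aref' => 'chris&bob'];

// Pass '&' explicitly so the result does not depend on the
// arg_separator.output ini setting.
$query = http_build_query($params, '', '&');
echo $query, "\n";   // aid=13414&mid=32532&aref=chris%26bob

// Parsing the query string restores the original value, ampersand included.
parse_str($query, $parsed);
echo $parsed['aref'], "\n";   // chris&bob
```

Here `%26` protects an ampersand that is data, whereas `&amp;` protects the separator ampersands only once the URL is placed inside XML.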

Hope that helps.

Spudley