0

I am parsing an XML file using NSXMLParser, which works great but sometimes gives me unexpected results.

For example, I get this URL:

http://www.thehungersite.com/clickToGive/home.faces;jsessionid=01F974DC9E276DA587AE299175EDF4F4.ctgProd02?siteId=4&#38;link=ctg_trs_home_from_ths_home_sitenav

but NSXMLParser outputs:

http://www.thehungersite.com/clickToGive/home.faces;jsessionid=01F974DC9E276DA587AE299175EDF4F4.ctgProd02?siteId=4&link=ctg_trs_home_from_ths_home_sitenav

For some reason, it dropped the `#38;` part of the string. How can I get it back? Assuming this was HTML encoding, I tried stringByAddingPercentEscapesUsingEncoding:, but that does not work.

Any ideas?

Pripyat

3 Answers

1

XML uses the same character reference encoding mechanism as HTML (although it has only 5 predefined named entities, as opposed to the huge number defined for HTML). `&#38;` is an encoding for the `&` character.
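To illustrate the point in a language-neutral way, here is a minimal sketch using Python's standard library rather than NSXMLParser. The `<link>` element name and the example.com URL are made up for the illustration; any conforming XML parser should decode the numeric character reference `&#38;` in the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the element name and URL are placeholders, not from the question.
doc = '<link>http://example.com/home?siteId=4&#38;link=nav</link>'

# The parser resolves the character reference &#38; while reading the text node.
element = ET.fromstring(doc)

# The text handed back to the application contains a plain '&', not '&#38;'.
print(element.text)
```

So the parser is not losing data here: `&#38;` in the file and `&` in the parsed string are the same character in two different representations.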

Anomie
  • Yes, `&` is being retained - how do I convert it back? – Pripyat May 16 '11 at 16:16
  • @DavidSchieer: See [HTML entity encoding (convert '<' to '&lt;') on iPhone in objective-c](http://stackoverflow.com/questions/1666717/html-entity-encoding-convert-to-lt-on-iphone-in-objective-c) – Anomie May 16 '11 at 16:31
0

Perhaps the top answer to this question might help: Objective-C: How to replace HTML entities?

It's basically a category on NSString that someone wrote, offering both encoding and decoding of HTML entities.
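The linked category is Objective-C; as a rough, language-neutral sketch of what such encoding and decoding does, the following uses Python's standard `html` module instead. The URL is a placeholder, and the plain `replace` call is only an assumption about how one might reproduce the numeric `&#38;` form specifically.

```python
import html

# Placeholder for the string a parser hands back after decoding.
decoded = 'http://example.com/home?siteId=4&link=nav'

# Encoding: escape '&' (and '<', '>') using the named entity form.
encoded = html.escape(decoded, quote=False)

# If the numeric form from the original file is wanted, a simple replace is enough
# for a string that contains no other entities.
numeric = decoded.replace('&', '&#38;')

# Decoding: both forms map back to the same plain string.
print(html.unescape(encoded) == decoded)
print(html.unescape(numeric) == decoded)
```

Both the named (`&amp;`) and numeric (`&#38;`) forms decode to the same character, so re-encoding is only needed if the string is going back into XML or HTML markup.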

Henri Normak
  • Is there a way to stop NSXMLParser from converting it in the first place? – Pripyat May 16 '11 at 16:36
  • The NSXMLParserDelegate has methods for detecting internal entities and external entities, but I'm not sure whether HTML entities will fit under there http://developer.apple.com/library/mac/#documentation/Cocoa/Reference/NSXMLParserDelegate_Protocol/Reference/Reference.html%23//apple_ref/occ/intf/NSXMLParserDelegate – Henri Normak May 16 '11 at 16:37
-1

You're using the ISO standard. Try using either %26 or URL encoding.

Ian
  • Take a look at http://stackoverflow.com/questions/1812473/difference-between-url-encode-and-html-encode/1812486#1812486 for more on HTML encoding vs. URL encoding. Also, %26 is the encoded text for the '&' when it is urlencoded. – Ian May 16 '11 at 16:23
  • Thank you - I was aware of this. The problem I am facing is that I cannot convert it back to the original value. – Pripyat May 16 '11 at 16:26
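The difference between the two encodings discussed in these comments can be sketched as follows, again using Python's standard library purely for illustration rather than the Cocoa APIs from the question.

```python
import html
from urllib.parse import quote, unquote

ampersand = '&'

# URL (percent) encoding turns '&' into '%26'.
url_encoded = quote(ampersand, safe='')

# HTML/XML escaping turns '&' into a character reference instead.
html_encoded = html.escape(ampersand)

# Percent-decoding does not touch character references, and vice versa.
untouched_by_url_decoding = unquote('&#38;')
untouched_by_html_decoding = html.unescape('%26')

print(url_encoded, html_encoded)
print(untouched_by_url_decoding, untouched_by_html_decoding)
```

This is why a percent-escaping method does not bring `&#38;` back: it produces `%26`, which is a different encoding scheme from XML character references.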