Possible Duplicate:
Java: How to decode HTML character entities in Java like HttpUtility.HtmlDecode?

I have string data in which some special characters are encoded in this format: &#039

In this case the encoded character is ', a single quote.

For example, "the citizen&#039s home" should appear as "the citizen's home", but it does not.

The sequence is not interpreted as a quote, so I need to scan all of my strings for these sequences and convert them.

First: what is this format called? Knowing the name will help me find a conversion method.

Second: do you know of a method to fix my strings?

CQM

1 Answer

These are HTML character entities; the &#039 form is a numeric character reference. No need to reinvent the wheel: Apache Commons Lang's StringEscapeUtils.unescapeHtml4(String) is what you want.

Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.

For example, the string "&lt;Fran&ccedil;ais&gt;" will become "<Français>"

If an entity is unrecognized, it is left alone, and inserted verbatim into the result string. e.g. "&gt;&zzzz;x" will become ">&zzzz;x".
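
If pulling in a dependency is not an option, or if the data really lacks the terminating semicolon as in the question's example (in which case a strict unescaper may leave the reference untouched), a small decoder built on the standard library is one possibility. The following is only a minimal sketch under those assumptions: the class name NumericEntityDecoder and the method decode are hypothetical, and it handles decimal numeric references only, not named entities such as &lt; or hexadecimal forms such as &#x27;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Apache Commons Lang.
final class NumericEntityDecoder {

    // Matches decimal references such as &#039; or &#039 (semicolon optional).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});?");

    static String decode(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Leave the reference as-is if the number is not a Unicode code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: the citizen's home
        System.out.println(decode("the citizen&#039s home"));
    }
}
```

Because the semicolon is optional in this sketch, a reference immediately followed by a digit (for example &#0391) would be read as a different number, so well-formed input with semicolons is still preferable.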

Otto Allmendinger