I am having trouble unescaping, in Java, special characters that were encoded with the JavaScript escape() method.

Chrome console:

escape( "Gaëtan" )
"Ga%EBtan"

Java side:

(new org.apache.commons.codec.net.URLCodec()).decode("Ga%EBtan", "UTF-8")
 Ga�tan
java.net.URLDecoder.decode( "Ga%EBtan", "UTF-8" )
 Ga�tan

None of the methods in org.apache.commons.lang3.StringEscapeUtils can decode the string either.

The code this is for is tied to the escape() method because it was written a very long time ago. I cannot change it without a serious amount of work, so I want to avoid that if I can.

The only thing that does work is the following, but it comes with a performance hit:

( new javax.script.ScriptEngineManager() ).getEngineByName("JavaScript").eval( "unescape('Ga%EBtan')" )
Gaëtan

Any ideas? :)

Assaf Moldavsky
  • I solved this problem at one point and ended up using `URLEncoder` and `URLDecoder` with a handful of special-cased replacements I figured out from reading the docs. Those classes handle a few things differently than JavaScript does, but they are close enough that you could get it to work. Unfortunately I don't have the code handy. Maybe you can find a similar open-source version. – Stephen Rosenthal Jan 28 '16 at 21:27

1 Answer

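The problem is that escape() does not encode in UTF-8. %EB is the single-byte code for ë (as in Latin-1 and Windows-1252), whereas in UTF-8 ë is the two bytes C3 AB, so a UTF-8 decoder sees %EB as an invalid sequence and substitutes the replacement character.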

You need to decode it in this case with Windows-1252:

new URLCodec().decode("Ga%EBtan", "Windows-1252");
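A single-byte charset only covers the %XX form. escape() also emits %uXXXX for characters whose code is 256 or above (for example %u20AC for €), and a URL decoder would likely reject that form. As a rough sketch of the alternative, here is a small hand-written port of unescape() in Java. The class name `JsUnescape` and its helpers are hypothetical, not part of any library, and the behavior is based on my reading of how unescape() treats %XX and %uXXXX:

```java
public final class JsUnescape {
    private JsUnescape() {}

    /** Decodes %XX and %uXXXX sequences; everything else is copied through. */
    public static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int n = s.length();
        int i = 0;
        while (i < n) {
            char c = s.charAt(i);
            if (c == '%') {
                // %uXXXX: one UTF-16 code unit written as four hex digits
                if (i + 6 <= n && s.charAt(i + 1) == 'u' && isHex(s, i + 2, 4)) {
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
                // %XX: one code unit below 256 written as two hex digits
                if (i + 3 <= n && isHex(s, i + 1, 2)) {
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                    i += 3;
                    continue;
                }
            }
            // Ordinary character, or a '%' that does not start a valid sequence
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // True if the len characters starting at start are all ASCII hex digits.
    // The caller guarantees that the range lies inside the string.
    private static boolean isHex(String s, int start, int len) {
        for (int k = start; k < start + len; k++) {
            char ch = s.charAt(k);
            boolean hex = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'f')
                    || (ch >= 'A' && ch <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, `JsUnescape.unescape("Ga%EBtan")` should return "Gaëtan" without starting a script engine. Unlike URLDecoder, it leaves '+' untouched rather than turning it into a space.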

Edit: Answers to this question suggest using encodeURI and encodeURIComponent in JavaScript instead, since the encoding produced by escape() seems to be variable. Those two always encode as UTF-8.
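To illustrate the UTF-8 route: encodeURIComponent("Gaëtan") yields "Ga%C3%ABtan", and that string decodes cleanly with the UTF-8 call from the question. A minimal sketch, where the class name `Utf8DecodeDemo` and the method `decodeComponent` are hypothetical names used only for this example:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public final class Utf8DecodeDemo {
    private Utf8DecodeDemo() {}

    /** Decodes a string produced by encodeURIComponent, which always uses UTF-8. */
    public static String decodeComponent(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java runtime is required to support UTF-8, so this is unexpected
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // %C3%AB is the two-byte UTF-8 encoding of U+00EB
        String decoded = decodeComponent("Ga%C3%ABtan");
        System.out.println(decoded.equals("Ga\u00EBtan")); // prints true
    }
}
```

One caveat: URLDecoder turns '+' into a space. encodeURIComponent writes a literal '+' as %2B, so its output is safe here, but encodeURI leaves '+' as-is, so its output may need extra care.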

Edit 2: Here's another related question. In short, don't use escape().

ecarlos
  • Ha! I will try this. If it were up to me, I would never have used escape(). That API is deprecated as far as I know. The trouble is that I cannot swap it out easily now without a very significant rework of what exists. – Assaf Moldavsky Jan 28 '16 at 23:15