0

I am writing an HTTP server, and to test what messes it up, I entered ઔஇᆖ into a text field. The client request is

GET /add_text_data?message=%E0%AA%94%E0%AE%87%E1%86%96&category=log&color=black HTTP/1.1
Host: localhost
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8

When I called URLDecoder.decode("%E0%AA%94%E0%AE%87%E1%86%96", "UTF-8"), I got ???. How do I fix this?

Lucas
  • You did everything correctly. Your trouble is that you do not have `-Dfile.encoding=utf-8` set, so Java does not know how to present UTF-8 chars – user996142 Aug 12 '15 at 19:03
  • `-Dfile.encoding` determines the _default_ charset encoding. Since OP is explicitly specifying the charset to use, it doesn't matter what the default value is. – Mick Mnemonic Aug 12 '15 at 19:12

2 Answers

1

It turns out that this is actually not an issue with URLDecoder, but with the output stream. URLDecoder.decode("%E0%AA%94%E0%AE%87%E1%86%96", "UTF-8").equals("ઔஇᆖ") is actually true. I just needed to set Eclipse to accept UTF-8 output. This question fixed it for me.
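A minimal sketch of how to confirm this without relying on the console: decode the value, then look at the code points rather than the rendered glyphs. The class name `DecodeCheck` and the helpers `decodeUtf8` and `describe` are hypothetical names made up for this illustration. The last lines wrap `System.out` in a `PrintStream` with an explicit UTF-8 charset, which is one possible way to control the output encoding; the console itself (Eclipse here) still has to be configured to display UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeCheck {
    // Hypothetical helper: decodes a percent-encoded value as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: renders each code point as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String decoded = decodeUtf8("%E0%AA%94%E0%AE%87%E1%86%96");

        // ASCII-only output, so it reads the same whatever the console encoding is.
        System.out.println(describe(decoded)); // U+0A94 U+0B87 U+1196

        // Write the decoded text as UTF-8 bytes instead of the platform default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(decoded);
    }
}
```

If the first line shows the three expected code points but the second still shows `???`, the string was decoded correctly and only the display side is misconfigured.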

Lucas
0

Looks like UTF-8 can't handle it.

You can test decoding here to see what kind of decoding you'd have to use.

http://encoder.mattiasgeniar.be/index.php

Make sure you store the result in some kind of datatype that can accept unicode if you do.

user3657661
  • Strings in Java do accept Unicode. But Java needs an encoding to be provided to output them correctly (to set the appropriate encoding on the stdout stream). That should be done via the `file.encoding` system property – user996142 Aug 12 '15 at 19:06