
I have a webapp in Eclipse Juno. When I hit Run on Server it runs fine, either inside Eclipse's browser (I am on Windows) or in Firefox.

Right click > Export > WAR, dump it into $CATALINA_HOME/webapps, and everything works fine (the WAR got unpacked all right) EXCEPT:

  • my custom tags - I had a WEB-INF\functions.tld file which is apparently not read. The only difference between the auto-generated eclipse server.xml (in Servers project) and the default Tomcat server.xml was the line :

    <Context docBase="ted2012" path="/ted2012"
    reloadable="true" source="org.eclipse.jst.jee.server:ted2012"/>
    

`source` being a WTP-specific attribute.
I managed to solve this one; see my answer.

  • Tomcat does not decode the URL correctly; see the pics in my answer.

Questions :

  1. (Unsolved) Why does Tomcat not decode the URL correctly while Eclipse does? Where is the failure? See my separate question on this for extensive details on the call stack and on where exactly Tomcat fails.
  2. Why did Tomcat not see the TLD in the first place while Eclipse did? Why did I have to edit the web.xml? (Worked around in my answer; this should be another question.)

The code is on GitHub; the file INSTRUCTIONS.txt has detailed instructions to set the project up and reproduce the bug pictured in my answer below.

Tomcat 7.0.32, eclipse 4.2, java 1.7.9

Mr_and_Mrs_D
  • It's hard to give an answer as your whole URL encoding/decoding library is utterly superfluous if you just 1) set JSP's page encoding to UTF-8, 2) set Tomcat's URI encoding to UTF-8 and 3) use JSTL `<c:url>`/`<c:param>` to construct encoded URLs in JSP whenever applicable (in other words, when you just follow the standard practices as to dealing with UTF-8 data, see further also http://balusc.blogspot.com/2009/05/unicode-how-to-get-characters-right.html to understand it better). – BalusC Mar 31 '13 at 00:29
  • @BalusC I was trying not having to edit the server's xml - it should work - at least it works just I expect it in Eclipse - this one really escapes me. Btw I placed the bounty on this one erroneously : the correct one is http://stackoverflow.com/questions/14914983/webapp-behaves-as-expected-when-run-from-eclipse-while-when-exported-as-war-fail – Mr_and_Mrs_D Mar 31 '13 at 03:04

2 Answers

To decode URIs correctly, you need the URIEncoding attribute on the Tomcat connector:

<Connector ... URIEncoding="UTF-8" ... />

See my rant https://stackoverflow.com/a/15587140/995876

This does not come with the application code: you need to set it separately in the application server configuration, or use an application server that defaults to UTF-8. Unfortunately there is no way to change how the container decodes the URI from code.

Drop the decodeRequest and never use new String/getBytes without an explicit encoding argument.


Alternative.

If you can't edit the server connector configuration, you can fix your code by providing the encoding explicitly to new String:

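public static String decodeRequest(String parameter)
        throws UnsupportedEncodingException { // java.io.UnsupportedEncodingException
    // Undo the container's ISO-8859-1 decoding, then decode the bytes as UTF-8.
    return new String(parameter.getBytes("iso-8859-1"), "UTF-8");
}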
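
Why the workaround can work at all: ISO-8859-1 maps every byte to exactly one code point, so the original bytes survive the wrong decoding and can be read again as UTF-8. Below is a minimal sketch of that round trip. The class and method names are made up for illustration, and `misdecodeAsLatin1` only simulates the container behaviour described in this thread (it is not Tomcat code).

```java
import java.nio.charset.StandardCharsets;

class QueryParamRecovery {

    // Simulation (assumption): without URIEncoding the container decodes the
    // raw UTF-8 query bytes as ISO-8859-1, one char per byte.
    static String misdecodeAsLatin1(byte[] rawUtf8Bytes) {
        return new String(rawUtf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // The workaround: recover the original bytes, then decode them as UTF-8.
    static String recover(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "ελληναρα" written with Unicode escapes so that the source file
        // encoding cannot interfere with the demonstration.
        String original = "\u03b5\u03bb\u03bb\u03b7\u03bd\u03b1\u03c1\u03b1";
        byte[] onTheWire = original.getBytes(StandardCharsets.UTF_8);

        String seenByServlet = misdecodeAsLatin1(onTheWire);
        System.out.println(seenByServlet.equals(original));          // false
        System.out.println(recover(seenByServlet).equals(original)); // true
    }
}
```

Both charsets are passed explicitly, so the result does not depend on the platform default encoding.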
Esailija
  • Thanks - but why it goes through rightly in eclipse and not in exported war in tomcat ? Actually _this is the thing I was trying to debug and I need an answer on this_. Suppose one can't edit the server configuration. See my other question for details on my method : http://stackoverflow.com/questions/14914983/webapp-behaves-as-expected-when-run-from-eclipse-while-when-exported-as-war-fail. Actually it should work (it does in eclipse anyway) - so where is the failure ? Notice the url is spelled correctly – Mr_and_Mrs_D Apr 03 '13 at 12:39
  • @Mr_and_Mrs_D AFAIK, the war just has the application, not the *server* configuration. – Esailija Apr 03 '13 at 12:42
  • But I do not mess with the server configuration ! That's the point of the way I do it ! The code is in github - maybe give it a whirl ? – Mr_and_Mrs_D Apr 03 '13 at 12:43
  • @Mr_and_Mrs_D this is the line in config where you would place the URIEncoding attribute https://github.com/Utumno/ted2012/blob/GitHub2/conf/Servers_eclipse_project_tomcat7_conf/server.xml#L70 – Esailija Apr 03 '13 at 13:17
  • I know - I have done it this way - I know how to do it this way - I don't understand why the `URLDecoder.decode(new String(parameter.getBytes("iso-8859-1")), CHARSET_FOR_URL_ENCODING);` - behaves fine in Eclipse deploy and not in war deploy - believe me I have not put a bounty to be told I have to change the Connector! I repeat see : http://stackoverflow.com/questions/14914983/webapp-behaves-as-expected-when-run-from-eclipse-while-when-exported-as-war-fail – Mr_and_Mrs_D Apr 03 '13 at 13:25
  • @Mr_and_Mrs_D the URIEncoding is for so that you can just do `String name = request.getParameter("name")` without further plumbing. As for your current code, you are using `new String` without encoding parameter, which defaults to platform default encoding, which seems to be different when you launch with eclipse vs tomcat. – Esailija Apr 03 '13 at 13:28
  • Where is the difference set ? Read the question - I compared the configs. Notice I ask 2 questions btw – Mr_and_Mrs_D Apr 03 '13 at 13:30
  • @Mr_and_Mrs_D it's not about the config but the platform default encoding because you are using `new String` without explicit encoding. Try to print `Charset.defaultCharset()` in both environments - it should be different. – Esailija Apr 03 '13 at 13:31
  • Did try this and tomcat printed windows-1252 while eclipse printed UTF-8 (edited my _answer_) - actually when I go to prefs > workspace > text file encoding and set the default (Cp1252) _**Eclipse also fails**_ - so yeah, an explanation on it (and a workaround preferably) and some info on why do I need to edit the web.xml for tomcat to see the tld and we're there :) – Mr_and_Mrs_D Apr 04 '13 at 12:19
  • @Mr_and_Mrs_D There are solutions in my answer, the fixed `decodeRequest` is a work-around while the connector attribute is a real fix that just works without any plumbing in code. – Esailija Apr 04 '13 at 12:27
  • `return URLDecoder.decode(new String(parameter.getBytes("iso-8859-1"), CHARSET_FOR_URL_ENCODING), CHARSET_FOR_URL_ENCODING);` did it - add something in your answer for the tld encoding - and if you could recommend me a [s]link[/s], [s]book[/s], clinic for the java charset experience... – Mr_and_Mrs_D Apr 04 '13 at 13:31
  • @Mr_and_Mrs_D The urldecoder is superfluous, you can do it with `return new String(parameter.getBytes("iso-8859-1"), CHARSET_FOR_URL_ENCODING);` – Esailija Apr 04 '13 at 14:44
  • Yes true (despite having first url encoded the parameter (?)) Btw - does `request.getParameter` call `URL.decode()` ? – Mr_and_Mrs_D Apr 04 '13 at 15:05
  • @Mr_and_Mrs_D yes the servlet automatically urldecodes, there is no urldecoding needed to be done by you. Just the ISO > UTF8 conversion. – Esailija Apr 04 '13 at 15:38
  • I will be asking a question for `parameter.getBytes("iso-8859-1")` - I am trying to locate where in the tomcat source does this requirement for "iso-8859-1" comes from - should I hardcode it like this or is liable to change ? Meanwhile please try a guess for the second question - this bounty is ending soon :) – Mr_and_Mrs_D Apr 05 '13 at 08:52
  • @Mr_and_Mrs_D `new String(parameter.getBytes("iso-8859-1"), "UTF-8")` is a backwards hack to overcome the incorrect decoding done by servlet automatically. It wouldn't even be possible if the encoding wasn't ISO-8859-1 because ISO-8859-1 is the only encoding to fully retain the original binary data, with exact byte<->codepoint matches. The servlet will incorrectly decode any query string (but not POST params, see the rant linked in my answer) always as ISO-8859-1. You don't need this at all if you just set `URIEncoding="UTF-8"` in tomcat server.xml configuration connector element. – Esailija Apr 05 '13 at 11:01
  • " It wouldn't even be possible if the encoding wasn't ISO-8859-1 because ISO-8859-1 is _the only encoding to fully retain the original binary data_, with exact byte<->codepoint matches." Thanks ! - I will be posting a question investigating this further - I want to be able to do this in code. Please add some info on the need for editing the web.xml for tomcat to see the tld in your answer so I can accept it :) – Mr_and_Mrs_D Apr 05 '13 at 16:02
  • @Mr_and_Mrs_D I don't know much about Tomcat to know that sorry, just encodings :P – Esailija Apr 05 '13 at 16:04

One thing that helped was adding this to the web.xml:

<jsp-config>
    <taglib>
        <taglib-uri>
            functions
        </taglib-uri>
        <taglib-location>
            functions.tld
        </taglib-location>
    </taglib>
</jsp-config>

Now Tomcat (7.0.30) sees my taglib, which is used to encode URIs.


Strange thing: when I print the username with System.out I get ???? in Tomcat's console instead of hieroglyphs. Maybe this points to the issue? In my controller I have:

final String username = Helpers.decodeRequest(request
                .getParameter("user"));
System.out.println("ProfileController.doGet() user name DECODED : "
                                + username);

where :

private static final String CHARSET_FOR_URL_ENCODING = "UTF-8";

public static String decodeRequest(String parameter)
        throws UnsupportedEncodingException {
    System.out.println(Charset.defaultCharset()); // EDIT: suggested by @Esailija
    if (parameter == null)
        return null;
    System.out.println("decode - request.getBytes(\"iso-8859-1\"):"
            + new String(parameter.getBytes("iso-8859-1")));
    System.out.println("decode - request.getBytes(\"iso-8859-1\") BYTES:"
            + parameter.getBytes("iso-8859-1"));
    for (byte iterable_element : parameter.getBytes("iso-8859-1")) {
        System.out.println(iterable_element);
    }
    System.out.println("decode - request.getBytes(\"UTF-8\"):"
            + new String(parameter.getBytes(CHARSET_FOR_URL_ENCODING))); // UTF-8
    return URLDecoder.decode(new String(parameter.getBytes("iso-8859-1")),
            CHARSET_FOR_URL_ENCODING);
}

So tomcat :

windows-1252 // EDIT: suggested by @Esailija
decode - request.getBytes("iso-8859-1"):╬╡╬╗╬╗╬╖╬╜╬▒╧?╬▒
decode - request.getBytes("iso-8859-1") BYTES:[B@d171825
-50
-75
-50
-69
-50
-69
-50
-73
-50
-67
-50
-79
-49
-127
-50
-79
decode - request.getBytes("UTF-8"):├Ä┬╡├Ä┬╗├Ä┬╗├Ä┬╖├Ä┬╜├Ä┬▒├?┬?├Ä┬▒
ProfileController.doGet() user name DECODED : ╬╡╬╗╬╗╬╖╬╜╬▒╧?╬▒
???????? // user Dao System.out.println("ελληναρα");
com.mysql.jdbc.JDBC4PreparedStatement@67322bd9: SELECT * FROM users WHERE username='╬╡╬╗╬╗╬╖╬╜╬▒╧?╬▒'
ProfileController.doGet() user : null

Eclipse :

UTF-8 // EDIT: suggested by @Esailija
decode - request.getBytes("iso-8859-1"):ελληναρα
decode - request.getBytes("iso-8859-1") BYTES:[B@44c353ae
-50
-75
-50
-69
-50
-69
-50
-73
-50
-67
-50
-79
-49
-127
-50
-79
decode - request.getBytes("UTF-8"):ελληναÏα
ProfileController.doGet() user name DECODED : ελληναρα
ελληναρα // user Dao System.out.println("ελληναρα");
com.mysql.jdbc.JDBC4PreparedStatement@73aae7c6: SELECT * FROM users WHERE username='ελληναρα'
ProfileController.doGet() user : com.ted.domain.User@4b22015d

EDIT: if I change the Eclipse encoding in Preferences > Workspace > Text file encoding to the default (Cp1252), I get:

windows-1252
decode - request.getBytes("iso-8859-1"):λαλακης
decode - request.getBytes("iso-8859-1") BYTES:[B@5ef1946a
-50
// same bytes ....
decode - request.getBytes("UTF-8"):λαλακη�‚
ProfileController.doGet() user name DECODED : λαλακης
ελληναÏ?α
com.mysql.jdbc.JDBC4PreparedStatement@4646ebd8: SELECT * FROM users WHERE username='λαλακης'
ProfileController.doGet() user : null

and Eclipse also fails.
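
The difference between the runs can be reproduced outside any server. This is a minimal sketch, assuming the only thing that changes is the charset that `new String(bytes)` picks up as the platform default; the class and method names are made up for illustration, and the byte values are the ones printed in the logs above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetEffect {

    // The bytes printed in the logs above: UTF-8 for "ελληναρα".
    static final byte[] LOGGED_BYTES = {
        -50, -75, -50, -69, -50, -69, -50, -73,
        -50, -67, -50, -79, -49, -127, -50, -79
    };

    // Stands in for `new String(bytes)` on a JVM whose default charset
    // is `platformDefault`.
    static String decodeAs(Charset platformDefault) {
        return new String(LOGGED_BYTES, platformDefault);
    }

    public static void main(String[] args) {
        // "ελληναρα" as Unicode escapes, independent of source file encoding.
        String expected = "\u03b5\u03bb\u03bb\u03b7\u03bd\u03b1\u03c1\u03b1";

        // Default charset UTF-8 (the working Eclipse run).
        System.out.println(decodeAs(StandardCharsets.UTF_8).equals(expected));          // true
        // Default charset windows-1252 (Tomcat, and Eclipse set to Cp1252).
        System.out.println(decodeAs(Charset.forName("windows-1252")).equals(expected)); // false
    }
}
```

The same bytes give the right name only when they happen to be decoded as UTF-8, which is why passing the charset explicitly makes the result the same in both environments.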


NB : Tomcat does print the correct url in the address bar

[screenshot]

Eclipse is fine :

[screenshot]

Notice that Firefox automatically decodes the Url (to my bewilderment)
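
On the ???? in Tomcat's console: one plausible reading is that it is a separate symptom of the same default charset. A minimal sketch, assuming the console stream encodes with the JVM default (windows-1252 in the Tomcat run above), which has no Greek letters and so substitutes `?`. The class and method names are made up for illustration, and an in-memory stream stands in for the real console.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleEncodingSketch {

    // Prints `text` through a PrintStream bound to `consoleCharset`, then
    // reads the bytes back with the same charset, i.e. what a console using
    // that charset would display.
    static String asShownOnConsole(String text, String consoleCharset) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, consoleCharset);
            out.print(text);
            out.flush();
            return new String(sink.toByteArray(), consoleCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + consoleCharset, e);
        }
    }

    public static void main(String[] args) {
        // "ελληναρα" as Unicode escapes, independent of source file encoding.
        String greek = "\u03b5\u03bb\u03bb\u03b7\u03bd\u03b1\u03c1\u03b1";

        // Unmappable characters are replaced, one '?' per Greek letter.
        System.out.println(asShownOnConsole(greek, "windows-1252").equals("????????")); // true
        // With UTF-8 the text survives unchanged.
        System.out.println(asShownOnConsole(greek, "UTF-8").equals(greek));             // true
    }
}
```

If that reading is right, the question marks say nothing about whether the request parameter itself was decoded correctly; they only reflect how the console stream encodes its output.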

Mr_and_Mrs_D