
In our webapp, we have to send a POST request via HttpClient to an endpoint on our network, which will receive this and do some work with it. We are having trouble with character encoding, and I am having difficulties finding an answer to my question.

We call postMethod.getParams().setContentCharset("UTF-8") when sending the request, but on the receiving end the characters still seem to be encoded in ISO 8859-1. I determined this because the String I inspect on the receiving side contains garbage characters that go away once I follow the steps found at https://stackoverflow.com/a/16549329/1130549. Are there any extra steps I need to take on the sending end to ensure that I am actually writing characters in UTF-8 as expected? All we are doing now is calling postMethod.addParameter(paramKey, paramValue) with native String objects.
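
As an illustration of the symptom described above, here is a minimal sketch using only the JDK. It assumes the receiver is decoding UTF-8 bytes as ISO 8859-1, which is one plausible cause and has not been confirmed for this setup. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Simulates a receiver that wrongly decodes UTF-8 bytes as ISO 8859-1.
    static String misdecode(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the e
        String garbled = misdecode(original);

        // The two-byte UTF-8 sequence for the accented letter becomes two characters.
        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(original));
    }
}
```

If the repair step restores the text, the bytes on the wire were most likely valid UTF-8 and the wrong charset was applied when decoding them.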

Edit: Here is a very simple example of how we're sending the POST request. For what it's worth, the values are being taken from an XMLBeans object.

PostMethod postMethod = new PostMethod(url);
postMethod.getParams().setContentCharset("UTF-8");
postMethod.addParameter("key1", "value1");
postMethod.addParameter("key2", "value2");

HttpClient httpClient = new HttpClient();
int status = httpClient.executeMethod(postMethod);
Mirrana

2 Answers

EDIT: A simpler solution is to URL-encode the value:

postMethod.addParameter("key1", URLEncoder.encode("value1","UTF-8"));
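
To see what the charset argument changes, the following sketch (JDK only; the class and helper names are made up for illustration) percent-encodes a non-ASCII value under UTF-8 and under ISO 8859-1. It also shows what a value looks like when it is encoded twice, which can happen if the HTTP client form-encodes parameter values itself. Whether that applies to your client version is an assumption worth checking.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PercentEncodingDemo {

    // Wraps URLEncoder.encode so callers do not have to handle the checked exception.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String value = "caf\u00e9"; // "cafe" with an acute accent on the e

        // UTF-8 uses two bytes for the accented letter, ISO 8859-1 uses one.
        System.out.println(encode(value, "UTF-8"));      // caf%C3%A9
        System.out.println(encode(value, "ISO-8859-1")); // caf%E9

        // Encoding an already encoded value escapes the percent signs as %25.
        System.out.println(encode(encode(value, "UTF-8"), "UTF-8")); // caf%25C3%25A9
    }
}
```

If the receiver sees literal text such as %C3%A9 in a parameter value, double encoding is a likely explanation.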

To send the parameters properly encoded as UTF-8, you can build the request differently, using StringEntity and NameValuePair, e.g.:

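try (CloseableHttpClient httpClient = HttpClients.custom().build()) {
   URIBuilder uriBuilder = new URIBuilder(url);
   HttpHost target = new HttpHost(uriBuilder.getHost(), uriBuilder.getPort(), uriBuilder.getScheme());
   // Form parameters to send in the request body
   List<NameValuePair> nameValuePairs = new ArrayList<>();
   nameValuePairs.add(new BasicNameValuePair("key1", "value1"));
   nameValuePairs.add(new BasicNameValuePair("key2", "value2"));
   // Percent-encode the pairs as UTF-8 into a single form-encoded string
   String entityValue = URLEncodedUtils.format(nameValuePairs, StandardCharsets.UTF_8.name());
   StringEntity entity = new StringEntity(entityValue, StandardCharsets.UTF_8.name());
   // Assumed addition: mark the body as form data so the server parses it as parameters
   entity.setContentType("application/x-www-form-urlencoded");
   // "post" was not defined in the original snippet; an HttpPost for the same url is assumed
   HttpPost post = new HttpPost(url);
   post.setEntity(entity);
   httpClient.execute(target, post);
}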
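
For reference, the form-encoded body built above is just percent-encoded name=value pairs joined by &. Here is a rough JDK-only sketch of that step, with hypothetical class and method names and no claim to match URLEncodedUtils in every edge case:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodyDemo {

    // Percent-encodes one name or value as UTF-8.
    private static String enc(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    // Joins the parameters into an application/x-www-form-urlencoded body.
    static String formBody(Map<String, String> params) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(entry.getKey())).append('=').append(enc(entry.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("key1", "caf\u00e9");
        params.put("key2", "a b");
        System.out.println(formBody(params)); // key1=caf%C3%A9&key2=a+b
    }
}
```

Logging the body in this form on the sending side makes it easy to confirm that non-ASCII characters leave the client as UTF-8 byte sequences.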
Ori Marko
  • But isn't this using the request body instead of the request parameters? This is a completely different implementation. – Mirrana Nov 18 '19 at 13:42
  • @agent154 See `postMethod.addParameter` https://hc.apache.org/httpclient-3.x/apidocs/org/apache/commons/httpclient/methods/PostMethod.html#addParameter(java.lang.String,%20java.lang.String) *Adds a new parameter to be used in the POST request body.* – Ori Marko Nov 18 '19 at 13:45
  • @agent154 simpler solution is to encode the value `postMethod.addParameter("key1", URLEncoder.encode("value1","UTF-8"));` – Ori Marko Nov 18 '19 at 13:52

First of all, make sure that the string you are actually writing is encoded in UTF-8. I realize that you already know this, but double-check it anyway, as it would be the prime suspect for your problem. Also, I would recommend trying a much simpler HTTP client. Apache HttpClient (I believe that's the library you are using) is an excellent library, but because it covers a very wide range of options it tends to be a bit bulky. So, for simple requests I would suggest a lightweight HTTP client that may not be as comprehensive as the Apache library but offers simplicity as a trade-off. Here is how your code might look:

    private static void testHttpClient() {
        HttpClient client = new HttpClient();
//      client.setContentType("text/html; charset=utf-8");
        client.setContentType("application/json; charset=utf-8");
        client.setConnectionUrl("http://www.my-url.com");
        String content = null;
        try {
            String myMessage = getMyMessage(); // get the string that you want to send
            content = client.sendHttpRequest(HttpMethod.POST, myMessage);
        } catch (IOException e) {
            content = client.getLastResponseMessage() + TextUtils.getStacktrace(e, false);
        }
        System.out.println(content);
    }

It looks much simpler, I think. The same library also has a utility that converts any string in any language into a sequence of Unicode escape sequences and vice versa. This has helped me numerous times to diagnose thorny encoding issues, for instance when you see gibberish symbols that could be either a wrong display of a valid character or actual character loss. Here is an example of how it works:

String result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);

The output of this code is:

\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World

That might help you to check whether the string you passed is valid. The library is called MgntUtils and can be found on Maven Central or on GitHub. It comes as a Maven artifact, with sources and Javadoc. The Javadoc is also available separately.
Disclaimer: The MgntUtils library is written by me.
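
If adding a dependency is not an option, the same kind of check can be approximated with the JDK alone. The sketch below is a stand-in written for illustration, not the library's implementation. It escapes each UTF-16 code unit, and the class and method names are hypothetical.

```java
public class UnicodeEscapeDemo {

    // Renders every UTF-16 code unit of the text as a \\uXXXX escape.
    static String toUnicodeEscapes(String text) {
        StringBuilder escaped = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            escaped.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return escaped.toString();
    }

    public static void main(String[] args) {
        // A correctly decoded accented letter is a single code unit.
        System.out.println(toUnicodeEscapes("\u00e9"));
        // The same letter after UTF-8 bytes were read as ISO 8859-1 is two code units.
        System.out.println(toUnicodeEscapes("\u00c3\u00a9"));
        System.out.println(toUnicodeEscapes("Hello World"));
    }
}
```

Comparing the escapes of the sent and received strings shows whether a character was lost or merely decoded with the wrong charset.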

Michael Gantman