
I am new to HTML and Java, so I would appreciate it if you could look at my issue and offer any suggestions. I am trying to do the following, in order:

1) Send an HTTP POST request in Java using the PostMethod class.

2) Fetch the result. The result I receive is in HTML format.

3) The actual result contains characters like ", ; and :, but all the quotes are converted to entities (&quot;) in the result (the htmlOutput string).

My question is: how do I avoid getting an encoded result? Is there a good way to get the result as the original string, without entities such as &quot;? Here is the code I use:

        import org.apache.commons.httpclient.HttpStatus;
        import org.apache.commons.httpclient.methods.PostMethod;
        import org.apache.commons.lang.StringUtils; // or org.apache.commons.lang3

        // Inside the method; client, url, LOCAL_MACHINE_IP, serializedScript,
        // opId and instanceUrl are defined elsewhere in the enclosing class.
        int statusCode = HttpStatus.SC_OK;
        String scriptOutput = "";
        PostMethod runnerMethod = new PostMethod(url);
        try {
            runnerMethod.setRequestHeader("X-Forwarded-For", LOCAL_MACHINE_IP);
            runnerMethod.addParameter("script", serializedScript);
            statusCode = client.executeMethod(runnerMethod);
            if (statusCode != HttpStatus.SC_OK) {
                scriptOutput = "HTTP POST request failed with status code " + statusCode + " "
                        + runnerMethod.getStatusText();
                throw new Exception(scriptOutput);
            }
            String htmlOutput = runnerMethod.getResponseBodyAsString();
            // Take the text between "Script:" (7 characters plus one space) and the first "<BR/>".
            scriptOutput = StringUtils.substring(htmlOutput,
                    StringUtils.indexOf(htmlOutput, "Script:") + 8,
                    StringUtils.indexOf(htmlOutput, "<BR/>"));
            return scriptOutput;
        } catch (Exception e) {
            // IllegalArgumentException was handled identically, so one catch block covers both.
            return String.format("Error during Background script execution on instance. "
                    + "opId = %s, instanceUrl = %s, HTTP Status Code = %d, Err Message = %s",
                    opId, instanceUrl, statusCode, e.getMessage());
        } finally {
            runnerMethod.releaseConnection();
        }
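The extraction step above assumes "Script:" is followed by exactly one space and that the first "<BR/>" in the page comes after it. The sketch below is a stdlib-only version of that step under the same marker assumptions; the class and method names (`ScriptOutputExtractor`, `extract`) are hypothetical. It searches for the end marker only after the start marker and returns an empty `Optional` when a marker is missing, instead of silently returning a wrong substring.

```java
import java.util.Optional;

public class ScriptOutputExtractor {

    // Marker strings taken from the question's code; adjust if the server's HTML differs.
    static final String START_MARKER = "Script:";
    static final String END_MARKER = "<BR/>";

    // Returns the trimmed text between START_MARKER and the first END_MARKER after it,
    // or Optional.empty() if either marker is missing.
    static Optional<String> extract(String html) {
        if (html == null) {
            return Optional.empty();
        }
        int start = html.indexOf(START_MARKER);
        if (start < 0) {
            return Optional.empty();
        }
        start += START_MARKER.length();
        // Look for the end marker only after the start marker.
        int end = html.indexOf(END_MARKER, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(start, end).trim());
    }

    public static void main(String[] args) {
        String html = "<HTML><BODY>Script: [{&quot;key1&quot;:1}]<BR/></BODY></HTML>";
        System.out.println(extract(html).orElse("<markers not found>"));
    }
}
```

Note that `trim()` also strips trailing whitespace from the script output; drop it if that whitespace matters. This only isolates the text; the entities are still there and need separate handling.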

Sample output:

[Screenshot of the sample output; image not available]

  • You should explain what you mean by "converted" in point 3). Are you sure there are no HTML entities (&quot; etc.) in the `htmlOutput` variable? – pwes Aug 29 '14 at 07:03
  • @pwes: I clarified the question per your suggestion. To answer your question: yes, htmlOutput does contain HTML entities, in this case &quot; instead of an actual ". Let me know if my question is clear. – animageofmine Aug 29 '14 at 07:12

1 Answer


You are posting to an HTTP server, and I presume you are using Apache Commons HttpClient. Nothing in the getResponseBodyAsString() method would escape quotes to HTML entities.
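To check that the HTTP client itself does not introduce entities, the sketch below starts a throwaway local server that returns a JSON body, POSTs to it and compares what arrives with what was sent. It deliberately swaps in the JDK's own `java.net.http.HttpClient` and `com.sun.net.httpserver.HttpServer` (Java 11+) instead of Commons HttpClient 3.x, so it runs without extra libraries; the class name `RawBodyDemo` and the `script=test` form body are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RawBodyDemo {

    // Starts a local server that always answers with the given body, POSTs to it,
    // and returns the response body exactly as the client received it.
    static String roundTrip(String body) throws IOException, InterruptedException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            exchange.getRequestBody().readAllBytes(); // drain the POSTed form data
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString("script=test"))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String sent = "[{\"key1\":{\"subkey1\":\"value1\"}}]";
        String received = roundTrip(sent);
        System.out.println(received);
        System.out.println(sent.equals(received)); // true: the quotes arrive unchanged
    }
}
```

If a test like this shows raw quotes but the real server's response contains &quot;, the escaping happens on the server side, not in the client.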

Probably you are sending a double-encoded object from the server side: it was first encoded to the usual JSON representation and then encoded again as a string, which would explain the entities.
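A second pass that is purely JSON would turn each quote into a backslash escape (\"), not into &quot;; the entities point to an HTML-escaping step being applied to already-encoded JSON. The sketch below assumes that is what happens and shows the effect on the JSON from the comments. The `htmlEscape` helper is a hypothetical minimal escaper, not the server's actual code.

```java
public class DoubleEncodingDemo {

    // Minimal HTML escaping of the five characters most commonly escaped.
    static String htmlEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First encoding step: JSON as produced by the server (e.g. by JSON.stringify).
        String json = "[{\"key1\":{\"subkey1\":\"value1\"}}]";
        // Second, HTML-oriented step: every quote becomes an entity.
        System.out.println(htmlEscape(json));
        // prints [{&quot;key1&quot;:{&quot;subkey1&quot;:&quot;value1&quot;}}]
    }
}
```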

The correct solution is to get rid of the double encoding. If you don't control the server side, you can use .replaceAll("&quot;", "\"") or Apache Commons StringEscapeUtils, as explained elsewhere on Stack Overflow.
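With commons-lang on the classpath, `StringEscapeUtils.unescapeHtml` (commons-lang 2.x) or `StringEscapeUtils.unescapeHtml4` (commons-lang3) is the simplest route and covers the full HTML 4 entity set. For illustration only, here is a stdlib-only sketch of the same idea; the `HtmlUnescape` class is hypothetical and knows only a handful of named entities plus decimal and hex numeric references. It decodes in a single pass, so `&amp;quot;` correctly becomes `&quot;`, which chained `replaceAll` calls in the wrong order would get wrong.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUnescape {

    // Small subset of named entities (StringEscapeUtils knows many more).
    private static final Map<String, String> NAMED =
            Map.of("quot", "\"", "amp", "&", "lt", "<", "gt", ">", "apos", "'");

    // Matches &name;, &#123; and &#x7B; style references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);");

    static String unescape(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder(s.length());
        while (m.find()) {
            String replacement = decode(m.group(1), m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String decode(String ref, String original) {
        try {
            if (ref.startsWith("#x") || ref.startsWith("#X")) {
                return new String(Character.toChars(Integer.parseInt(ref.substring(2), 16)));
            }
            if (ref.startsWith("#")) {
                return new String(Character.toChars(Integer.parseInt(ref.substring(1))));
            }
        } catch (IllegalArgumentException e) {
            // Out-of-range numeric reference: keep it as written.
            return original;
        }
        // Unknown named entities are left untouched.
        return NAMED.getOrDefault(ref, original);
    }

    public static void main(String[] args) {
        String fromServer = "[{&quot;key1&quot;:{&quot;subkey1&quot;:&quot;value1&quot;}}]";
        System.out.println(unescape(fromServer)); // [{"key1":{"subkey1":"value1"}}]
    }
}
```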

llogiq
  • Yes, the result the server returns is a JSON string, and I am using Apache Commons HttpClient. How can I get rid of the double encoding when returning the results from the server? I did think of using replaceAll, but that only solves one issue: I don't know which other characters I should replace. I am developing a generic client library, so the results might contain other characters that get converted to HTML entities. Any suggestions? – animageofmine Aug 29 '14 at 07:18
  • As I wrote above, StringEscapeUtils (part of Apache Commons Lang) can unescape all HTML entities. I don't know what is done on the server, so I cannot advise how to get rid of the double encoding without more information. – llogiq Aug 29 '14 at 07:41
  • Thanks. On the server side, a JavaScript script is executed and the result string is encoded to JSON using the JSON encoder from https://github.com/douglascrockford/JSON-js. Example: [{"key1":{"subkey1":"value1"}},{"key2":{"subkey2":"value2"}}] – animageofmine Aug 29 '14 at 07:55
  • But that is apparently not what gets sent over the wire, or your question would not have arisen. What content type do you send the JSON under? Perhaps a server component escapes characters to HTML entities because you send Content-Type: text/html? – llogiq Aug 29 '14 at 12:54
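To answer the content-type question from the client side, you can log the response's Content-Type header (in HttpClient 3.x it is available after execution via `runnerMethod.getResponseHeader("Content-Type")`, whose `getValue()` returns the raw header). The sketch below is a hypothetical stdlib-only helper that reduces such a header value to its media type and reports whether it is JSON; treating `+json` suffixes as JSON is an assumption you may not need.

```java
import java.util.Locale;

public class ContentTypeCheck {

    // Reduces a header value such as "text/html; charset=UTF-8" to its
    // lower-case media type, e.g. "text/html".
    static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return "";
        }
        int semi = contentTypeHeader.indexOf(';');
        String type = semi >= 0 ? contentTypeHeader.substring(0, semi) : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // True for application/json and structured-syntax types like application/vnd.api+json.
    static boolean isJson(String contentTypeHeader) {
        String type = mediaType(contentTypeHeader);
        return type.equals("application/json") || type.endsWith("+json");
    }

    public static void main(String[] args) {
        System.out.println(isJson("application/json; charset=utf-8")); // true
        System.out.println(isJson("text/html; charset=UTF-8"));        // false
    }
}
```

If the header turns out to be text/html, changing the server to send application/json (and to stop HTML-escaping the payload) addresses the cause; unescaping on the client only treats the symptom.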