I use the below function to retrieve the web service response:

private String getSoapResponse (String url, String host, String encoding, String soapAction, String soapRequest) throws MalformedURLException, IOException, Exception {         
    URL wsUrl = new URL(url);     
    URLConnection connection = wsUrl.openConnection();     
    HttpURLConnection httpConn = (HttpURLConnection)connection;     
    ByteArrayOutputStream bout = new ByteArrayOutputStream(); 

    byte[] buffer = new byte[soapRequest.length()];     
    buffer = soapRequest.getBytes();     
    bout.write(buffer);     
    byte[] b = bout.toByteArray();          

    httpConn.setRequestMethod("POST");
    httpConn.setRequestProperty("Host", host);

    // Compare string contents, not references; == on Strings checks identity only
    if (encoding == null || encoding.isEmpty())
        encoding = UTF8;

    httpConn.setRequestProperty("Content-Type", "text/xml; charset=" + encoding);
    httpConn.setRequestProperty("Content-Length", String.valueOf(b.length));
    httpConn.setRequestProperty("SOAPAction", soapAction);

    httpConn.setDoOutput(true);
    httpConn.setDoInput(true);

    OutputStream out = httpConn.getOutputStream();
    out.write(b); 
    out.close();

    InputStreamReader is = new InputStreamReader(httpConn.getInputStream());
    StringBuilder sb = new StringBuilder();
    BufferedReader br = new BufferedReader(is);
    String read = br.readLine();

    while(read != null) {
        sb.append(read);
        read = br.readLine();
    }

    String response = decodeHtmlEntityCharacters(sb.toString());    

    return response = decodeHtmlEntityCharacters(response);
}

The problem is that the response comes back full of special characters, which makes the structure of the XML invalid.
Example response:

&lt;PLANT&gt;A565&lt;/PLANT&gt;
          &lt;PLANT&gt;A567&lt;/PLANT&gt;
          &lt;PLANT&gt;A585&lt;/PLANT&gt;
          &lt;PLANT&gt;A921&lt;/PLANT&gt;
          &lt;PLANT&gt;A938&lt;/PLANT&gt;
        &lt;/PLANT_GROUP&gt;
      &lt;/KPI_PLANT_GROUP_KEYWORD&gt;
      &lt;MSU_CUSTOMERS/&gt;
    &lt;/DU&gt;
    &lt;DU&gt;

To solve this, I pass the whole response to the method below, which replaces each special character with its corresponding punctuation.

private final static Hashtable htmlEntitiesTable = new Hashtable();
static {
    htmlEntitiesTable.put("&amp;","&");
    htmlEntitiesTable.put("&quot;","\"");
    htmlEntitiesTable.put("&lt;","<");
    htmlEntitiesTable.put("&gt;",">");  
}

private String decodeHtmlEntityCharacters(String inputString) throws Exception {
    Enumeration en = htmlEntitiesTable.keys();

    while(en.hasMoreElements()){
        String key = (String)en.nextElement();
        String val = (String)htmlEntitiesTable.get(key);

        inputString = inputString.replaceAll(key, val);
    }

    return inputString;
}

But then another problem arose. If the response contains the segment &lt;VALUE&gt;&lt; 0.5&lt;/VALUE&gt; and it is processed by this method, the output is:

<VALUE>< 0.5</VALUE>

This makes the structure of the XML invalid again. The data "< 0.5" is correct and valid, but having it inside the VALUE element breaks the structure of the XML.

Can you please help me deal with this? Maybe the way I get or build the response can be improved. Is there a better way to call the web service and get its response?

How can I deal with elements containing "<" or ">"?

yonan2236
  • so you need a way to detect if a '<' is data or syntax? – Cruncher Oct 16 '13 at 13:52
  • I suggest having an attribute on VALUE. Make the attribute a 1 or a 0, depending if you need less/greater than. (or -1, 0, 1 for <, =, >) – Cruncher Oct 16 '13 at 13:55
  • Why not encode the < *correctly* ? – Brian Agnew Oct 16 '13 at 13:58
  • @BrianAgnew It is encoded correctly to begin with, with the rest of the xml. Then he decodes it when he decodes the rest of the xml. The problem is he needs to differentiate between different "<"'s. – Cruncher Oct 16 '13 at 13:59
  • @Cruncher um, it doesn't look like it's encoded correctly to begin with. If it were, actual XML structure would not be encoded, only the data. Or are you seeing something we don't see? – eis Oct 28 '13 at 21:32

6 Answers

Do you know how to use a third-party open source library?

You should try Apache Commons Lang:

StringEscapeUtils.unescapeXml(xml)

More detail is provided in the following Stack Overflow post:

how to unescape XML in java

Documentation:

http://commons.apache.org/proper/commons-lang/javadocs/api-release/index.html
http://commons.apache.org/proper/commons-lang/userguide.html#lang3.

Rick Suggs

You're using SOAP wrong.

In particular, you do not need the following line of code:

     String response = decodeHtmlEntityCharacters(sb.toString());    

Just return sb.toString(). And for $DEITY's sake, do not use string methods to parse the retrieved string, use an XML parser, or a full-blown SOAP stack...
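
As a minimal sketch of the parser approach, the example below lets the JDK's built-in DOM parser do the unescaping in two passes: first the SOAP envelope, then the payload that was embedded in it as escaped text. The envelope shape, the unprefixed element name `return`, and the class and method names are all assumptions made for illustration; the real service may differ. It also assumes the service escapes data inside the embedded payload a second time (`&amp;lt; 0.5`). If it does not, as the sample in the question suggests, the payload is not well-formed after the first pass and the fix belongs on the sender side.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SoapPayloadParser {

    // Parses an XML string into a DOM document; the parser decodes entities itself.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Returns the text content of the first element with the given tag name.
    static String firstText(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).item(0).getTextContent();
    }

    public static void main(String[] args) {
        // Hypothetical raw response: the payload is carried as escaped text in <return>.
        String raw =
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
          + "<soap:Body><return>"
          + "&lt;DU&gt;&lt;VALUE&gt;&amp;lt; 0.5&lt;/VALUE&gt;&lt;/DU&gt;"
          + "</return></soap:Body></soap:Envelope>";

        // First pass: unwrap the envelope; the embedded payload is unescaped once.
        String payload = firstText(parse(raw), "return");
        // Second pass: parse the payload; the data inside VALUE is unescaped here.
        String value = firstText(parse(payload), "VALUE");

        System.out.println(payload);
        System.out.println(value);
    }
}
```

Because no string replacement is involved, markup and data are never confused: the `<` that belongs to the value only appears once the parser hands back the text of VALUE.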

Tassos Bassoukos

Does the > or < character always appear at the beginning of a value? Then you could use regex to handle the cases in which the &gt; or &lt; are followed by a digit (or dot, for that matter).

Sample code, assuming the replacement strings used in it don't appear anywhere else in the XML:

private String decodeHtmlEntityCharacters(String inputString) throws Exception {
    Enumeration en = htmlEntitiesTable.keys();

    // Replaces &gt; or &lt; followed by dot or digit (while keeping the dot/digit)
    inputString = inputString.replaceAll("&gt;(\\.?\\d)", "Valuegreaterthan$1");
    inputString = inputString.replaceAll("&lt;(\\.?\\d)", "Valuelesserthan$1");

    while(en.hasMoreElements()){
        String key = (String)en.nextElement();
        String val = (String)htmlEntitiesTable.get(key);

        inputString = inputString.replaceAll(key, val);
    }

    inputString = inputString.replaceAll("Valuelesserthan", "&lt;");
    inputString = inputString.replaceAll("Valuegreaterthan", "&gt;");

    return inputString;
}

Note that the most appropriate answer (and the easiest for everyone) would be to encode the XML correctly on the sender side (which, by the way, would also make my solution stop working).
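
For the sender side, here is a rough sketch of what "correctly encoded" could mean, assuming the sender is also written in Java and builds its XML with the JDK's DOM and Transformer APIs instead of concatenating strings. The class and method names are hypothetical. The serializer escapes the text content, so the receiver never sees a bare `<` inside VALUE.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ValueElementWriter {

    // Builds a VALUE element via DOM so that the serializer escapes the text.
    static String valueElement(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element value = doc.createElement("VALUE");
            value.setTextContent(text); // raw data, no manual escaping
            doc.appendChild(value);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        // The '<' in the data is written as an entity; the tags are left alone.
        System.out.println(valueElement("< 0.5"));
    }
}
```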

Piovezan
  • ">" and "<" can always be found at the beginning of a value. Then the data that follows is usually a numeric. e.g. `<0.5` About the regex, is there a tool for building it? Not familiar with regex. – yonan2236 Oct 29 '13 at 11:07
  • The first argument of `String.replaceAll()` is a regex. Since we're at it, let me ask: is `<VALUE></VALUE>` expected to appear (i.e. empty values)? – Piovezan Oct 29 '13 at 11:19
  • No, if an element has no content, the tag would be just `<VALUE/>` – yonan2236 Oct 29 '13 at 12:18
  • Ok. Just to be sure, which ones do you need to keep encoded as `&gt;` and `&lt;`, the enclosing characters belonging to XML elements or the ones that belong to the values? – Piovezan Oct 29 '13 at 12:43
  • Yes, but you are referring to the <'s belonging to the values (e.g. <50), right? In opposition to the XML element enclosing characters, which should not be encoded, right? – Piovezan Oct 29 '13 at 13:13
  • the "<" belonging to the value. – yonan2236 Oct 29 '13 at 13:16

It would be hard to cope with every situation, but you could cover the most common ones by adding a few more rules. Assume that any less-than sign followed by a space is data, and that any greater-than sign preceded by a space is data, and encode those again after decoding.

private final static Hashtable htmlEntitiesTable = new Hashtable();
static {
    htmlEntitiesTable.put("&amp;","&");
    htmlEntitiesTable.put("&quot;","\"");
    htmlEntitiesTable.put("&lt;","<");
    htmlEntitiesTable.put("&gt;",">");  
}

private String decodeHtmlEntityCharacters(String inputString) throws Exception {
    Enumeration en = htmlEntitiesTable.keys();

    while(en.hasMoreElements()){
        String key = (String)en.nextElement();
        String val = (String)htmlEntitiesTable.get(key);

        inputString = inputString.replaceAll(key, val);
    }

    inputString = inputString.replaceAll("< ","&lt; ");       
    inputString = inputString.replaceAll(" >"," &gt;");       

    return inputString;
}
Dijkgraaf

'>' does not have to be escaped in XML character data (the only exception is the sequence ]]>), so you shouldn't have an issue with that. Regarding '<', here are the options I can think of.

  1. Use CDATA in the web service response for text containing special characters.
  2. Rewrite the text by reversing the order. For example, if it is x < 2, change it to 2 > x, since '>' does not need escaping.
  3. Use another attribute or element in the XML response to indicate '<' or '>'.
  4. Use a regular expression to find a sequence that starts with '<', is followed by a string, and is then followed by the '<' of the closing tag. Replace it with some code or value that you can interpret and replace later.
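
Options 1 and 3 could look roughly like this on the receiving side. This is only a sketch using the JDK's DOM parser; the attribute name `operator` and its value `lt` are made up for illustration and would have to be agreed with whoever produces the response.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ValueOptionsDemo {

    // Parses a small XML fragment and returns its root element.
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Option 1: inside CDATA the '<' is plain character data, not markup.
        Element cdata = root("<VALUE><![CDATA[< 0.5]]></VALUE>");
        System.out.println(cdata.getTextContent());

        // Option 3: the comparison is carried by a (hypothetical) attribute,
        // so the element content is just the number.
        Element attr = root("<VALUE operator=\"lt\">0.5</VALUE>");
        System.out.println(attr.getAttribute("operator") + " " + attr.getTextContent());
    }
}
```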

Also, you don't need to do this:

String response = decodeHtmlEntityCharacters(sb.toString()); 

You should be able to parse the XML after you take care of the '<' sign in text.

You can use this site for testing regular expressions.

Poornima

Why not deserialize your XML into an object? It's much easier than what you are doing.

For example:

var ser = new XmlSerializer(typeof(MyXMLObject));
using (var reader = XmlReader.Create("http.....xml"))
{
     // Cast to the type the serializer was created for
     MyXMLObject _myobj = (MyXMLObject)ser.Deserialize(reader);
}
Nick Kahn