I have a Java string that contains the following XML:

<?xml version="1.0" encoding="utf-8"?>
    <Chart>
        <request>
            <zip>12345</zip>
            <city>Miami</city>
        </request>
    </Chart>

What is the easiest way to parse this string and extract the value of <zip> (in this case, 12345)?
Alex
  • Read: [How to read XML using XPath in Java](http://stackoverflow.com/questions/2811001/how-to-read-xml-using-xpath-in-java). The XPath expression for this case can be as simple as `//zip`. – har07 Dec 09 '15 at 05:40

2 Answers

Since you have XML, it is better to parse it as XML and query it directly with XPath:

import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FindZipXPath {

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n" +
                "    <Chart>\r\n" +
                "        <request>\r\n" +
                "            <zip>12345</zip>\r\n" +
                "            <city>Miami</city>\r\n" +
                "        </request>\r\n" +
                "    </Chart>";

        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        builderFactory.setNamespaceAware(true);
        DocumentBuilder builder = builderFactory.newDocumentBuilder();

        // Parse the XML string into a DOM document
        Document document = builder.parse(new InputSource(new StringReader(xml)));

        // Build the XPath query
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Your path
        String expression = "//Chart/request/zip";

        NodeList nodes = (NodeList) xPath.compile(expression).evaluate(document, XPathConstants.NODESET);

        for (int i = 0; i < nodes.getLength(); i++) {
            Node theNode = nodes.item(i);

            if (theNode instanceof Element) {
                Element theElement = (Element) theNode;
                System.out.println("element=" + theElement.getTextContent());
                break; // Stop at the first match
            }
        }
    }
}
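
When only the first matching value is needed, the node-list loop above can be skipped: `XPath.evaluate(String, InputSource)` returns the string value of the first matching node, or an empty string if nothing matches. The `javax.xml.xpath`, `org.w3c.dom` and `org.xml.sax` classes ship with the JDK, so no extra library is involved. The following is a minimal sketch under those assumptions; the class name `ZipLookup` and the helper `firstValue` are hypothetical names chosen for illustration, and wrapping the checked exception is just one possible design choice.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

public class ZipLookup {

    // Hypothetical helper: returns the text of the first node matching the
    // XPath expression, or an empty string when nothing matches.
    static String firstValue(String xml, String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            return xPath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Raised for an invalid expression or XML that cannot be parsed
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<Chart><request><zip>12345</zip><city>Miami</city></request></Chart>";

        String zip = firstValue(xml, "/Chart/request/zip");
        System.out.println(zip); // prints 12345
    }
}
```

The value is returned as a `String`, so it can be stored in a variable or parsed further (for example with `Integer.parseInt`) as needed.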
  • Thank you for the replies. Does XPath require a library? How can I fix "cannot find symbol"? – Alex Dec 10 '15 at 00:41

Without going into the deep dark world of parsing XML with Java, you could use a regex:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class FindZip {

  public static void main(String[] args) {
    Pattern pattern = Pattern.compile("<zip>(\\d+)</zip>");
    String zip_code;

    Matcher matcher = pattern.matcher(
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
        "<Chart>" +
        "    <request>" +
        "        <zip>12345</zip>" +
        "        <city>Miami</city>" +
        "    </request>" +
        "</Chart>"
      );

    boolean found = false;
    while (matcher.find()) {
      zip_code = matcher.group(1);
      System.out.printf(
          "I found the zip code \"%s\" starting at index %d and ending at index %d.%n",
          zip_code,
          matcher.start(1),
          matcher.end(1)
        );
      found = true;
    }
    if (!found) {
      System.out.println("No match found.");
    }
  }
}

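There are obvious drawbacks and limitations to this approach (for example, it stops matching as soon as the tag carries an attribute or the value is surrounded by whitespace), but at least you get your zip code.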
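
To make the limitations concrete, here is a small sketch that runs the same pattern against a few variations of the input. The class name `RegexLimits` and the helper `extractZip` are hypothetical names used only for this illustration; the sample inputs are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLimits {

    // Same pattern as in the answer above
    private static final Pattern ZIP = Pattern.compile("<zip>(\\d+)</zip>");

    // Hypothetical helper: returns the captured digits, or null when
    // the pattern does not match anywhere in the input.
    static String extractZip(String xml) {
        Matcher matcher = ZIP.matcher(xml);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Plain tag: matches
        System.out.println(extractZip("<zip>12345</zip>"));             // prints 12345
        // Attribute on the tag: no match
        System.out.println(extractZip("<zip type=\"us\">12345</zip>")); // prints null
        // Whitespace around the value: no match
        System.out.println(extractZip("<zip> 12345 </zip>"));           // prints null
        // Commented-out element: still matches, which an XML parser would not do
        System.out.println(extractZip("<!-- <zip>00000</zip> -->"));    // prints 00000
    }
}
```

An XML parser handles all four cases according to the XML rules, which is why the XPath route scales better once the input is not fully under your control.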

br3nt
  • Using regex, how can I modify the code you provided to put value of zip into a string? – Alex Dec 10 '15 at 01:06
  • You would use `String zip_code = matcher.group(1);`. I've modified the answer. – br3nt Dec 10 '15 at 01:23
  • If you're going to extract many different fields from the XML, you probably want to go with the other answer. However, to answer your question, the regex to capture currency can vary depending on the format. Will the currency value always be an integer? Will it contain cents? Will it contain the currency symbol? Could there be negative values? You would probably be better off capturing the value and the currency type and then using a formatter, such as described in [this SO answer](http://stackoverflow.com/a/10826990/848668). – br3nt Dec 10 '15 at 03:36
  • Thank you for your help. If the tag contains a URL, what do I change `\\d+` to? – Alex Dec 10 '15 at 03:54
  • Here is an example of a regex matching a URL: `(https?:\\/\\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\\/\\w \\.-]*)*\\/?`. It's not perfect or complete, but as you can see, it's complex. Alternatively, you could use something very generic, like `.*`, and then parse the captured value, for example: `URL url = new URL(matcher.group(1));`. Though again, it would be easier and better at this point to parse the XML properly and use XPath to get to the fields you want. – br3nt Dec 10 '15 at 04:04
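
The generic-capture idea from the last comment can be sketched as follows. Everything here is an assumption made for illustration: the tag name `link`, the class name `FindUrl`, and the helper `extractLink` do not come from the question. The sketch uses a reluctant `(.*?)` so the capture stops at the first closing tag, and it swaps in `java.net.URI` for validation instead of `java.net.URL`, whose constructors are deprecated in recent JDKs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindUrl {

    // "link" is a hypothetical tag name; (.*?) captures as little as possible
    private static final Pattern LINK = Pattern.compile("<link>(.*?)</link>");

    // Hypothetical helper: returns the tag content as a URI, or null when the
    // tag is missing or its content is not a syntactically valid URI.
    static URI extractLink(String xml) {
        Matcher matcher = LINK.matcher(xml);
        if (!matcher.find()) {
            return null;
        }
        try {
            return new URI(matcher.group(1).trim());
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String xml = "<request><link>http://example.com/path</link><zip>12345</zip></request>";
        System.out.println(extractLink(xml)); // prints http://example.com/path
    }
}
```

Note that this does not decode XML entities such as `&amp;` inside the captured text, which is one more reason the XPath approach is the safer choice for real documents.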