I have the following very large (up to 5 GB) XML file:

<Hotels>
  <Hotel>
    <Name>Hotel 1</Name>
    <City>City 1</City>
    <Phone>12345</Phone>
  </Hotel>
  <Hotel>
    <Name>Hotel 2</Name>
    <City>City 2</City>
    <Phone>67890</Phone>
  </Hotel>
  ...
</Hotels>

And I have a file which defines which fields I want to extract and what their path is:

$root = "/Hotels/Hotel";
$fields = array("HotelName"   => "/Name",
                "PhoneNumber" => "/Phone");

So the path for HotelName would be: /Hotels/Hotel/Name.

Now I want to get the information for every hotel. I cannot create classes for them because the script has to be dynamic: different XML files with different definition files will be passed in.

How can I solve this using the paths, without classes, and with low memory usage (the files are large)?

Edit: Everything else is implemented. I just need a way to iterate over the Hotel elements and get their values using the paths I have.

halloei
  • For "very large" (how large is that?) XML files, you might want to consider either dumping them into a relational database (it seems like this XML file actually represents a table) or using a native XML database like [BaseX](http://www.basex.org). – Jens Erat Feb 21 '14 at 11:10
  • How large are those files? – Liviu Stirb Feb 21 '14 at 11:10
  • The files can be up to 5 GB. I export them to CSV so I can import them into MySQL with "LOAD DATA INFILE". – halloei Feb 21 '14 at 11:38
  • It's probably better to write a SAX parser then. – Liviu Stirb Feb 21 '14 at 12:55

2 Answers

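Try reading this tutorial; it has explanations and examples: http://viralpatel.net/blogs/java-xml-xpath-tutorial-parse-xml/

For your purpose you should use something from the StAX family, not DOM.

The XPath part looks like this. Note that this example parses the file into a DOM, so it loads the whole document into memory and only suits small files: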

import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class QueryXML {
  public void query() throws ParserConfigurationException, SAXException,
      IOException, XPathExpressionException {
    // standard setup for reading an XML file into a DOM (whole file in memory)
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document doc = builder.parse("person.xml");

    // create an XPathFactory and an XPath object
    XPathFactory xFactory = XPathFactory.newInstance();
    XPath xpath = xFactory.newXPath();

    // compile the XPath expression
    XPathExpression expr = xpath.compile("//person[firstname='Lars']/lastname/text()");
    // run the query and get a node set
    Object result = expr.evaluate(doc, XPathConstants.NODESET);

    // cast the result to a DOM NodeList and print each text node
    NodeList nodes = (NodeList) result;
    for (int i = 0; i < nodes.getLength(); i++) {
      System.out.println(nodes.item(i).getNodeValue());
    }

    // new XPath expression to get the number of people with first name Lars
    expr = xpath.compile("count(//person[firstname='Lars'])");
    // run the query and get the count as a number
    Double number = (Double) expr.evaluate(doc, XPathConstants.NUMBER);
    System.out.println("Number of objects " + number);

    // do we have more than 2 people with first name Lars?
    expr = xpath.compile("count(//person[firstname='Lars']) > 2");
    // run the query and get the result as a boolean
    Boolean check = (Boolean) expr.evaluate(doc, XPathConstants.BOOLEAN);
    System.out.println(check);
  }
}

You can simply adapt that code to your needs.
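Since the recommendation above is StAX rather than DOM, here is a rough sketch of what a streaming version could look like. It is not taken from the tutorial. The class name `HotelStreamExtractor`, the method `extract`, and the callback parameter `sink` are made up for illustration, and the sketch assumes simple paths made of element names only (no predicates, attributes, or namespaces). It tracks the current element path while reading and hands each finished record to the callback, so only one record is held in memory at a time.

```java
import java.io.Reader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams the XML and emits one map per record element.
final class HotelStreamExtractor {

    static void extract(Reader xml, String root, Map<String, String> fields,
                        Consumer<Map<String, String>> sink) throws XMLStreamException {
        // Turn the definition into: absolute path -> output field name.
        Map<String, String> nameByPath = new HashMap<>();
        fields.forEach((name, relativePath) -> nameByPath.put(root + relativePath, name));

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        StringBuilder path = new StringBuilder();  // path of the element being read
        StringBuilder text = new StringBuilder();  // text collected for that element
        Map<String, String> row = null;            // non-null while inside a record
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        path.append('/').append(reader.getLocalName());
                        text.setLength(0);
                        if (path.toString().equals(root)) {
                            row = new LinkedHashMap<>();
                        }
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA: {
                        if (row != null) {
                            text.append(reader.getText());
                        }
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        String current = path.toString();
                        String name = nameByPath.get(current);
                        if (row != null && name != null) {
                            row.put(name, text.toString().trim());
                        }
                        if (row != null && current.equals(root)) {
                            sink.accept(row);  // record complete, hand it over
                            row = null;
                        }
                        // Drop the last path segment and any leftover text.
                        path.setLength(path.lastIndexOf("/"));
                        text.setLength(0);
                        break;
                    }
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

With the definition from the question, you would call it with the root "/Hotels/Hotel", a map of "HotelName" to "/Name" and "PhoneNumber" to "/Phone", a `Reader` over the file, and a callback that writes each row (for example as a CSV line). Because rows are passed on as soon as their closing tag is read, memory use should stay roughly constant regardless of file size.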

RMachnik

If you have already found the <Hotel/> node and hold a DOM reference to it, just access its children with the hotel as the context node, using either:

  • XPath: ./Name or shorter Name (just don't start it with /, which refers to the root), but make sure to use the hotel node as query context; or
  • DOM: hotel.getChildNodes(), and then loop over the result set comparing element names to find the respective child node.
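As a minimal sketch of the XPath option, the following evaluates each field path relative to a hotel node. The class `RelativeXPathDemo` and the method `readFields` are hypothetical names, and the sketch assumes the definition file stores paths with a leading slash (as in the question), which it strips so they are not treated as absolute. It only helps once you have a DOM node for a single hotel; it does not by itself avoid loading a large file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;

// Hypothetical helper: reads the configured fields from one hotel node.
final class RelativeXPathDemo {

    static Map<String, String> readFields(Node hotel, Map<String, String> fields)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String path = field.getValue();
            // "/Name" would start at the document root; "Name" is relative to the context.
            String relative = path.startsWith("/") ? path.substring(1) : path;
            // The second argument is the context node the expression is evaluated against.
            row.put(field.getKey(), xpath.evaluate(relative, hotel));
        }
        return row;
    }
}
```

`XPath.evaluate(String, Object)` returns the string value of the first matching node, or an empty string if nothing matches, so a missing child yields an empty field rather than an exception.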
Jens Erat
  • Thanks, but I cannot build a document of the whole file first. It's too big to load into memory. – halloei Oct 29 '14 at 12:53
  • If you're dealing with large XML documents, consider using an XML database especially written for doing that. [BaseX](http://basex.org) and [eXist DB](http://existdb.org) are some open source examples that can also be interfaced from Java. Somewhat steep learning curve (it helps if you already know XPath), but worth the effort. Otherwise, you'll be stuck with scanning the document with a SAX-like approach. – Jens Erat Oct 29 '14 at 13:12