
Hi, I have a very large XML file (100 GB) containing a list of foo elements. I want to convert it into a stream of objects, like the streams introduced in Java 8.

Any idea for a library or a code sample?

At the beginning:

<foos> 
  <foo>...</foo>
  <foo>...</foo>
</foos>

At the end:

Stream<Foo> foosStream = ????("foo.xml");
foosStream.forEach(foo -> foo.doFooStuffs());

Edit: @Pierre Thank you, here is the implementation of your solution:

try {
    XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(stream);
    final Unmarshaller unmarshaller = JAXBContext.newInstance(XXXXX.class).createUnmarshaller();

    // Wrap the lazy iterator in a sequential, ordered stream of unknown size.
    Iterator<XXXXX> it = new XmlIterator<>(reader, unmarshaller, "xxxxxx");
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
} catch (XMLStreamException e1) {
    logger.error("XMLStreamException", e1);
} catch (JAXBException e) {
    logger.error("JAXBException", e);
}

and

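import java.util.Iterator;
import java.util.NoSuchElementException;

// JAXB 2.x package names; newer JAXB releases use jakarta.xml.bind instead.
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Assumes SLF4J is the logging API behind LoggerFactory.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Lazily unmarshals every element called {@code name}, one object per call to next(). */
public class XmlIterator<T> implements Iterator<T> {

    private final Logger logger = LoggerFactory.getLogger(this.getClass());

    private final XMLEventReader reader;
    private final Unmarshaller unmarshaller;
    private final String name;

    // Next event the reader will deliver; null once the input is exhausted or broken.
    private XMLEvent event;

    public XmlIterator(XMLEventReader reader, Unmarshaller unmarshaller, String name) {
        this.reader = reader;
        this.unmarshaller = unmarshaller;
        this.name = name;
        try {
            reader.next(); // consume the StartDocument event
            this.event = reader.peek();
        } catch (XMLStreamException e) {
            logger.error("", e);
            event = null;
        }
    }

    @Override
    public boolean hasNext() {
        try {
            // Skip events until the reader sits just before a matching start element.
            while (event != null && !(event.isStartElement()
                    && name.equals(event.asStartElement().getName().getLocalPart()))) {
                reader.next();
                event = reader.peek();
            }
        } catch (XMLStreamException e) {
            logger.error("", e);
            event = null;
        }
        return event != null;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        // Honour the Iterator contract and make sure the reader is positioned on a match.
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        try {
            // The cast assumes unmarshal() returns a JAXBElement wrapper
            // rather than the bound object itself.
            T next = ((JAXBElement<T>) unmarshaller.unmarshal(reader)).getValue();
            event = reader.peek();
            return next;
        } catch (JAXBException e) {
            logger.error("error during unmarshalling ", e);
            return null;
        } catch (XMLStreamException e) {
            logger.error("error during stream ", e);
            return null;
        }
    }
}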
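
For readers who want to try the idea without a JAXB dependency, here is a minimal sketch of the same lazy-iterator-to-stream technique using only StAX from the JDK. It is not the code from the answer: the class name `ElementTextStream` and the method `of` are made up for illustration, and it assumes every matching element contains only text (no child elements), because it relies on `XMLStreamReader.getElementText()`. It yields a `Stream<String>`; mapping each string to a `Foo` would be a separate step.

```java
import java.io.InputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
final class ElementTextStream {

    /** Lazily streams the text of every element whose local name is elementName. */
    static Stream<String> of(InputStream in, String elementName) {
        final XMLStreamReader reader;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("cannot open XML input", e);
        }

        Iterator<String> it = new Iterator<String>() {
            // Text of the next matching element, or null if none has been looked up yet.
            private String pending;

            @Override
            public boolean hasNext() {
                if (pending != null) {
                    return true;
                }
                try {
                    // Pull events one at a time; nothing is buffered beyond the current element.
                    while (reader.hasNext()) {
                        if (reader.next() == XMLStreamConstants.START_ELEMENT
                                && elementName.equals(reader.getLocalName())) {
                            // Assumption: text-only content, otherwise this call throws.
                            pending = reader.getElementText();
                            return true;
                        }
                    }
                    return false;
                } catch (XMLStreamException e) {
                    throw new IllegalStateException("XML parsing failed", e);
                }
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = pending;
                pending = null;
                return result;
            }
        };

        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL),
                false);
    }
}
```

Usage would look like `ElementTextStream.of(new FileInputStream("foo.xml"), "foo").forEach(System.out::println);`. Parsing errors surface as unchecked exceptions here so that they are not silently turned into null stream elements; that is a design choice of this sketch.
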
sab
  • Do you know the different techniques and APIs to read an XML file in Java? Which one do you think is suited here? How large are the XML files going to be? You probably want to take a look at [StAX](https://docs.oracle.com/javase/tutorial/jaxp/stax/api.html), but could you add more context to your question? – Tunaki Sep 20 '16 at 10:37
  • Yes, I know a lot of libs, but they are all very low-level. I don't understand why in 2016 I still have to analyse the start_element manually to generate the stream myself, when I could just specify the XPath. – sab Sep 20 '16 at 10:47
  • "I still have to analyse the start_element manually to generate the stream myself": have a look at JAXB (i.e. define an XML schema for your data) – Pierre Sep 20 '16 at 10:55

1 Answer

Pierre
  • Probably easier to just not use a stream and call the consumer method yourself for every `Foo` encountered... – fabian Sep 20 '16 at 10:39
  • All the libs that I find seem to date from before streams; is there no more recent lib? – sab Sep 20 '16 at 10:42
  • Does this require the whole XML to be in memory before parsing starts? – Bassam May 16 '17 at 16:26
  • After reading up on StAX, I can see this is doable. How did I not know about this? This is great – Bassam May 16 '17 at 16:45
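
The alternative mentioned in the comments above (skip the stream and call a consumer for each element) can be sketched as follows. This is only an illustration under stated assumptions: `ElementTextVisitor` and `forEach` are hypothetical names, and matching elements are assumed to contain text only. Because StAX is a pull parser, the loop reads the input incrementally and keeps only the current element's text, so the whole file does not need to be in memory.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
final class ElementTextVisitor {

    /**
     * Calls action with the text of each element named elementName,
     * in document order, and returns how many elements were visited.
     */
    static long forEach(InputStream in, String elementName, Consumer<String> action) {
        long count = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        // Assumption: text-only content, otherwise getElementText() throws.
                        action.accept(reader.getElementText());
                        count++;
                    }
                }
            } finally {
                reader.close(); // frees parser resources; does not close the InputStream
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return count;
    }
}
```

Compared with a `Stream`, this gives up `filter`/`map` chaining but keeps error handling and resource cleanup in one place.
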