13

I have the following two level XML structure. A list of boxes, each containing a list of drawers.

<Boxes>
    <Box id="0">
        <Drawers>
            <Drawer id="0"/>
            <Drawer id="1"/>
            ...
        </Drawers>
    </Box>
    <Box id="1">
...
    </Box>
</Boxes>

I'm parsing it using StAX and exposed the structure through two Iterators:

  1. BoxIterator implements Iterator<Box>, Iterable<Box>
  2. Box implements Iterable<Drawer>
  3. DrawerIterator implements Iterator<Drawer>

I can then do the following:

BoxIterator boxList;
for (Box box : boxList) {
  for (Drawer drawer : box) {
    drawer.getId()
  }
}

Under the hood of those Iterators I'm using StAX and both of them are accessing the same underlying XMLStreamReader. If I call BoxIterator.next() it will influence the result that will be returned on subsequent calls to DrawerIterator.next() because the cursor will have moved to the next box.

Does this break the contract of Iterator? Is there a better way to iterate over a two level structure using StAX?

Roland
  • 7,525
  • 13
  • 61
  • 124
  • 1
    Your description looks like `Box.iterator` returns a new `DrawerIterator` and if that is so the contract won't be broken, since the `DrawerIterator` should return only elements inside the current box anyways. – Thomas Jul 28 '16 at 08:31
  • @Thomas `Box.iterator()` will return the same `DrawerIterator` on every call, since they will all be accessing the same underlying stream anyway. This implies that even a `DrawerIterator` returned by a past call to `Box.iterator()` will be magically advanced. All will be accessing the underlying stream at the same cursor position, always. – Roland Jul 28 '16 at 08:36
  • Ah I see. That would break the contract then. Do you need to return the same instance on every call? If you'd return a new instance each time and iterate sequentially (i.e. no random access) it wouldn't matter whether the cursor position would have been advanced. After you've iterated over a box' drawers any further call to that box' DrawerIterator's `hasNext()` should return false. – Thomas Jul 28 '16 at 08:45
  • @Thomas I guess that would be possible, though it would require some bookkeeping. In practice I only want to use it in a sane way so I don't know if it would be worth doing the work to prevent pathological cases. On the other hand I guess there should be a better pattern. I was thinking about just doing a `DrawerIterator` that would have a `currentBox` field that would automatically be updated. Yet this would still have the problem that all the iterators would be accessing the same underlying stream, so advancing one, would advance all the others. – Roland Jul 28 '16 at 09:19
  • @Roland: what are you trying to achieve? Are you only interested in `Drawer` ids or `Drawers`? Are you not interested in the `Box` ids? Perhaps you can use StAX in another way. StAX can parse the file and generate events whenever it finds the start or the end of an element. All you have to do is check whether that element is a `Box` or a `Drawer`. – gogognome Aug 02 '16 at 12:11
  • @gogognome From the `Box` all I need is the ID. From the `Drawer` I need some more information which is not present in my simplified example. I'm using `StAX` as you suggested. When I find a `Box` I just read the `ID` and save it. Then when I go through the `Drawers` I know which `Box` they belong to. – Roland Aug 02 '16 at 12:56
  • What about creating two distinct XMLStreamReader pointing to the same XML code? – Stephan Aug 02 '16 at 13:59

4 Answers4

5

Does this break the contract of Iterator?

No.

The Java Iterator imposes two "contracts". The first contract is the Java interface itself, which declares 3 methods: hasNext(), next(), and remove(). Any class which implements this Iterator interface must define those methods.

The second contract defines the behaviour of the Iterator:

hasNext() [...] returns true if the iteration has more elements. [...] next() returns the next element in the iteration [and] throws NoSuchElementException if the iteration has no more elements.

That is the entire contract.

It is true that if the underlying XMLStreamReader is advanced, it can mess up your BoxIterator and/or DrawerIterator. Alternately, calling BoxIterator.next() and/or DrawerIterator.next() at the wrong points could mess up the iteration. However, used correctly, such as in your example code above, it works properly and greatly simplifies the code. You just need to document the proper usage of the iterators.

As a concrete example, the Scanner class implements Iterator<String>, and yet has many, many other methods that advance the underlying stream. If there existed a stronger contract imposed by the Iterator class, then the Scanner class itself would be violating it.


As Ivan points out in the comments, boxList should not be of type class BoxIterator implements Iterator<Box>, Iterable<Box>. You really should have:

class BoxList implements Iterable<Box> { ... }
class BoxIterator implements Iterator<Box> { ... }

BoxList boxList = ...;
for (Box box : boxList) {
  for (Drawer drawer : box) {
    drawer.getId()
  }
}

While having one class implement both Iterable and Iterator is not technically wrong for your use case, it can cause confusion.

Consider this code in another context:

List<Box> boxList = Arrays.asList(box1, box2, box3, box4);
for(Box box : boxList) {
    // Do something
}
for(Box box : boxList) {
    // Do some more stuff
}

Here, boxList.iterator() is called twice, to create two separate Iterator<Box> instances, for iterating the list of boxes twice. Because the boxList can be iterated over multiple times, each iteration requires a new iterator instance.

In your code:

BoxIterator boxList = new BoxIterator(xml_stream);
for (Box box : boxList) {
  for (Drawer drawer : box) {
    drawer.getId();
  }
}

because you are iterating over a stream, you can't (without rewinding the stream, or storing the extracted objects) iterate over the same nodes a second time. A second class/object is not needed; the same object can act as both Iterable and Iterator ... which saves you one class/object.

Having said that, premature optimization is the root of all evil. The savings of one class/object is not worth the possible confusion; you should split BoxIterator into a BoxList implements Iterable<Box>, and BoxIterator implements Iterator<Box>.

Community
  • 1
  • 1
AJNeufeld
  • 8,526
  • 1
  • 25
  • 44
  • 1
    Actually, the code sample is not so good, because BoxIterator class is both Iterable and Iterator. Things may become messy on second use of the same instance, if the state of iterator is not reset. – Ivan Gammel Aug 05 '16 at 10:51
  • @IvanGammel you have a point. BoxIterator just returns `this` in the call to `iterator()` and the cursor position on the underlying XMLStreamReader is not reset. So maybe I should just not use the whole Iterator, Iterable paradigma. I did it only to be able to use the enhanced for loop, i.e. syntactic sugar. – Roland Aug 05 '16 at 14:33
  • 2
    @Roland, while it's not the usual way to parse XML, your use case is valid if the input is huge and you have small heap limit (otherwise you could just parse whole file to object model with XMLBeans or XStream), so you can use this approach indeed (looks like an Active Record pattern to me). You just need to implement it carefully. – Ivan Gammel Aug 05 '16 at 15:33
  • @Roland The code sample `for (Box box : boxList) { for (Drawer drawer : box) { drawer.getId(); } }` is fine. The only "not good" part is `boxList` being of type `BoxIterator`. It really should be of type `BoxList implements Iterable`. I'll add that to the answer. – AJNeufeld Aug 05 '16 at 15:52
3

It has the potential to break the contract for the reason that hasNext() could return true, but next() could throw a NoSuchElementException.

The contract of hasNext() is:

Returns true if the iteration has more elements. (In other words, returns true if next() would return an element rather than throwing an exception.)

But it could happen that between calling hasNext() and next(), another iterator could have moved the stream position such that there are no more elements.

However, in the way you've used it (the nested loop), you won't encounter the breakage.

If you were to pass an iterator to another process, then you may encounter this breakage.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • The problem you pointed out could happen with any `Iterator`, no? If after calling `hasNext() ` you pass an `Iterator` to another process which consumes it, then `next()` will not give you back what you expected. – Roland Aug 03 '16 at 14:13
  • 1
    @Roland I meant that giving *another* iterator to another process could affect an iterator. Calling `next()` affects *all* iterators because they share the same underlying input. – Bohemian Aug 03 '16 at 14:19
  • 1
    (Almost) every iterator shares underlying state with _something_. And even when `hasNext()` returns `true`, it does not guarantee that `next()`, if called _immediately afterwards_, will _always_ succeed; it might throw a `ConcurrentModificationException`. Iterators are just helpers; they are often syntactically convenient, but they never guarantee a "can't be broken or corrupted iteration" over some structure. – AJNeufeld Aug 03 '16 at 14:58
  • @ajn Throwing `ConcurrentModificationException` when the underlying Collection has been externally modified is *part of the contract*. But here we have the problem that simply calling `next()` has an unexpected ( to the user) side effect of *modifying* the backing object (it irrevocably advances the stream pointer), and we have multiple iterators over the same backing object, so `hasNext()` may not be accurate by the time that `next()` is called. Those things are not the case for most iterators. – Bohemian Aug 03 '16 at 15:05
  • 2
    Nope! Just read [`Java SE 8 Iterator`](https://docs.oracle.com/javase/8/docs/api/java/util/Iterator.html). Absolutely no mentioning of `ConcurrentModificationException` at all. Definitely **not** part of the contract of `Iterator`. – AJNeufeld Aug 03 '16 at 15:12
  • @AJN oh, you're right. Well, the other stuff still stands: Essentially, here `.next()` is a getter with side effects. – Bohemian Aug 03 '16 at 19:28
2

The only design issue with your piece of code is that BoxIterator implements both Iterator and Iterable. Normally, Iterable object returns new stateful Iterator every time iterator() method is called. Because of that, there should be no interference between two iterators, but you'll need a state object to correctly implement exit from inner loop (probably, you already have that, but I must mention it for clarity).

  1. State object will act like proxy for parser with two methods popEvent and peekEvent. On peek iterators will check the last event, but will not consume it. on pop they'll consume last event.
  2. BoxIterable#iterator() will consume StartElement(Boxes) and return iterator after that.
  3. BoxIterator#hasNext() will peek events and pop them until StartElement or EndElement will be received. It will then return true only if StartElement(Box) was received.
  4. BoxIterator#next() will peek-and-pop Attribute events until StartElement or EndElement received to initialize Box object.
  5. Box#iterator() will consume StartElement(Drawers) event and then return DrawerIterator.
  6. DrawerIterator#hasNext() will peek-and-pop until StartElement or EndElement received. It will then return true only if it was StartElement(Drawer)
  7. DrawerIterator#next() will consume Attribute events until EndElement(Drawer) received.

Your user code will remain almost unmodified:

BoxIterable boxList;
/*
 * boxList must be an BoxIterable, which on call to iterator() returns 
 * new BoxIterator initialized with current state of STaX parser
 */
for (Box box : boxList) { 
  /* 
   * on following line new iterator is created and initialized 
   * with current state of parser 
   */
  for (Drawer drawer : box) { 
    drawer.getId()
  }
}
AJNeufeld
  • 8,526
  • 1
  • 25
  • 44
Ivan Gammel
  • 652
  • 7
  • 17
  • _Normally, Iterable object returns new stateful Iterator every time iterator() method is being called._ That is not the case here. `BoxIterator` has one underlying `XMLStreamReader` so I only return `this` in the `iterator()` method. The state of this Iterator will be the position where the cursor happens to be on the underlying stream. – Roland Aug 05 '16 at 14:25
0

It doesn't look like it would break the contract provided you are carefully implementing/overriding next() & hasNext() methods in the BoxIterator & DrawerIterator by implementing Iterator interface. Needless to say, the obvious condition to take care is that hasNext() should return true if next() is returning an element and false if next() is giving the exception.

But what I could not understand is why have you made BoxIterator implement Iterable<Box>

BoxIterator implements Iterator<Box>, Iterable<Box> Since overriding iterator() method from Iterable interface for Box is always going to return an instance of BoxIterator. If you don't have any other objective behind it, then there is no purpose of encapsulating this feature in BoxIterator.

AJNeufeld
  • 8,526
  • 1
  • 25
  • 44
Akash Mishra
  • 682
  • 1
  • 5
  • 13