I need to create a service that parses information from paginated websites and returns an iterator over the parsed results.
To do so, I often use streams to chain the parsing steps together. However, I've noticed that if I call iterator() on a Java stream that includes a flatMap() call, each flat-mapped stream is read in full before the iterator returns any of its elements. If one of those streams takes a long time to complete, or is infinite, the resulting iterator never returns an element.
Is this by design? Should I be doing something differently? Have a look at the sample code below, and note how the output differs between forEach() and iterator().
package temp;

import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

import com.google.common.collect.AbstractIterator;

public class StreamTest {

    public static void main(String[] args) {
        // set first iterable max index randomly
        int MAX_INDEX = ThreadLocalRandom.current().nextInt(10, 20 + 1);
        System.out.println("max index: " + MAX_INDEX);

        // create slow iterable
        Iterable<String> iterable1 = () -> new AbstractIterator<String>() {
            private int index = -1;

            @Override
            protected String computeNext() {
                index++;
                if (index >= MAX_INDEX) {
                    return this.endOfData();
                }
                System.out.println("dummy computing index: " + index);
                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    throw new java.lang.RuntimeException(e);
                }
                return "iterable " + index;
            }
        };

        // create list
        Iterable<String> iterable2 = Arrays.asList("list index 1", "list index 2", "list index 3");

        // create a stream supplier
        Supplier<Stream<String>> streamSupplier = () -> Arrays.asList(iterable1, iterable2).stream()
                .flatMap(i -> StreamSupport.stream(i.spliterator(), false));

        // print using for each
        System.out.println("\n***testing for each***");
        streamSupplier.get().forEach(str -> {
            System.out.println("for each - " + str);
        });

        System.out.println("\n***testing iterator***");
        Iterator<String> iter = streamSupplier.get().iterator();
        while (iter.hasNext()) {
            System.out.println("iterator - " + iter.next());
        }
    }
}
Here is the output from the above:
max index: 12
***testing for each***
dummy computing index: 0
for each - iterable 0
dummy computing index: 1
for each - iterable 1
dummy computing index: 2
for each - iterable 2
dummy computing index: 3
for each - iterable 3
dummy computing index: 4
for each - iterable 4
dummy computing index: 5
for each - iterable 5
dummy computing index: 6
for each - iterable 6
dummy computing index: 7
for each - iterable 7
dummy computing index: 8
for each - iterable 8
dummy computing index: 9
for each - iterable 9
dummy computing index: 10
for each - iterable 10
dummy computing index: 11
for each - iterable 11
for each - list index 1
for each - list index 2
for each - list index 3
***testing iterator***
dummy computing index: 0
dummy computing index: 1
dummy computing index: 2
dummy computing index: 3
dummy computing index: 4
dummy computing index: 5
dummy computing index: 6
dummy computing index: 7
dummy computing index: 8
dummy computing index: 9
dummy computing index: 10
dummy computing index: 11
iterator - iterable 0
iterator - iterable 1
iterator - iterable 2
iterator - iterable 3
iterator - iterable 4
iterator - iterable 5
iterator - iterable 6
iterator - iterable 7
iterator - iterable 8
iterator - iterable 9
iterator - iterable 10
iterator - iterable 11
iterator - list index 1
iterator - list index 2
iterator - list index 3
Shouldn't iterator() and forEach() produce the same output?
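For comparison, here is a minimal sketch of the behavior I expected: flattening the iterables by hand so that each element is only computed when the iterator asks for it. The class LazyFlatten and its flatten() method are hypothetical names I made up for illustration; they are not part of the JDK or Guava, and this bypasses Stream.flatMap entirely rather than fixing it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper (not a JDK or Guava class): flattens nested iterables
// one element at a time instead of going through Stream.flatMap.
final class LazyFlatten {

    private LazyFlatten() {
    }

    static <T> Iterator<T> flatten(Iterable<? extends Iterable<? extends T>> sources) {
        Iterator<? extends Iterable<? extends T>> outer = sources.iterator();
        return new Iterator<T>() {
            // Iterator over the inner iterable that is currently being drained.
            private Iterator<? extends T> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Move to the next inner iterable only once the current one is exhausted.
                while (!current.hasNext()) {
                    if (!outer.hasNext()) {
                        return false;
                    }
                    current = outer.next().iterator();
                }
                return true;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                // Pulls exactly one element from the inner iterator.
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
                Arrays.asList("a", "b"),
                Collections.<String>emptyList(),
                Arrays.asList("c"));
        Iterator<String> it = flatten(pages);
        while (it.hasNext()) {
            System.out.println("iterator - " + it.next());
        }
    }
}
```

With the sample above, I would call LazyFlatten.flatten(Arrays.asList(iterable1, iterable2)) in place of streamSupplier.get().iterator(). Since each next() pulls a single element from the inner iterator, I would expect the "dummy computing index" lines to interleave with the "iterator -" lines, as they do with forEach(). I would still prefer to stay within the stream API if that is possible.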