Normally one should read data from a database as a stream and then pass it to Apache POI as a stream, and so on. I mean, the data should be in stream form from start to end. However, my database code is not written in a streaming style, and it pulls from complicated sources across several different types of databases. Thus I need to query my data source like this: getData(int page, int perPage). I then want to forward the results to the stream, like this:

for(int i = 0; i < 5000; i++) {
    stream.add(getData(i, 10000));
}

So my question is: how can I push data into the stream on the fly without using too much RAM?

Youcef LAIDANI
ilhan
  • What is the problem with your code? What exactly do you want to do? – Youcef LAIDANI Dec 21 '19 at 09:46
  • *"data should be in stream format"* What do you mean by that? The word "stream" is used in multiple contexts in the Java API, e.g. `InputStream` or `LongStream`, and those are entirely unrelated, although both are *pull*-type streams, while e.g. `OutputStream` is a *push*-type stream. This question reads to me like some other kind of stream, since you call `stream.add()`, so what kind of stream are you referring to, specifically? – Andreas Dec 21 '19 at 12:00

2 Answers

You can do IntStream.range(0, 5000).mapToObj(i -> getData(i, 10000)).

See also How to implement a Java stream?

bastet
  • This works. I would also suggest an improvement: wrap your `getData()` call so that it returns an `Optional`, which lets you stop early when there is no more data, like: `IntStream.range(0, 5000).mapToObj(i -> getData(i, 10000)).takeWhile(Optional::isPresent)` – Yurii J Dec 21 '19 at 09:42
  • I think you'd have to be quite careful about pulling on this stream. You don't want to fetch page 2 before having fully consumed page 1, because one of the points of paging is to use a bounded amount of memory regardless of the size of the results list. If you apply Yurii's comment and then `flatMap` the result I think that should work but would need to be checked – Dici Dec 21 '19 at 10:05
  • @Dici Note that the stream is sequential. – bastet Dec 21 '19 at 12:11
  • Well it depends how you pull on it. If you `flatMap` right after your snippet I think this should do it, but if you call `next()` on your stream (after getting its iterator) repeatedly it will fetch page after page without necessarily processing the pages you fetched. But I guess you'd have to do it on purpose, it's like if you were collecting the stream, it's wrong but it's the user's fault, not your snippet – Dici Dec 24 '19 at 14:50
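
The suggestions in this answer and its comments (map page numbers to pages, stop at the first page with no data, then `flatMap`) can be combined into one lazy pipeline. The following is only a minimal sketch under stated assumptions: `PagedStreamDemo`, `allRows`, and the three-argument `getData` are hypothetical stand-ins for the asker's data source, an empty list is assumed to signal "no more data" (in place of the `Optional` from the comment), and `takeWhile` requires Java 9 or later.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class PagedStreamDemo {
    // Hypothetical stand-in for getData(page, perPage): serves slices of the
    // numbers 0..total-1 and returns an empty list once the data runs out.
    public static List<Integer> getData(int page, int perPage, int total) {
        int from = Math.min(page * perPage, total);
        int to = Math.min(from + perPage, total);
        return IntStream.range(from, to).boxed().collect(Collectors.toList());
    }

    // Lazily walks the pages: a page is only requested when the consumer
    // needs its rows, and the first empty page ends the stream.
    public static Stream<Integer> allRows(int perPage, int total) {
        return IntStream.iterate(0, page -> page + 1)
                .mapToObj(page -> getData(page, perPage, total))
                .takeWhile(rows -> !rows.isEmpty())
                .flatMap(List::stream);
    }

    public static void main(String[] args) {
        // Rows are handed to the consumer one page at a time.
        allRows(3, 8).forEach(System.out::println);
    }
}
```

With a sequential stream and a terminal operation such as `forEach`, the rows of one page are handed to the consumer before the next page is requested, so at most one page needs to be held in memory at a time, provided the consumer does not collect the rows itself.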

You can just implement an iterator and wrap it in a stream, right?

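import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

Stream<T> stream = stream(new Iterator<T>() {
    private Iterator<T> currentBatch;
    private int page;

    @Override
    public boolean hasNext() {
        if (currentBatch != null && currentBatch.hasNext()) return true;
        // Current batch is used up: fetch the next page and advance the counter.
        // An empty page is treated as the end of the data.
        currentBatch = getData(page++, BATCH_SIZE).iterator();
        return currentBatch.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return currentBatch.next();
    }
});

// Wraps any iterator in a sequential (non-parallel) stream.
private static <T> Stream<T> stream(Iterator<T> iterator) {
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
}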
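
To make it easier to try out, the iterator idea above can be sketched as a self-contained class. This is only one possible arrangement, not the answer's exact code: `PagingIteratorDemo`, `PagingIterator`, the `fetchPage` function, the `pagesFetched` counter, and the in-memory `getData` with 25 rows are all hypothetical names and data introduced for illustration, and an empty page is assumed to mark the end of the data.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.IntFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class PagingIteratorDemo {
    /** Iterates over all rows, loading one page at a time via fetchPage. */
    public static final class PagingIterator<T> implements Iterator<T> {
        private final IntFunction<List<T>> fetchPage; // hypothetical page loader
        private Iterator<T> currentBatch = Collections.emptyIterator();
        private int nextPage = 0;
        private boolean exhausted = false;
        public int pagesFetched = 0; // exposed only to make the laziness observable

        public PagingIterator(IntFunction<List<T>> fetchPage) {
            this.fetchPage = fetchPage;
        }

        @Override
        public boolean hasNext() {
            if (currentBatch.hasNext()) return true;
            if (exhausted) return false; // do not query again after the end
            currentBatch = fetchPage.apply(nextPage++).iterator();
            pagesFetched++;
            if (!currentBatch.hasNext()) exhausted = true; // empty page = end of data
            return !exhausted;
        }

        @Override
        public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            return currentBatch.next();
        }
    }

    // Wraps an iterator in a sequential stream of unknown size.
    public static <T> Stream<T> stream(Iterator<T> iterator) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
    }

    // Hypothetical stand-in for the real data source: 25 rows in total.
    public static List<Integer> getData(int page, int perPage) {
        int total = 25;
        int from = Math.min(page * perPage, total);
        int to = Math.min(from + perPage, total);
        return IntStream.range(from, to).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PagingIterator<Integer> it = new PagingIterator<>(page -> getData(page, 10));
        List<Integer> firstFive = stream(it).limit(5).collect(Collectors.toList());
        System.out.println(firstFive + " after " + it.pagesFetched + " page fetch(es)");
    }
}
```

The `exhausted` flag is a small design choice: once an empty page has been seen, further `hasNext()` calls return `false` without querying the data source again.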
Dici