Convert InputStream into Stream of strings of fixed length

Question

As in "Convert InputStream into Stream<String> given a Charset", I want to convert an InputStream `is` into a Stream<String> `stream`. This time, though, instead of splitting the InputStream at newline characters, I want to split it into parts of equal length, so that all strings in the stream have the same length (except possibly the last element, which may be shorter).

Read the source code of BufferedReader.lines(), and use it as a model for your own method. Instead of reading the next line, you'll have to read the next N characters. — JB Nizet, May 19 '15 at 22:06

score 3 · Answer 1 · answered May 19 '15 at 22:11


I don't think this is possible using class library methods only, so you'll have to write your own logic that follows the same pattern as BufferedReader.lines:

1. InputStreamReader - Start by creating an InputStreamReader.
2. Iterator<String> - Write a custom Iterator implementation that splits the stream into pieces however you want. It sounds like you want hasNext() and next() to call a readPart() that reads at most N characters.
3. Spliterators.spliteratorUnknownSize - Pass the iterator to this method to create a Spliterator.
4. StreamSupport.stream - Pass the Spliterator to this method to create a stream.

Ultimately, the class library just doesn't have builtins for reading from an input stream and converting into fixed-size strings, so you have to write those for #1/#2. After that, converting to a stream in #3/#4 isn't too bad since there are class library methods to help.
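
The four steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the class name FixedLengthChunks, the method name chunks, and the readPart helper are hypothetical names chosen here (the answer only mentions a readPart() in passing), and the chunk length is counted in Java chars, not bytes or code points.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class FixedLengthChunks {

    // Hypothetical helper: reads up to n chars; returns null once the input is exhausted.
    static String readPart(Reader reader, int n) {
        char[] buf = new char[n];
        int filled = 0;
        try {
            while (filled < n) {
                int read = reader.read(buf, filled, n - filled);
                if (read < 0) break; // end of input
                filled += read;
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return filled == 0 ? null : new String(buf, 0, filled);
    }

    static Stream<String> chunks(InputStream in, Charset cs, int n) {
        // Step 1: decode bytes to chars with the given charset.
        Reader reader = new InputStreamReader(in, cs);

        // Step 2: an Iterator that reads one chunk ahead in hasNext().
        Iterator<String> it = new Iterator<String>() {
            private String next; // chunk read ahead, or null if none is pending

            @Override
            public boolean hasNext() {
                if (next == null) next = readPart(reader, n);
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String result = next;
                next = null;
                return result;
            }
        };

        // Step 3: wrap the iterator in a Spliterator of unknown size.
        Spliterator<String> sp = Spliterators.spliteratorUnknownSize(
                it, Spliterator.ORDERED | Spliterator.NONNULL);

        // Step 4: create a sequential stream from the Spliterator.
        return StreamSupport.stream(sp, false);
    }
}
```

Note that this sketch registers no close handler, so the caller remains responsible for closing the InputStream.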

Brett Kail

It seems to be a common mistake to go straight to implementing an `Iterator` first, most probably because everyone is familiar with that type. But splitting logic like `readPart()` into two methods of the `hasNext()`/`next()` kind is not straightforward. Instead, `Spliterator`’s `tryAdvance` method fits much better and is actually easier to implement. And if you implement a `Spliterator` directly, you don’t need to wrap an `Iterator` into a `Spliterator`… – Holger May 20 '15 at 16:14
I guess it's common enough that the Java class library folks did it for BufferedReader.lines() :-), but your suggestion is a good one. – Brett Kail May 20 '15 at 17:21
I don't think that particular bug is relevant (I/O-based streams generally can't be parallel friendly since you don't know how much data is present), but perhaps there is another bug somewhere that recommends replacing uses of Spliterators.spliteratorUnknownSize in the class libraries with AbstractSpliterator subclasses (to save the allocation of one object, I guess). – Brett Kail May 20 '15 at 22:28
Of course, first fixing the issue of streams having more parallel execution potential is a strong motivation. But even I/O-based streams could be more parallel friendly if they adapted the splitting strategy to harmonize with the underlying buffering strategy. Interestingly, the questioner has [another question](http://stackoverflow.com/q/30196225/2711488) where I discussed why item-based (i.e. line-based) buffering isn’t the right strategy for parallel processing of such a stream. – Holger May 21 '15 at 07:45

score 3 · Answer 2 · answered May 20 '15 at 15:57

There is no direct support for this. You can create a straightforward factory method:

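import java.io.*;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

static Stream<String> strings(InputStream is, Charset cs, int size) {
    Reader r = new InputStreamReader(is, cs);
    CharBuffer cb = CharBuffer.allocate(size); // reused for every chunk
    return StreamSupport.stream(new Spliterators.AbstractSpliterator<String>(
        Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                // fill the buffer until it is full or the input is exhausted
                try { while (cb.hasRemaining() && r.read(cb) > 0); }
                catch (IOException ex) { throw new UncheckedIOException(ex); }
                cb.flip();
                if (!cb.hasRemaining()) return false; // nothing was read: end of input
                action.accept(cb.toString());
                cb.clear();
                return true;
            }
        }, false).onClose(() -> {
            // closing the stream closes the reader and the underlying InputStream
            try { r.close(); } catch (IOException ex) { throw new UncheckedIOException(ex); }
        });
}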

It can be used like:

try(Stream<String> chunks=strings(is, StandardCharsets.UTF_8, 100)) {
    // perform operation with chunks
}
