As in "Convert InputStream into Stream<String> given a Charset", I want to convert an `InputStream is` into a `Stream<String> stream`. But this time, instead of splitting the InputStream at the newline characters, I want to split it into parts of equal length. All strings in the stream would then have the same length (with the possible exception of the last element, which may be shorter).
principal-ideal-domain
- Same byte length or character length? – biziclop May 19 '15 at 22:04
- Read the source code of BufferedReader.lines(), and use it as a model for your own method. Instead of reading the next line, you'll have to read the next N characters. – JB Nizet May 19 '15 at 22:06
2 Answers
I don't think this is possible using class library methods only, so you'll have to write your own logic that follows the same pattern as BufferedReader.lines:
1. InputStreamReader - Start by creating an InputStreamReader.
2. Iterator<String> - Implement a custom Iterator that splits the stream into pieces however you want. It sounds like you want to implement hasNext() and next() to call a readPart() that reads at most N characters.
3. Spliterators.spliteratorUnknownSize - Pass the iterator to this method to create a Spliterator.
4. StreamSupport.stream - Pass the Spliterator to this method to create a stream.
Ultimately, the class library just doesn't have builtins for reading from an input stream and converting into fixed-size strings, so you have to write those for #1/#2. After that, converting to a stream in #3/#4 isn't too bad since there are class library methods to help.
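The four steps above can be sketched roughly as follows. This is an illustration under assumptions, not code from the answer: the class name FixedLengthChunks, the method name chunks, and the body of readPart() are hypothetical, and "length" is taken to mean a count of Java chars (UTF-16 code units), not bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class FixedLengthChunks {

    // Hypothetical helper: splits the decoded characters of `is` into strings of `size` chars.
    static Stream<String> chunks(InputStream is, Charset cs, int size) {
        // Step 1: decode bytes into characters.
        Reader reader = new InputStreamReader(is, cs);

        // Step 2: an Iterator that reads one chunk ahead so hasNext() can answer.
        Iterator<String> it = new Iterator<String>() {
            private String next; // chunk read ahead, or null if none is buffered

            @Override
            public boolean hasNext() {
                if (next == null) {
                    next = readPart();
                }
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = next;
                next = null;
                return result;
            }

            // Reads up to `size` chars; returns null once the reader is exhausted.
            private String readPart() {
                char[] buf = new char[size];
                int filled = 0;
                try {
                    while (filled < size) {
                        int n = reader.read(buf, filled, size - filled);
                        if (n < 0) {
                            break; // end of stream
                        }
                        filled += n;
                    }
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
                return filled == 0 ? null : new String(buf, 0, filled);
            }
        };

        // Step 3: wrap the Iterator in a Spliterator of unknown size.
        Spliterator<String> sp = Spliterators.spliteratorUnknownSize(
                it, Spliterator.ORDERED | Spliterator.NONNULL);

        // Step 4: create a sequential stream from the Spliterator.
        return StreamSupport.stream(sp, false);
    }
}
```

Note that this sketch does not close the reader; the caller is assumed to close the InputStream itself. Because it counts chars, a chunk boundary may also fall between the two halves of a surrogate pair.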

Brett Kail
- It seems to be a common mistake to go straight to implementing an `Iterator` first, most probably because everyone is familiar with that type. But splitting logic like `readPart()` into two methods of the `hasNext()`/`next()` kind is not straightforward. Instead, `Spliterator`’s `tryAdvance` method fits much better and is actually easier to implement. And if you implement a `Spliterator` directly, you don’t need to wrap an `Iterator` into a `Spliterator`… – Holger May 20 '15 at 16:14
- I guess it's common enough that the Java class library folks did it for BufferedReader.lines() :-), but your suggestion is a good one. – Brett Kail May 20 '15 at 17:21
- I don't think that particular bug is relevant (I/O-based streams generally can't be parallel friendly since you don't know how much data is present), but perhaps there is another bug somewhere that recommends replacing uses of Spliterators.spliteratorUnknownSize in the class libraries with AbstractSpliterator subclasses (to save the allocation of one object, I guess). – Brett Kail May 20 '15 at 22:28
- Of course, fixing the issue with the streams having more parallel execution potential first is a strong motivation. But even I/O-based streams could be more parallel friendly if they adapted the splitting strategy to harmonize with the underlying buffering strategy. Interestingly, the questioner has [another question](http://stackoverflow.com/q/30196225/2711488) where I discussed why item (i.e. line) based buffering isn’t the right strategy for parallel processing of such a stream. – Holger May 21 '15 at 07:45
There is no direct support for this. You can create a straightforward factory method:
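import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

static Stream<String> strings(InputStream is, Charset cs, int size) {
    Reader r = new InputStreamReader(is, cs);
    CharBuffer cb = CharBuffer.allocate(size);
    return StreamSupport.stream(new Spliterators.AbstractSpliterator<String>(
            Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
        public boolean tryAdvance(Consumer<? super String> action) {
            // Fill the buffer until it is full or the reader is exhausted.
            try { while (cb.hasRemaining() && r.read(cb) > 0); }
            catch (IOException ex) { throw new UncheckedIOException(ex); }
            cb.flip();
            if (!cb.hasRemaining()) return false; // nothing was read: end of input
            action.accept(cb.toString());
            cb.clear(); // reuse the buffer for the next chunk
            return true;
        }
    }, false).onClose(() -> {
        // Closing the stream closes the reader and the underlying InputStream.
        try { r.close(); } catch (IOException ex) { throw new UncheckedIOException(ex); }
    });
}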
It can be used like:
try (Stream<String> chunks = strings(is, StandardCharsets.UTF_8, 100)) {
    // perform operation with chunks
}

Holger