So I have some code using Java 8 streams, and it works. It does exactly what I need it to do, and it's legible (a rarity for functional programming). Towards the end of a subroutine, the code runs over a List of a custom pair type:
// All names Hungarian-Notation-ized for SO reading
class AFooAndABarWalkIntoABar
{
    public int foo_int;
    public BarClass bar_object;
    ....
}
List<AFooAndABarWalkIntoABar> results = ....;
The data here must be passed into other parts of the program as arrays, so they get copied out:
// extract either a foo or a bar from each "foo-and-bar" (fab)
int[] foo_array = results.stream()
    .mapToInt (fab -> fab.foo_int)
    .toArray();
BarClass[] bar_array = results.stream()
    .map (fab -> fab.bar_object)
    .toArray(BarClass[]::new);
And done. Now each array can go do its thing.
Except... looping over the List twice bothers me in my soul. And if we ever need to track more information, someone is likely going to add a third field, and then have to make a third pass to turn the 3-tuple into three arrays, etc. So I'm fooling around with trying to do it in a single pass.
Allocating the data structures is trivial, but maintaining an index for use by the Consumer seems hideous:
int[] foo_array = new int[results.size()];
BarClass[] bar_array = new BarClass[results.size()];
// the trick is providing a stateful iterator across the array:
// - can't just use 'int', it's not effectively final
// - an actual 'final int' would be hilariously wrong
// - "all problems can be solved with a level of indirection"
class Indirection { int iterating = 0; }
final Indirection sigh = new Indirection();
// equivalent possibility is
// final int[] disgusting = new int[]{ 0 };
// and then access disgusting[0] inside the lambda
// wash your hands after typing that code
results.stream().forEach (fab -> {
    foo_array[sigh.iterating] = fab.foo_int;
    bar_array[sigh.iterating] = fab.bar_object;
    sigh.iterating++;
});
This produces the same arrays as the existing solution using multiple stream loops. And it does so in about half the time, go figure. But the iterator indirection tricks seem so unspeakably ugly, and of course they preclude any possibility of populating the arrays in parallel.
Using a pair of ArrayList instances, created with appropriate capacity, would let the Consumer code simply call add for each instance, with no external iterator needed. But ArrayList's toArray(T[]) has to perform a copy of the storage array again, and in the int case there's boxing/unboxing on top of that.
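For concreteness, here is roughly what I mean by the ArrayList variant. This is only a sketch: Fab, BarClass, Split and split() are hypothetical stand-ins for the real types, not the actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayListSketch {
    // Hypothetical stand-ins for the real classes in the question.
    static class BarClass {
        final String name;
        BarClass(String name) { this.name = name; }
    }

    static class Fab {
        final int foo_int;
        final BarClass bar_object;
        Fab(int foo_int, BarClass bar_object) {
            this.foo_int = foo_int;
            this.bar_object = bar_object;
        }
    }

    // Hypothetical holder so a single method can hand back both arrays.
    static class Split {
        final int[] foos;
        final BarClass[] bars;
        Split(int[] foos, BarClass[] bars) {
            this.foos = foos;
            this.bars = bars;
        }
    }

    static Split split(List<Fab> results) {
        // Presized lists: no index variable needed inside the lambda.
        List<Integer> foos = new ArrayList<>(results.size());
        List<BarClass> bars = new ArrayList<>(results.size());

        // Single pass over the source list.
        results.forEach(fab -> {
            foos.add(fab.foo_int);      // boxes the int into an Integer
            bars.add(fab.bar_object);
        });

        // The costs complained about above: another copy of each backing
        // array, plus unboxing every Integer back to an int.
        int[] fooArray = foos.stream().mapToInt(Integer::intValue).toArray();
        BarClass[] barArray = bars.toArray(new BarClass[0]);
        return new Split(fooArray, barArray);
    }

    public static void main(String[] args) {
        List<Fab> results = Arrays.asList(
                new Fab(1, new BarClass("a")),
                new Fab(2, new BarClass("b")));
        Split s = split(results);
        System.out.println(Arrays.toString(s.foos));
        System.out.println(s.bars.length);
    }
}
```

The lambda stays clean because the two lists are effectively final, but the arrays only exist after a second round of copying, which is the part I would like to avoid.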
(edit: The answers to the "possible duplicate" question all talk about only maintaining the indices in a stream, and using direct array indexing to get to the actual data during filter/map calls, along with a note that it doesn't really work if the data isn't accessible by direct index. This question, by contrast, has a List, which is "directly indexable" only from a viewpoint of "well, List#get exists, technically". If the results collection above is a LinkedList, for example, then calling an O(n) get N times with nonconsecutive indices would be... bad.)
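As I understand the index-stream approach from those answers, applied to this problem it would look something like the sketch below. Fab, BarClass and fill() are hypothetical stand-ins, and the LinkedList in main is there only to show the case I'm worried about.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.IntStream;

class IndexStreamSketch {
    // Hypothetical stand-ins for the real classes in the question.
    static class BarClass { }

    static class Fab {
        final int foo_int;
        final BarClass bar_object;
        Fab(int foo_int, BarClass bar_object) {
            this.foo_int = foo_int;
            this.bar_object = bar_object;
        }
    }

    // Streams over the indices rather than the elements, so the lambda
    // gets its position for free and needs no mutable counter.
    static void fill(List<Fab> results, int[] fooArray, BarClass[] barArray) {
        IntStream.range(0, results.size()).forEach(i -> {
            // O(1) for an ArrayList, but O(n) per call for a LinkedList,
            // which makes the whole pass quadratic.
            Fab fab = results.get(i);
            fooArray[i] = fab.foo_int;
            barArray[i] = fab.bar_object;
        });
    }

    public static void main(String[] args) {
        List<Fab> results = new LinkedList<>();
        results.add(new Fab(7, new BarClass()));
        results.add(new Fab(9, new BarClass()));

        int[] fooArray = new int[results.size()];
        BarClass[] barArray = new BarClass[results.size()];
        fill(results, fooArray, barArray);
        System.out.println(Arrays.toString(fooArray));
    }
}
```

This gets rid of the indirection object, but only by leaning on List#get, so it is correct for any List and cheap only for random-access ones.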
Are there other, better possibilities that I'm missing? I thought a custom Collector might do it, but I can't figure out how to maintain the state there either, and never even got as far as scratch code.