
I'm confused on this situation:

  • I have a Producer which produces an undetermined number of items from an underlying iterator, possibly a large number of them.
  • Each item must be mapped to a different interface (eg, wrapper, JavaBean from JSON structure).
  • So I'm thinking it would be good for Producer to return a stream: it's easy to write code that converts the Iterator to a Stream (using Spliterators and StreamSupport.stream()), then applies Stream.map() and returns the final stream.
  • The problem is I have an invoker that does nothing with the resulting stream, eg, a unit test, yet I still want the mapping code to be invoked for every item. At the moment I'm simply calling Stream.count() from the invoker to force that.
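
A minimal sketch of the approach described in the list above, assuming a generic source iterator and a mapping function passed in by the caller. The class name `ProducerSketch` and the method name `produce` are hypothetical and only illustrate the Spliterators / StreamSupport.stream() / Stream.map() chain; they are not the asker's actual code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical stand-in for the Producer described in the question.
class ProducerSketch {

    // Wraps an iterator of unknown size in a lazy, sequential stream and
    // attaches the mapping step. Nothing is mapped until a terminal
    // operation is invoked on the returned stream.
    static <S, T> Stream<T> produce(Iterator<S> source,
                                    Function<? super S, ? extends T> mapper) {
        Spliterator<S> spliterator =
                Spliterators.spliteratorUnknownSize(source, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false).map(mapper);
    }

    public static void main(String[] args) {
        Iterator<String> raw = Arrays.asList("1", "2", "3").iterator();
        List<Integer> mapped =
                produce(raw, Integer::parseInt).collect(Collectors.toList());
        System.out.println(mapped); // prints [1, 2, 3]
    }
}
```

Because the stream is built over the iterator rather than over a copied collection, items are pulled and mapped one at a time, which is the property the question is trying to preserve for large inputs.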

Questions are:

  • Am I doing it wrong? Should I use different interfaces? Note that I find implementing next()/hasNext() for Iterator cumbersome, mainly because it forces you to create a new class (even if it can be anonymous), keep a pointer and check it. The same goes for collection views. Returning a collection that is created up front, rather than a dynamic view over the underlying iterator, is out of the question (the input data set might be very large). The only alternative I like so far is a Java implementation of yield(). Nor do I want the stream to be consumed inside Producer (ie, with forEach()), since some other invoker might want to perform some real operation on it.
  • Is there a better way, ideally a best practice, to force the stream to be processed?
zakmck
  • I don't understand this part: `I have an invoker that does nothing with the resulting stream, eg, a unit test, yet I still want the mapping code to be invoked for every item`. Why don't you simply call `Stream.map()`? – JohnnyAW May 02 '17 at 11:39
  • Streams are lazy. Mapping, filtering etc. (i.e. intermediate operations) only occur once a terminal operation is invoked on the stream (i.e. count, collect, forEach, etc.). I don't understand your problem. Does the function you are passing to map have a side effect? If yes, then you're not using streams well, because the spec discourages side effects in functions passed to map. – fps May 02 '17 at 13:22
  • @federico-peralta-schaffner, yes, map() has a side effect. At a minimum, in my case it also updates other data structures (a parent that owns the items that the initial data are mapped to), which I'm trying to verify in the invoker. My problem is to understand the best and most common way to do these things. – zakmck May 02 '17 at 14:05
  • @johnnyaw, no, I can't call Stream.map() from external code like a unit test, because I get back a stream on which map() has already been called by the Producer. The problem, as Federico says, is that invoking map doesn't mean the function it receives is immediately invoked on each stream item. Rather, this is deferred until the stream is consumed with a terminal operation. – zakmck May 02 '17 at 14:05
  • @zakmck streams discourage side effects, so you shouldn't use `Stream` – JohnnyAW May 02 '17 at 14:11
  • btw. I would like to know how your `Producer` is going to store all that generated data? – JohnnyAW May 02 '17 at 14:26
  • Then I wonder what the next best thing is in such a case. Producer doesn't store anything. It gets an iterator from another component and remaps the items coming from it to a different interface, then it returns a view on these new interface instances, and I really need it not to store the remapped results. – zakmck May 02 '17 at 15:22
  • `then it returns a view on a these new interface instances` do you realize that the returned view has to be backed by some sort of model data? So basically you have 2 choices: 1: you perform all your tasks as soon as you remap a model (in this case you don't need to store the remapped model, and the gc can collect the remapped object right after the task) or 2: create a new collection with remapped objects and use it for further tasks. – JohnnyAW May 02 '17 at 15:35
  • Obviously I'm trying to achieve 1) and to avoid 2), since that could require much more RAM. – zakmck May 02 '17 at 15:51
  • so why even bother with streams? simply set a task listener to your `Producer` and call a function with your remapped object... – JohnnyAW May 02 '17 at 16:24
  • This looks very much like an [xy problem](http://meta.stackexchange.com/a/66378/166789). You say you “*want the mapping code to be invoked for every item*” even though you don’t want to keep the resulting objects, which implies that the mapping function has a side effect that you want to enforce. The obvious solution is to separate the mapping function and the side effect. – Holger May 02 '17 at 16:25
  • To rephrase your requirement: you want something that is lazy when a consumer queries the elements, but eagerly performs the mapping function’s side effect for the remaining elements when the consumer does not query them. In other words, you need something that looks into the future to know whether the consumer will query the remaining elements or not. I hope phrasing it this way helps you understand that it doesn’t matter whether you use a `Stream`, an `Iterator` or a `yield`-like construct. None of them will solve this kind of task. – Holger May 02 '17 at 17:17
  • I guess that in my situation I need to consume whichever of these views I use. I'm doing that with Stream.count() at the moment, and I'm not sure that's the best thing to do, but thanks. – zakmck May 02 '17 at 17:31
  • There is no requirement for `Stream.count()` to process mapping functions, as they have no impact on the result, and there is already an update of the Stream implementation (currently scheduled for Java 9), which will skip the processing if it can predict the number of elements without it. As @Federico Peralta Schaffner said before, functions with side effects and Streams don’t play well together. You should rethink your design… – Holger May 02 '17 at 18:49
  • @Holger You are right: assuming that `map` is called when the `count` terminal operation is invoked makes no sense. My mistake. – fps May 02 '17 at 19:07
  • OK, I should review the design. But suppose I want to force the stream consumption for some foo reason, such as making a unit test work. What is the best way, if not count()? forEach(x -> {})? – zakmck May 03 '17 at 09:58
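
A minimal sketch of the no-op `forEach` idea raised in the last comment, assuming the goal is only to force every element through the pipeline (for example in a unit test). The class `DrainSketch` and the helper `drain` are hypothetical names. Unlike `count()`, which as noted in the thread may skip the pipeline when the size is already known, `forEach` has to visit every element, so the mapping function runs once per item.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class DrainSketch {

    // Hypothetical helper: consumes every element with a no-op consumer,
    // then closes the stream via try-with-resources.
    static void drain(Stream<?> stream) {
        try (Stream<?> s = stream) {
            s.forEach(item -> { });
        }
    }

    public static void main(String[] args) {
        AtomicInteger mapped = new AtomicInteger();
        List<String> input = Arrays.asList("a", "b", "c");

        // The counter stands in for the side effect discussed in the thread.
        Stream<String> result = input.stream().map(s -> {
            mapped.incrementAndGet();
            return s.toUpperCase();
        });

        System.out.println(mapped.get()); // prints 0: nothing has run yet
        drain(result);
        System.out.println(mapped.get()); // prints 3: mapper ran once per item
    }
}
```

This only forces evaluation. It does not address the design concern raised in the comments that mapping functions with side effects fit poorly with streams.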

0 Answers