2

What's the difference between the two implementations? Is one better than the other. There is a blog post that says that Tuple2Zipped performances better, but it doesn't provide a reason, and looking at the source code I don't see the difference.

val l1 = List(1,2,3)
val l2 = List(5,6,7)

val v1 = l1 zip l2
val v2 = (l1, l2).zipped
EugeneMi
  • 3,475
  • 3
  • 38
  • 57

1 Answers1

6

In case it's not obvious, the values and types of v1 and v2 differ: v1 has type List[(Int, Int)] with the value List((1, 5), (2, 6), (3, 7)); v2 has type scala.runtime.Tuple2Zipped[Int, List[Int], Int, List[Int]] and has the value (List(1, 2, 3), List(5, 6, 7)).zipped.

Put another way, the value of v1 was computed strictly (the zip operation has already been completed), while v2 was computed lazily (or non strictly)—in effect, the zip operation has been stored, but not yet executed.

If all you want to do is calculate these two values (but not actually use them), then I would indeed expect v2 to be calculated faster, because it's not actually doing a great deal of work. ;-)

Beyond that, it will depend upon how you subsequently intend to use these values. Tuple2Zipped will perform better if you do not need to process every tuple in the resulting list, since it will not waste time zipping list elements that you do not require. It might possibly have an edge if you need to apply some operation to each tuple, but do not need access to them post processing, thereby having a single pass through the list.

The List.zip method will likely be the better choice if you need to perform multiple operations on the list members, iterating through it multiple times.

Both approaches will work in all cases. (In the general case, I would prefer List.zip if only because Tuple2Zipped is less well known, and its use would hint at a special requirement.)

If performance is truly a concern, then I recommend benchmarking the two approaches with your code, using a tool such as ScalaMeter and accurately differentiating the two. I would also recommend benchmarking memory usage, as well as processing time, since the two approaches have differing memory requirements.

UPDATE: Referencing additional question in comments below: "Is there a difference between val m:Map[Int, Int] = (l1 zip l2)(breakOut) and (l1, l2).zipped.toMap?

I'll restate this as follows:

import scala.collection.breakOut

val l1 = List(1, 2, 3)
val l2 = List(5, 6, 7)

// m1's type has to be explicit, otherwise it is inferred to be
// scala.collection.immutable.IndexedSeq[(Int, Int)].
val m1: Map[Int, Int] = (l1 zip l2)(breakOut)
val m2 = (l1, l2).zipped.toMap

There's no such thing as a lazy Map, since all of the elements in the map need to be available in order to structure the map internally, thereby allowing values to be efficiently retrieved when performing a key lookup.

Consequently, the differentiation between the strictly-evaluated (l1 zip l2) and the lazily-evaluated (l1, l2).zipped disappears in the act of conversion to a Map.

So which is more efficient? In this particular example, I would expect that the two approaches perform very similarly.

When calculating m1, the zip operation iterates through l1 and l2 examining a pair of head elements at a time. The breakOut builder (see also link in comment below), and the declared result type of Map[Int, Int], causes the zip operation to build a Map as its result (without breakOut, zip would result in a List[(Int, Int)]).

In summarizing this approach, the resulting map is created via a single, simultaneous pass through l1 and l2.

(The use of breakOut does make a difference. If we generated the map as (l1 zip l2).toMap, then we perform one iteration through l1 and l2 to create a List[(Int, Int)], then iterate on that list to create the resulting Map; this is clearly less efficient.

In the new Scala 13 collections API, breakOut has been removed. But there are new alternatives which work better from a type perspective. See this document for more details.)

Now let's consider m2. In this case, as previously stated, (l1, l2).zipped results in a lazy list of tuples. However, to this point, no iterations have yet been performed on either input list. When the toMap operation executes, each tuple in the lazy list is evaluated when first referenced and is added the map being built.

In summarizing this approach, again, the resulting map is created via a single, simultaneous pass through l1 and l2.

So, in this particular use case, there's going to be very little difference between the two approaches. There may still be minor implementation details that affect the result, so if you have a huge amount of data in l1 and l2, you may still want to benchmark them to find the best solution. However, I'd be inclined to simply pick the zip operation (with breakOut) and leave it at that.

Mike Allen
  • 8,139
  • 2
  • 24
  • 46
  • My use case is turning the tuples into a map. Is there a difference between val m:Map[Int, Int] = (l1 zip l2)(breakOut) and (l1, l2).zipped.toMap? – EugeneMi Jun 11 '19 at 14:39
  • @EugeneMi Interesting question! There is no such thing as a _lazy map_, since the ability to lookup a value through a key requires knowledge of the entire collection. So, when you convert a lazy collection to a map, it has to be evaluated strictly. Consequently, although the `Tuple2Zipped` instance is lazy, every tuple has to be read to convert it to a map. So, in this case, both approaches result in `Map(1 -> 5, 2 -> 6, 3 -> 7)`. The only question is, which will be the fastest? I doubt there will be a lot of difference, but it wouldn't hurt to benchmark the two if that matters to you. – Mike Allen Jun 11 '19 at 15:09
  • @EugeneMi I'm not sure what you mean by `(l1 zip l2)(breakOut)`, though. I'm assuming that's equivalent to `(l1 zip l2).toMap`. – Mike Allen Jun 11 '19 at 15:19
  • 1
    https://docs.scala-lang.org/tutorials/FAQ/breakout.html. It skips creating a intermediate list of tuples – EugeneMi Jun 11 '19 at 19:51
  • @EugeneMi Sweet. I've been using _Scala_ for over 10 years, and I don't recall ever seeing `breakOut` before! Thanks for the link! (Oddly, it doesn't show up if you search for it in the _ScalaDoc_'s search bar.) In any case, I've updated my answer to address the question in your first comment. – Mike Allen Jun 12 '19 at 13:18