In case it's not obvious, the values and types of v1 and v2 differ: v1 has type List[(Int, Int)] with the value List((1, 5), (2, 6), (3, 7)); v2 has type scala.runtime.Tuple2Zipped[Int, List[Int], Int, List[Int]] and has the value (List(1, 2, 3), List(5, 6, 7)).zipped.
Put another way, the value of v1 was computed strictly (the zip operation has already been completed), while v2 was computed lazily (or non-strictly): in effect, the zip operation has been stored, but not yet executed.
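As a minimal sketch of that difference, assuming Scala 2.12 (in 2.13 zipped still compiles but is deprecated in favor of lazyZip) and assuming v1 and v2 are built from the two lists whose values are quoted above; the name sums is only illustrative:

```scala
val l1 = List(1, 2, 3)
val l2 = List(5, 6, 7)

// Strict: the list of tuples exists as soon as this line has run.
val v1 = l1 zip l2

// Non-strict: v2 only wraps the two input lists; no pairing has happened yet.
// Its type is scala.runtime.Tuple2Zipped[Int, List[Int], Int, List[Int]].
val v2 = (l1, l2).zipped

// The pairing work for v2 is only done once an operation such as map runs.
val sums = v2.map(_ + _)
```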
If all you want to do is calculate these two values (but not actually use them), then I would indeed expect v2 to be calculated faster, because it's not actually doing a great deal of work. ;-)
Beyond that, it will depend upon how you subsequently intend to use these values. Tuple2Zipped will perform better if you do not need to process every tuple in the resulting list, since it will not waste time zipping list elements that you do not require. It may also have an edge if you need to apply some operation to each tuple but do not need the tuples themselves afterwards, because the pairing and the operation then happen in a single pass through the lists.
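A small sketch of both cases, again assuming Scala 2.12 or 2.13 and the same example lists; the counter inspected and the result names are hypothetical and exist only to make the behavior visible:

```scala
val l1 = List(1, 2, 3)
val l2 = List(5, 6, 7)

// exists stops at the first matching pair, so later elements are never paired.
var inspected = 0
val found = (l1, l2).zipped.exists { (a, b) =>
  inspected += 1
  a + b > 7
}

// map takes a two-argument function, so the pairing and the multiplication
// happen in one pass and no intermediate List[(Int, Int)] is built.
val products = (l1, l2).zipped.map(_ * _)
```

With l1 zip l2, all three tuples would have been created before exists was even called.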
The List.zip method will likely be the better choice if you need to perform multiple operations on the list members, iterating through it multiple times.
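For illustration, a sketch of that multiple-use case under the same assumptions (the names pairs, sums, products and firstBig are made up for this example):

```scala
val l1 = List(1, 2, 3)
val l2 = List(5, 6, 7)

// Pair the elements once...
val pairs = l1 zip l2

// ...then reuse the materialized list for as many operations as needed.
val sums     = pairs.map { case (a, b) => a + b }
val products = pairs.map { case (a, b) => a * b }
val firstBig = pairs.find { case (a, b) => a * b > 10 }
```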
Both approaches will work in all cases. (In the general case, I would prefer List.zip if only because Tuple2Zipped is less well known, and its use would hint at a special requirement.)
If performance is truly a concern, then I recommend benchmarking the two approaches with your own code, using a tool such as ScalaMeter, so that you can measure the difference between them accurately. I would also recommend benchmarking memory usage, as well as processing time, since the two approaches have differing memory requirements.
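If you just want a rough first impression before setting up a proper benchmark, something like the following can be used. This is only a crude sketch: meanNanos is a hypothetical helper written for this example (it is not ScalaMeter's API), the list sizes and repetition counts are arbitrary, and hand-rolled timing on the JVM is easily distorted by JIT compilation and garbage collection, so treat the numbers with suspicion.

```scala
val l1 = List.range(0, 100000)
val l2 = List.range(100000, 200000)

// Hypothetical helper, not part of any library: evaluates `body` `reps` times
// and returns the mean wall-clock time per run in nanoseconds.
def meanNanos[A](reps: Int)(body: => A): Double = {
  val start = System.nanoTime()
  var i = 0
  while (i < reps) { body; i += 1 }
  (System.nanoTime() - start).toDouble / reps
}

// Warm-up runs, so that neither approach is measured while still being interpreted.
meanNanos(20)((l1 zip l2).map { case (a, b) => a + b })
meanNanos(20)((l1, l2).zipped.map(_ + _))

val zipNanos    = meanNanos(50)((l1 zip l2).map { case (a, b) => a + b })
val zippedNanos = meanNanos(50)((l1, l2).zipped.map(_ + _))
println(f"zip: $zipNanos%.0f ns, zipped: $zippedNanos%.0f ns")
```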
UPDATE: Referencing an additional question in the comments below: "Is there a difference between val m:Map[Int, Int] = (l1 zip l2)(breakOut) and (l1, l2).zipped.toMap?"
I'll restate this as follows:
import scala.collection.breakOut
val l1 = List(1, 2, 3)
val l2 = List(5, 6, 7)
// m1's type has to be explicit, otherwise it is inferred to be
// scala.collection.immutable.IndexedSeq[(Int, Int)].
val m1: Map[Int, Int] = (l1 zip l2)(breakOut)
val m2 = (l1, l2).zipped.toMap
There's no such thing as a lazy Map, since all of the elements in the map need to be available in order to structure the map internally, thereby allowing values to be efficiently retrieved when performing a key lookup.
Consequently, the distinction between the strictly-evaluated (l1 zip l2) and the lazily-evaluated (l1, l2).zipped disappears in the act of conversion to a Map.
So which is more efficient? In this particular example, I would expect that the two approaches perform very similarly.
When calculating m1, the zip operation iterates through l1 and l2, examining one pair of head elements at a time. The breakOut builder (see also link in comment below), together with the declared result type of Map[Int, Int], causes the zip operation to build a Map as its result (without breakOut, zip would result in a List[(Int, Int)]).
To summarize this approach: the resulting map is created via a single, simultaneous pass through l1 and l2.
(The use of breakOut does make a difference. If we generated the map as (l1 zip l2).toMap, then we would perform one iteration through l1 and l2 to create a List[(Int, Int)], then iterate over that list to create the resulting Map; this is clearly less efficient.
In the new Scala 2.13 collections API, breakOut has been removed. But there are new alternatives which work better from a type perspective. See this document for more details.)
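As one possible way to avoid the intermediate list without breakOut, here is a sketch that zips iterators instead of lists. This is my own suggestion rather than necessarily what the linked document recommends; it relies only on Iterator.zip and toMap, which exist in both Scala 2.12 and 2.13, and the names twoPass and onePass are illustrative.

```scala
val l1 = List(1, 2, 3)
val l2 = List(5, 6, 7)

// Two passes: builds a List[(Int, Int)] first, then converts that list to a Map.
val twoPass: Map[Int, Int] = (l1 zip l2).toMap

// One pass: the zipped iterator yields each tuple on demand and toMap
// consumes it directly, so no intermediate list is built.
val onePass: Map[Int, Int] = l1.iterator.zip(l2.iterator).toMap
```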
Now let's consider m2. In this case, as previously stated, (l1, l2).zipped results in what is effectively a lazy list of tuples. However, to this point, no iterations have yet been performed on either input list. When the toMap operation executes, each tuple in the lazy list is evaluated when first referenced and is added to the map being built.
To summarize this approach: again, the resulting map is created via a single, simultaneous pass through l1 and l2.
So, in this particular use case, there's going to be very little difference between the two approaches. There may still be minor implementation details that affect the result, so if you have a huge amount of data in l1 and l2, you may still want to benchmark them to find the best solution. However, I'd be inclined to simply pick the zip operation (with breakOut) and leave it at that.