Scala best way to find pairs in a collection

Question

I'm trying to find the most optimal way of finding pairs in a Scala collection. For example,

val list = List(1,2,3)

should produce these pairs

(1,2) (1,3) (2,1) (2,3) (3,1) (3,2)

My current implement seems quite expensive. How can I further optimize this?

val pairs = list.flatMap { currentElement =>
      val clonedList: mutable.ListBuffer[Int] = list.to[ListBuffer]
      val currentIndex = list.indexOf(currentElement)
      val removedValue = clonedList.remove(currentIndex)

      clonedList.map { y =>
        (currentElement, y)
      }
    }

When you say "optimize" what for do you want it to be optimized? For example what is more important CPU or memory? Another question is what the typical size of the source list? (is it tens of elements? thousands? millions?) — SergGr, Dec 12 '17 at 19:32
@Tom we have to ignore duplicates so that current element is not paired with itself — Af Si, Dec 12 '17 at 21:08

Oli · Accepted Answer · 2017-12-13T21:45:19.900

0

val l = Array(1,2,3,4)
val result = scala.collection.mutable.HashSet[(Int, Int)]()
for(i <- 0 until l.size) {
    for(j<- (i+1) until l.size) {
        result += l(i)->l(j)
        result += l(j)->l(i)
    }
}

Several optimizations here. First, with the second loop, we only traverse the list from the current element to the end, dividing the number of iterations by two. Then we limit the number of object creations to the minimum (Only tuples are created and added to a mutable hashset). Finally, with the hashset you handle the duplicates for free. An additional optimization would be to check if the set already contains the tuple to avoid creating an object for nothing.

For 1,000 elements, it takes less than 1s on my laptop. 7s for 10k distinct elements.

Note that recursively, you could do it this way:

def combi(s : Seq[Int]) : Seq[(Int, Int)] = 
    if(s.isEmpty)
         Seq()
    else
        s.tail.flatMap(x=> Seq(s.head -> x, x -> s.head)) ++ combi(s.tail)

It takes a little bit more than 1s for 1000 elements.

edited Dec 13 '17 at 21:45

answered Dec 12 '17 at 20:53

Oli

9,766
5
25
46

This doesn't produce the desired result. scala.collection.mutable.HashSet[(Int, Int)] = Set((3,0), (1,2), (0,3), (0,2), (3,1), (2,0), (1,0), (2,1), (0,1), (1,3), (3,2), (2,3)) – Af Si Dec 12 '17 at 21:20
True, I was returning the indicess instead of the corresponding element in the list. I made a small edit to correct it. – Oli Dec 13 '17 at 05:27
I'm trying to figure out a recursive solution for this problem rather than O(n^2) complexity – Af Si Dec 13 '17 at 19:26
I added a recursive version but the complexity is still O(n^2). Note that since the size of the output is in O(n^2) of the size of the input, you cannot do better than a complexity of O(n^2)... – Oli Dec 13 '17 at 21:47
Your recursive version is not tail-recursive. – igorpcholkin Dec 14 '17 at 12:20

igorpcholkin · Answer 2 · 2017-12-13T10:02:42.027

-1

Supposing that "most optimal way" could be treated differently (e.g. most of time I treat it the one which allows myself to be more productive) I suggest the following approach:

val originalList = (1 to 1000) toStream
def orderedPairs[T](list: Stream[T]) = list.combinations(2).map( p => (p(0), p(1)) ).toStream
val pairs = orderedPairs(originalList) ++ orderedPairs(originalList.reverse)
println(pairs.slice(0, 1000).toList)

edited Dec 13 '17 at 10:02

answered Dec 12 '17 at 20:16

igorpcholkin

947
7
15

Every time you use map or toList or something like that, you copy the entire list. Same thing goes for "++" by the way. This approach already takes 8s on my laptop for n=1000. – Oli Dec 13 '17 at 07:12
I have made slight adjustments which should address performance issues as noted above. – igorpcholkin Dec 13 '17 at 10:07

Scala best way to find pairs in a collection

2 Answers2