Most parallelizable way of getting rid of duplicate elements in a scala vector?

Question

I have a vector with a lot of elements, many of them repeated, and I need a vector with all the duplicates removed. For now, my implementation applies the `toSet` method and then the `toVector` method. However, this is rather slow for very large vectors. I was considering using `ParVector` instead and doing the same thing on that, but will that actually give me any performance improvement? I'm looking for a solution that scales with the number of cores I have, i.e. some sort of parallelizable code. I'd be glad if someone could suggest something helpful. Thanks. :)
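
For reference, the `toSet`/`toVector` round trip described above can be sketched next to the standard-library `distinct` method. The object and method names (`DedupBaseline`, `viaSet`, `viaDistinct`) are made up for illustration. One difference worth noting: `distinct` keeps the first occurrence of each element in input order, while going through a `Set` makes no promise about ordering.

```scala
object DedupBaseline {
  // Approach from the question: round-trip through a Set.
  // The order of the result is not guaranteed to match the input order.
  def viaSet[A](xs: Vector[A]): Vector[A] = xs.toSet.toVector

  // Standard-library alternative: keeps the first occurrence of each
  // element, in input order.
  def viaDistinct[A](xs: Vector[A]): Vector[A] = xs.distinct

  def main(args: Array[String]): Unit = {
    val data = Vector(3, 1, 3, 2, 1, 2, 3)
    println(viaDistinct(data))   // Vector(3, 1, 2)
    println(viaSet(data).sorted) // Vector(1, 2, 3)
  }
}
```

Both versions are sequential; which one is faster for a given input is something only measurement will tell.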

I hate to say this, but it depends. Why not do some profiling (refer to this: http://stackoverflow.com/questions/9160001/how-to-profile-methods-in-scala) according to your use case to see the difference ? — Sudheer Aedama, Jul 08 '15 at 18:36
Yes, you could try `ParVector`. You might also try one of the View collections (`SeqView`, `StreamView`, etc.) which can offer better performance depending on how the collection is used/processed. All of these offer the `distinct` method which is a good bet for removing duplicates. — jwvh, Jul 09 '15 at 02:13
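
As a rough illustration of what "scaling with cores" could look like, here is a minimal sketch that does not use `ParVector` at all: it splits the vector into chunks, runs `distinct` on each chunk in a `Future`, and then deduplicates the concatenated partial results. This is a swapped-in technique, not something suggested in the comments, and `ParallelDedup` and `distinctChunked` are hypothetical names. The final merge is still sequential, so any speedup depends on how many duplicates each chunk removes; profiling on real data is the only way to know whether it helps.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelDedup {
  // Hypothetical helper (not a library API): deduplicate chunks in
  // parallel, then deduplicate the merged partial results sequentially.
  def distinctChunked[A](xs: Vector[A], chunks: Int)(implicit ec: ExecutionContext): Vector[A] = {
    require(chunks > 0, "chunks must be positive")
    if (xs.isEmpty) Vector.empty
    else {
      val size = math.max(1, math.ceil(xs.length.toDouble / chunks).toInt)
      // toVector forces the iterator so that every Future starts right away.
      val partials: Vector[Future[Vector[A]]] =
        xs.grouped(size).map(chunk => Future(chunk.distinct)).toVector
      // Blocking here keeps the sketch simple; real code might return the Future.
      val merged = Await.result(Future.sequence(partials), Duration.Inf).flatten
      // Chunks are concatenated in input order, so first occurrences are preserved.
      merged.distinct
    }
  }

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val data = Vector.tabulate(1000000)(i => i % 1000)
    val cores = Runtime.getRuntime.availableProcessors()
    println(distinctChunked(data, cores).length) // 1000
  }
}
```

If `ParVector` is preferred, note that from Scala 2.13 onward the parallel collections are shipped as a separate module (`scala-parallel-collections`) rather than as part of the standard library.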


0 Answers