0

I have a large vector (about 2000 elements) of tuples of type (Int, Int), e.g.

val myVectorEG = Vector((65,61), (29,49), (4,57), (12,49), (24,98), (21,52), (81,86), (91,23), (73,34), (97,41), ...)

I want to remove the tuples whose first element (index 0) is duplicated, i.e. if (65, xx) appears again as (65, yy) elsewhere in the vector, it should be removed.

I am able to access the elements and print them out this way:

val (id1,id2) = ( allSource.foreach(i=>println(i._1)),  allSource.foreach(i=>i._2))

How can I remove the duplicates? Or should I use another method, rather than foreach, to access the element at index 0?

Seth Tisue
Marzack

4 Answers

3

To remove all duplicates, first group by the first element of each tuple (_._1), then collect only the groups that contain exactly one tuple for their key. Finally, flatten the result.

myVectorEG.groupBy(_._1).collect{
  case (k, v) if v.size == 1 => v
}.flatten

This returns an Iterable (a List in practice), on which you can call .toVector if you need a Vector. Note that grouping does not preserve the original order.
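
As a quick check of that behaviour, here is a minimal sketch on a small made-up vector (the data and the names `sample` and `unique` are assumptions for illustration). Because groupBy goes through a Map, the result order is unspecified, so the sketch sorts before printing.

```scala
// Hypothetical sample data: the keys 65 and 29 each appear twice.
val sample = Vector((65, 61), (29, 49), (4, 57), (65, 10), (12, 49), (29, 7))

// Keep only the groups holding exactly one tuple, then flatten them.
val unique = sample
  .groupBy(_._1)
  .collect { case (_, v) if v.size == 1 => v }
  .flatten
  .toVector

// Sort to get a deterministic order, since the Map does not keep one.
println(unique.sorted) // Vector((4,57), (12,49))
```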

CervEd
3

This does the job and preserves order (unlike other solutions) but is O(n^2) so potentially slow for 2000 elements:

myVectorEG.filter(x => myVectorEG.count(_._1 == x._1) == 1)

This is more efficient for larger vectors but still preserves order:

val keep =
  myVectorEG.groupBy(_._1).collect{
    case (k, v) if v.size == 1 => k
  }.toSet

myVectorEG.filter(x => keep.contains(x._1))
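
On Scala 2.13 or later, the same order-preserving idea could also be sketched by counting keys with groupMapReduce instead of building the groups. This is a variant, not the code from the answer above, and the sample data and the names `pairs`, `counts` and `kept` are made up for illustration.

```scala
// Hypothetical sample data: the keys 65 and 29 each appear twice.
val pairs = Vector((65, 61), (29, 49), (4, 57), (65, 10), (12, 49), (29, 7))

// Count how often each first element occurs (65 -> 2, 29 -> 2, 4 -> 1, 12 -> 1).
val counts: Map[Int, Int] = pairs.groupMapReduce(_._1)(_ => 1)(_ + _)

// Filtering the original vector keeps the original order.
val kept = pairs.filter { case (k, _) => counts(k) == 1 }

println(kept) // Vector((4,57), (12,49))
```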
Tim
2

You can use distinctBy (available since Scala 2.13) to remove duplicates. It keeps the first tuple for each key and drops the later ones.

In the case of Vector[(Int, Int)] it will look like this:

myVectorEG.distinctBy(_._1)
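
A small sketch with made-up data (the names `input` and `firstPerKey` are assumptions for illustration) shows that the key 65 still appears once in the result, so this keeps one tuple per key rather than dropping repeated keys entirely:

```scala
// Requires Scala 2.13+ for distinctBy. Hypothetical sample data.
val input = Vector((65, 61), (29, 49), (65, 10), (4, 57))

// distinctBy keeps the first tuple seen for each key, in the original order.
val firstPerKey = input.distinctBy(_._1)

println(firstPerKey) // Vector((65,61), (29,49), (4,57))
```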

Update: if you need to remove every tuple whose key is duplicated:

You can use groupBy, but this does not preserve the original order.

myVectorEG.groupBy(_._1).filter(_._2.size == 1).flatMap(_._2).toVector
Ivan Stanislavciuc
1

Another option, taking advantage of the fact that you want the list sorted at the end.

def sortAndRemoveDuplicatesByFirst[A : Ordering, B](input: List[(A, B)]): List[(A, B)] = {
  import Ordering.Implicits._

  val sorted = input.sortBy(_._1)

  @annotation.tailrec
  def loop(remaining: List[(A, B)], previous: (A, B), repeated: Boolean, acc: List[(A, B)]): List[(A, B)] =
    remaining match {
      case x :: xs =>
        if (x._1 == previous._1)
          loop(remaining = xs, previous, repeated = true, acc)
        else if (!repeated)
          loop(remaining = xs, previous = x, repeated = false, previous :: acc)
        else
          loop(remaining = xs, previous = x, repeated = false, acc)

      case Nil =>
        // Keep the last element only if its key was not repeated.
        if (repeated) acc.reverse else (previous :: acc).reverse
    }

  sorted match {
    case x :: xs =>
      loop(remaining = xs, previous = x, repeated = false, acc = List.empty)

    case Nil =>
      List.empty
  }
}

Which you can test like this:

val data = List(
  1 -> "A",
  3 -> "B",
  1 -> "C",
  4 -> "D",
  3 -> "E",
  5 -> "F",
  1 -> "G",
  0 -> "H"
)

sortAndRemoveDuplicatesByFirst(data)
// res: List[(Int, String)] = List((0,H), (4,D), (5,F))

(I used List instead of Vector to make it easy and performant to write the tail-rec algorithm)