2

Given two SortedSet[Int] or TreeSet[Int] there is any implementation that is doing this in an efficient way O(n) like explained here ? What about reunion?

There is any special reason for not having that in the standard library? A normal reason might be that it didn't have enough priority.

Community
  • 1
  • 1
raisercostin
  • 8,777
  • 5
  • 67
  • 76

2 Answers2

0

Can't be done in O(n). The algorithm you mention is very simple to implement for two sorted lists (to get another sorted list), since insertion to the result list is O(1), so one traversal of the inputs in O(n) (single pass). For something like a TreeSet, the tree itself has to be built, that means O(log(n)) insertion time. So the best you can conceptually get is O(N log(N)).

As for it being implemented, Its very likely ++ method does exactly this, since its optimal and a well-known solution.

If you want to do this, is fairly simple, here is a small snippet of code I use to merge an arbitrary set of sorted files:

   case class Input(i: Iterator[String], var lastKey: Option[K], var lastValue: String, extraction: String => (K, String), var count: Int =0)
   var sources = List[Input]()

... and later...

     // Read a first value for all sources
     sources.foreach(readKey(_))

     var moreInput = true

     while(moreInput)
     {
        // Find the source with the smallest key
        var smallest: Input = null
        for (source <- sources)
           if (source.lastKey.isDefined)
              if (smallest == null || smallest.lastKey.get > source.lastKey.get)
                 smallest = source

        // Check if there is anything to be done.
        if (smallest == null)
           moreInput = false
        else
        {
           // Output the chosen line and advance the corresponding iterator.
           out.println(smallest.lastValue)
           readKey(smallest)

           val index = sources.indexOf(smallest)
        }
     }
Daniel Langdon
  • 5,899
  • 4
  • 28
  • 48
  • 1
    In case you have a SortedSet the sorting price was already paid. So given two of them (for which you already paid the sorting time in O(NlogN) you should be able to do the intersection in O(N). – raisercostin Mar 19 '15 at 14:50
  • 1
    No. I'm not considering the sorting time at all. As I said, it is indeed O(N) if you are merging to a *list* but since you want your result to be a *treeset* insertions to it have a cost too (O(logN)) for each of the N insertions. Even if you are inserting N random numbers you came up with. Check this: http://docs.scala-lang.org/overviews/collections/performance-characteristics.html – Daniel Langdon Mar 19 '15 at 15:05
0

Given two generic Sets you must iterate on one in O(n) and for each item do a search in the other that is O(1) (or eC effectively Constant). In the end you will still get O(N) and besides you didn't need to sort the sets before.

In the end a SortedSet doesn't help much on intersections/reunions.

raisercostin
  • 8,777
  • 5
  • 67
  • 76