0

As we know that set is implemented using red-black trees so inserting an element will be a task of O(log N) complexity. But if we are given a vector of n distinct integer so making a set out of it should by NlogN task. But in many places, I found it is given as O(N). Can someone please help me to resolve this. Thanks in advance.

Link to the place where I read this Complexity of building a set from list?

  • 4
    Why do you think it's O(n)? Constructing a set from a vector is O(n*log n). – local-ninja Jul 27 '20 at 17:33
  • 3
    @fdan that's the question, why does someone claim it's O(n) when it's not obvious why it would be the case? A link to one of those "many places" would have helped. – Mark Ransom Jul 27 '20 at 17:35
  • You're going to have to be more specific about any particular example which you think you are describing. It could be that your case doesn't meet the terms set forth in this question, or it could be a best-case scenario (sorting an array which is already sorted) in which case it's irrelevant. – Kenny Ostrom Jul 27 '20 at 17:53
  • A heap(priority queue) takes `O(N)` time to build provided you've all `N` elements readily available. Set takes `O(NlogN)` no matter what. – srt1104 Jul 27 '20 at 18:05
  • 4
    `But in many places, I found it is given as O(N)` We can not tell you about situations that we are unaware of. Please name (link) these many places (or at least the top one). Maybe there are more constraints on the input that you are glossing over? – Martin York Jul 27 '20 at 18:05
  • 2
    The claim is incorrect. If we could construct a set from an *unsorted* vector in `O(N)`, we would get `O(N)` sorting algorithm. – Evg Jul 27 '20 at 18:05
  • 1
    "As we know ..." my understanding is rather the reverse: Inserting an element in to a `set` is specified to be a `O(log N)` operation and a red-black tree can achieve that. – 463035818_is_not_an_ai Jul 27 '20 at 18:08
  • Inserting elements in a heap is O(logN) too, but creating a heap from an array can be done in O(n), although the mathematical proof is a bit long. The theory behind it is that the total number of operations to be done is `n * sum( i / 2^i ) from i = 1 ... logN`, which is less than `2n`, and hence, bounded in O(n). – Daniel Jul 27 '20 at 18:25
  • "But in many places" Provide links to one or more such places. – n. m. could be an AI Jul 27 '20 at 18:27
  • @n.'pronouns'm. Please find the link after edit. – khemukaanmol Jul 28 '20 at 10:07
  • 1
    That place is talking about hash tables, not trees. One down, how many to go? – n. m. could be an AI Jul 28 '20 at 11:17
  • so can set be implemented using hashtable also? – khemukaanmol Jul 28 '20 at 11:34
  • yes, it can be implemented with a hash table. – n. m. could be an AI Jul 28 '20 at 16:19

1 Answers1

4

If the vector is already sorted, then constructing the set is indeed O(N) (but in general not!).

From cppreference:

template< class InputIt >    
set( InputIt first, InputIt last,
     const Compare& comp = Compare(),
     const Allocator& alloc = Allocator() );

Range constructor. Constructs the container with the contents of the range [first, last). If multiple elements in the range have keys that compare equivalent, it is unspecified which element is inserted

Complexity

N log(N) where N = std::distance(first, last) in general, linear in N if the range is already sorted by value_comp().

The relevant quote from the standard is:

22.4.6.2#4 Complexity: Linear in N if the range [first, last) is already sorted using comp and otherwise NlogN, where N is last - first.

The standard only specifies what containers do, not how they accomplish that. Using a red-black tree is typical but not necessary to implement a std::set.

463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
  • How is it linear? Can someone link me an explanation I didn't find anything related to this. – srt1104 Jul 27 '20 at 19:12
  • 1
    @srt1104 I couldn't justify it with mathematics but essentially the operation is linear with a sorted vector because you can use the position that the previous element was inserted at as a hint to insert the next element more quickly. – john Jul 27 '20 at 19:14
  • Still the tree will have to reorder itself sometime and that's a logN operation. – srt1104 Jul 27 '20 at 19:17
  • @srt1104 Cleverer people than me have said it is O(N) including any rebalancing, in fact the C++ standard guarantees it. – john Jul 27 '20 at 19:18
  • Note it is only O(N) for constructing a new set, inserting into an existing set is still O(N log(N)) – john Jul 27 '20 at 19:20
  • @srt1104 no, you put the values directly where they should end up. Only if you find the input is unordered do you have to reorder what you've already added – Caleth Jul 28 '20 at 10:14
  • @john appending a sorted sequence to an existing set is also O(N) in existing implementations, because they just reuse the same algorithm as for constructing from a sorted sequence (repeatedly calling hinted set::insert with end() as the hint) – Cubbi Jul 30 '20 at 21:11
  • @Cubbi I know that, but I recall reading that isn't O(N). The point of what I read was that it was originally assumed that it was O(N) but then someone with a better grasp of the memthemtics pointed out it wasn't and the standard had to be revised. But my recollection might be faulty, this was a long time ago. If anyone has access to the standard now I'd be interested to know. – john Jul 31 '20 at 06:14
  • @Cubbi So [cppreference](https://en.cppreference.com/w/cpp/container/map/insert) says this `Amortized constant if the insertion happens in the position just before the hint, logarithmic in the size of the container otherwise.` I think the point is that if the map already contains elements then you can't guarantee that insertion happens in the position just before the hint. So a single insertion is O(logN) and presumably N insertions are O(NlogN). – john Jul 31 '20 at 06:19