
I can think of three operations in C++ that can be described in some sense as having 'constant' complexity. I've seen some debate(*) over what this means, and it seems to me that we could just say "all these operations are constant, but some are more constant than others" :-)

(Edit 2: If you already think you know the answer, please read some of the debate at this question before rushing in too soon: What data structure, exactly, are deques in C++? Many people, with quite high scores, are arguing over what 'constant' means. My question isn't as obvious as you might think!)

(Edit: I don't need a primer in what 'complexity' means. I want a clear answer, perhaps with quotes from the C++ standard, that tells us exactly what is supposed to be constant. Processor ticks, real-world time, or something else? On other threads, some people have argued that time is totally irrelevant to what is required by the C++ standard.)

Do these complexity guarantees in the standard apply to the runtime of the operation? Or do they simply specify the (maximum) number of copies/moves that might happen to the contained objects? And what exactly does 'amortized' mean?

1. Given a (non-empty) vector<int> v, the following is clearly constant in runtime:

swap(v.front(), v.back());

(although a pedant might point out that it depends on whether the data is in the cache or swapped out or whatever!).

2. Given a list<int> l, doing a push_back is straightforward. Exactly one new node is allocated and a few pointers in the linked list are shuffled. Each push_back involves one allocation, always of the same amount of memory, so this is clearly fairly 'constant'. But, of course, the time to do the allocation could be quite variable: the memory manager might have to take a lot of time to find some suitable free memory.

3. But doing a push_back on a vector<int> is even more unpredictable. Most of the time, it will be very quick, but every now and then it will have to reallocate space for all the data and copy every element to a new location. So it's less predictable in terms of runtime than a single list::push_back, but it's still referred to as constant (amortized). On average, adding a lot of data to a vector takes time proportional to the amount added, i.e. constant time per element, and this is why it's called 'amortized constant' time. (Am I right?)
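
For concreteness, here is a rough sketch of the growth strategy described in 3. It is not how std::vector is required to be implemented; the tiny_vector name and the doubling factor are illustrative assumptions (real implementations choose their own growth factor and use uninitialized storage rather than default-constructed elements):

#include <cstddef>
#include <memory>
#include <utility>

// Sketch of a growable array with a doubling growth policy (an assumption,
// not something the standard mandates). Requires T to be default-constructible.
template <typename T>
struct tiny_vector {
    std::unique_ptr<T[]> data;
    std::size_t size = 0;
    std::size_t capacity = 0;

    void push_back(T value) {
        if (size == capacity) {                      // rare case: out of room
            std::size_t new_cap = capacity ? capacity * 2 : 1;
            std::unique_ptr<T[]> bigger(new T[new_cap]);
            for (std::size_t i = 0; i < size; ++i)   // move every element: O(size)
                bigger[i] = std::move(data[i]);
            data = std::move(bigger);
            capacity = new_cap;
        }
        data[size++] = std::move(value);             // common case: O(1)
    }
};

Most push_back calls hit only the last line; the expensive branch runs only when size has caught up with capacity, which doubles each time, so its relative frequency keeps dropping.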

And finally, I asked about int to avoid the complexities of having another type. For example, a vector< vector<int> > might be a little more complicated to reason about because each element of the vector (of vectors) might have a different size and, for example, swapping two elements is not quite as constant as in case 1. above. But ideally, we could answer for all vector<T>, not just T=int.

(*) For an example of the debates, see the comments to these answers: What data structure, exactly, are deques in C++?

Aaron McDaid
  • Some context for those giving answers: this question was spawned by the topic of whether `deque::push_back` has constant complexity if it's implemented in the usual fashion as an array of arrays. Also, the clause "All of the complexity requirements in this clause are stated solely in terms of the number of operations on the contained objects." appearing in the C++ standard (or, at least, a draft of it). –  Dec 25 '11 at 21:23
  • welcome @Hurkyl , and thanks for helping to clarify this. – Aaron McDaid Dec 25 '11 at 21:33
  • @NicolBolas, with all due respect, I am well aware of those things. Based on the long debates involving many people on the other threads, you could see this isn't just a stupid question. We can't all be idiots. Don't be so patronising. – Aaron McDaid Dec 25 '11 at 23:37
  • @NicolBolas, I'm tempted to write much more, but I don't want this thread to degenerate into the flame wars I've seen elsewhere. So I'll try to leave it at that. – Aaron McDaid Dec 25 '11 at 23:38
  • @AaronMcDaid I wish this question dealt only with what seems to have originated it. That is, why does it seem that the standard, specifically in the case of deque insertions in front and back, measures complexity in the number of T constructors called. This seems to be quite a deviation from how most people would define the complexity, which is in terms of the number of items in the deque. In short the standard says deque has constant time insertion (front/back), which is impossible if you measure it in reference to the number of items in the deque. – dcmm88 Nov 17 '16 at 05:22

6 Answers


Complexity is always stated relative to a specific variable or set of variables. So when the standard talks about constant time insertion, it is talking about constant time relative to the number of items in the list. That is, O(1) insertions means that the number of items currently in the list does not affect the overall complexity of insertions. The list could have 500 or 50000000 items in it, and the complexity of the insertion operation will be the same.

For example, std::list has O(1) insertions and deletions; the complexity of insertions is not affected by the number of elements in the list. However, the memory allocator's complexity may well depend on the number of things that have already been allocated. But since the O(1) is talking about number of items in the list, it doesn't cover that. Nor is it supposed to, because then we would be measuring the complexity of the memory allocator, not the data structure.

In short: that's a different measurement.

"It implies that we are free to implement our algorithm as badly as we like, including one where the time isn't really constant in any pragmatic sense, but where we have respected the number of 'operations' on the contained objects."

Complexity is not specified relative to implementations. It is specified relative to algorithms. It doesn't matter that a context switch could occur, because running time is not what complexity is for.

As above, you could implement std::list with a memory allocator that is O(log(n)) with respect to deletions (where n is the number of allocations). However, the deletion of an element in the list would still be O(1) with respect to the number of items in the list.
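
To make that split visible, here is a bare-bones sketch of a doubly-linked push_back: the list's own work is a fixed number of pointer writes regardless of how many nodes already exist, and the allocation is a separate cost with its own complexity. The DList/Node names are illustrative only, not how std::list is specified to be implemented:

#include <cstddef>

// Minimal doubly-linked list, for illustration only.
template <typename T>
struct DList {
    struct Node { T value; Node* prev; Node* next; };
    Node* head = nullptr;
    Node* tail = nullptr;

    void push_back(const T& v) {
        Node* n = new Node{v, tail, nullptr};  // one allocation: the allocator's cost
        if (tail) tail->next = n;              // a fixed number of pointer writes,
        else      head = n;                    // independent of the list's current size
        tail = n;
    }
};

int main() {
    DList<int> l;
    for (int i = 0; i < 10; ++i)
        l.push_back(i);
    // (Nodes are leaked here for brevity; a real container would free them.)
}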

Do not confuse complexity with overall performance. The purpose of complexity is to have a general metric for algorithms with respect to different variables. The purpose of a programmer, who wants their code to run fast, is to find a reasonable implementation of an algorithm that conforms to the complexity that they need to achieve that performance.

Complexity is a tool for rating the efficiency of an algorithm. Complexity does not mean you get to stop thinking.

And what exactly does 'amortized' mean?

Amortized means this:

If something takes "amortized X time", then as you repeat the operation more and more times on the same data structure, the average cost per operation converges to X.

So, std::vector has "amortized constant time" insertions at the rear. So if we take the object and perform infinitely many insertions on it, the asymptotic limit of complexity will be no different than a "constant time" insertion.

In layman's terms, it means that an individual operation may sometimes be non-constant, but the fraction of operations that are non-constant keeps shrinking. Over a long run of insertions, it works out to constant time per insertion.
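
A small experiment makes that "shrinking fraction" point concrete. This is only a sketch; the exact numbers are implementation-dependent, since the standard does not fix std::vector's growth factor:

#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t reallocations = 0;
    for (int i = 0; i < 1000000; ++i) {
        std::size_t old_cap = v.capacity();
        v.push_back(i);
        if (v.capacity() != old_cap)   // capacity changed, so the elements were moved
            ++reallocations;
    }
    // With geometric growth only O(log N) of the N insertions are expensive,
    // so the average cost per insertion stays bounded by a constant.
    std::printf("%zu push_backs, %zu reallocations\n", v.size(), reallocations);
}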

Nicol Bolas
  • So it's not very clear to just say "X is constant". People should always specify both sides of the complexity relationship. "X is O(1) with respect to Y". Where X is time (or maybe space). And Y is some specific variable such as 'number of items in the container already'. The complexity is permitted to depend on many variables, but it's not allowed to depend on the number of items in the container already? It's not 'constant', but 'constant with respect to one particular variable'. Correct? – Aaron McDaid Dec 25 '11 at 22:36
  • @AaronMcDaid: I'd say that "X is constant" can be clear, within the context it is given. The C++ standard, for example, clearly states the context with respect to its complexity specifications. The complexity can be specified relative to any relevant variable, including the number of items in the container. BTW, X is not time; it is the operation in question. "Insertion is O(1) with respect to the number of items in the container." – Nicol Bolas Dec 25 '11 at 22:41
  • Imagine we're talking about space complexity, and we want to specify that the amount of extra memory required to add one extra item is independent of the number of items already in the container. How would we state that? We couldn't just say "Insertion is O(1) with respect to the number of items in the container." as we've already used those words to specify the other, non-space, complexity. – Aaron McDaid Dec 25 '11 at 22:55
  • @Aaron: If you're talking about space complexity, you'd tend to state that explicitly unless the context is clear. Generally speaking, when people say that something is `O(f)`, for some relevant `f`, they're talking about time. If that's not what they mean, they'll generally say. In this specific instance, you could say something like "The amount of space required to insert a new item into the container is `O(1)` with respect to the number of items already in the container." Or you could just say "Insertion requires `O(1)` space", and assume people can probably guess what you mean. – Stuart Golodetz Dec 25 '11 at 23:11
  • So there are two sides to the complexity relationship? "X is O(...) with respect to Y". But it's usually clear from the context what X and Y are? X typically isn't space, unless that's specified? And Y is usually fairly obvious too, and the C++ standard makes clear what Y is with "All of the complexity requirements in this clause are stated solely *in terms of the number of operations on the contained objects.*" (my emphasis) – Aaron McDaid Dec 25 '11 at 23:33
  • IMO the interpretation of the standard (I presume of the phrase "solely in terms of the number of operations on the contained objects" cited at the end of the answer by Vlad) in the highlighted paragraph of this answer is very artificial. The phrase wants to state that for the purposes of complexity, a single operation on a contained object should count as one unit, though it could require a number of elementary operations depending on the type or value of that contained object. It does not say _other_ operations do not count: if so, one could also give `std::list` constant time random access! – Marc van Leeuwen Oct 15 '17 at 12:17
  • @MarcvanLeeuwen: How can you give `list` O(1) random access and still give it O(1) insertion and removal at any location? – Nicol Bolas Oct 15 '17 at 13:57
  • Yes, if you count complexity "solely in terms of the number of operations on the contained objects". Keep the usual implementation, then define `L[i]` as `*(std::next(L.begin(),i))`. Obviously `std::next` is calling `i` times the operator++, but that is **not an operation on contained objects**, just playing with link fields that are part of the `std::list` implementation. I thought the "implement our algorithm as badly as we like, including one where the time isn't really constant in any pragmatic sense" of this answer was referring to such liberty of interpretation, but I could be wrong. – Marc van Leeuwen Oct 15 '17 at 14:04
  • Also, recall that at the origin of OP's interrogation lies the question whether `std::deque::push_front` can validly (on some occasions) reallocate (and hence copy) a whole array of pointers, whose size is proportional to that of the `deque`, and still satisfy the "always constant time" requirement. Justifying that on the basis of the fact that making this copy is not doing any operation on contained objects is quite similar to claiming that my list indexing is O(1). – Marc van Leeuwen Oct 15 '17 at 14:13

As far as I understand, constant complexity means that the operation is O(1): you can tell in advance how many elementary operations (reads/writes, assembly instructions, whatever) the operation is going to take, and this bound is common to all possible states of the target object. There is a catch here: in a multithreaded environment you cannot predict thread switches, so you can only reason about the elapsed time of an operation on a real-time OS.

Amortized constant complexity is even weaker. Summarizing the answers from here, it's a guarantee that on average your operation is constant. This means that the number of elementary operations for N consecutive operations is O(N), i.e. roughly O(1) per operation, but with some rare jumps allowed. For example, adding an item to the vector's tail is usually constant, but sometimes an additional heavy operation is needed; having amortized constant time here means that the additional operation happens rarely enough, and takes a predictable enough amount of time, that the total time of N operations is still O(N). Of course, the same catch applies here as well.

So, answering to your questions:

  1. The complexity guarantees of the standard really apply only to the number of machine code instructions needed to execute the operation, and don't imply that the runtime is bounded in any way. (Indeed, until recently C++ didn't even have a language-specified notion of a thread, so from the C++ standard's point of view at that time, the program was executed on a dedicated C++ machine.)
  2. Amortized means "bounded by a constant on average", which usually means the operation time is almost always bounded by a constant, with some quite rare deviations.

Edit:
See, for example, section 23.1 of the C++ standard:

All of the complexity requirements in this clause are stated solely in terms of the number of operations on the contained objects.
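
One way to see what that sentence is counting is to instrument the contained type itself. This is just a sketch, assuming C++17; Counted is a hypothetical helper, not anything from the standard:

#include <cstdio>
#include <vector>

// Hypothetical element type that counts operations performed on contained objects.
struct Counted {
    static inline long copies = 0;
    static inline long moves  = 0;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
    Counted& operator=(const Counted&) { ++copies; return *this; }
    Counted& operator=(Counted&&) noexcept { ++moves; return *this; }
};

int main() {
    const long N = 100000;
    std::vector<Counted> v;
    for (long i = 0; i < N; ++i)
        v.push_back(Counted{});    // each call move-constructs one element,
                                   // plus occasional moves during reallocation
    std::printf("N = %ld, copies = %ld, moves = %ld\n", N, Counted::copies, Counted::moves);
}

On typical implementations the totals grow linearly in N, which is the sense in which each push_back performs an amortized constant number of operations on the contained objects; how long the allocator itself takes is a separate question.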

Vlad
  • I've already covered much of that in the question. I'm curious what *exactly* should be constant. Some say it's running time or ticks. But others have said elsewhere that that is irrelevant and that it simply means that the number of copy/move operations is a linear function of the number of items added (or something like that which is totally independent of time). – Aaron McDaid Dec 25 '11 at 21:16
  • @Aaron: Yes, that is what I am saying as well. You can think of the bounded thing as the number of the target machine's instructions executing your algorithm (not the ones in other threads!). – Vlad Dec 25 '11 at 21:18
  • @Aaron: just added the citation from the standard. – Vlad Dec 25 '11 at 21:31
  • Does that mean that the run time shall be a linear function of the number of operations? Or does it mean that the number of operations shall be a linear function of the number of items added or push_backed? – Aaron McDaid Dec 25 '11 at 21:37
  • @Aaron: No, of course not. There is _absolutely no_ guarantee on runtime! The OS can switch to some other thread for quite a long amount of time. – Vlad Dec 25 '11 at 21:42
  • That citation is interesting. It implies that we are free to implement our algorithm as badly as we like, including one where the time isn't *really* constant in any pragmatic sense, but where we have respected the number of 'operations' on the contained objects. – Aaron McDaid Dec 25 '11 at 21:52
  • @Aaron: my guess is that C++ is implemented on such a number of platforms and hardware, that any stricter requirements would make at least some implementations complicated or even impossible. – Vlad Dec 25 '11 at 21:59
  • Yes, you are free to do stupid things like putting `sleep(rand())` inside `vector::insert`, or making a vector consume 5000 times as much memory as a same-sized array, and the marketplace is free to declare that your implementation sucks and should be avoided. The formal term used in the standard for this is *quality of implementation*. The language committee decided that their time would be better spent on improving other parts of the standard than trying to prevent people from creating pathologically bad implementations that nobody would want to use. – Raymond Chen Dec 26 '11 at 14:18

If an operation is said to have "constant complexity", that usually refers to time complexity. It could refer to space complexity, but if so, that will normally be specified explicitly.

Now, complexity of an operation refers to how much the time to perform the operation will increase when the number of items being dealt with in the operation increases. For a constant complexity operation, the function will take the same amount of time whether there are zero items being dealt with or ten million items.

So:

  1. swap() is constant complexity because no matter how many items are in the vector, the operation will take the same amount of time.

  2. a push_back on the list is constant complexity because even though it may take some variable amount of time to allocate the new item, that allocation time doesn't increase just because the list has 10 million elements (at least not in an algorithmic sense - of course if free memory becomes more and more depleted, allocation might take more time, but from an algorithmic point of view there's an infinite amount of memory).

  3. push_back() on a vector is said to be 'amortized' constant because in the normal case, where a reallocation doesn't have to occur, the amount of time the operation will take is unrelated to how many elements are in the vector already - the same amount of time is taken to add a new element to a zero-length vector as to a 10-million-length vector. However, if the vector does need to be reallocated, a copy of the existing elements needs to occur, and that is not a constant operation - it's a linear operation. But the vector is supposed to be designed so that those reallocations happen infrequently enough that they can be amortized over the course of many push_back() operations.
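
As a practical footnote to point 3: if the final size is known in advance, reserve() pays the allocation cost once up front, and then none of the subsequent push_back() calls can hit the linear reallocation step. A minimal sketch (how much extra capacity reserve actually grants is implementation-defined, but it is at least what you ask for):

#include <cassert>
#include <cstddef>
#include <vector>

int main() {
    const std::size_t n = 1000000;
    std::vector<int> v;
    v.reserve(n);                          // one allocation up front
    const std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(static_cast<int>(i));  // cannot reallocate while size <= reserved capacity
    assert(v.capacity() == cap);           // capacity never changed, so no element was copied
}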


Michael Burr
  • This seems to be the most important sentence in your answer *"but from an algorithmic point of view there`s an infinite amount of memory)."* . Are you saying that allocations, such as `malloc`, are O(1), or maybe O(s) where s is the size of the memory to allocate? – Aaron McDaid Dec 25 '11 at 21:28
  • ... you're probably not saying that `malloc` is *really* O(1) or O(s), but simply that the C++ standard allows us to pretend that it is. Correct? – Aaron McDaid Dec 25 '11 at 21:39
  • @AaronMcDaid: Complexity, O-notation, is relative to specific variables. If you're measuring an algorithm relative to the number of elements in a list, you're *not* measuring it relative to the complexity of any memory allocations it might make. – Nicol Bolas Dec 25 '11 at 22:08
  • @Aaron: to put it strictly, according to the standard the complexity requirements on containers are "stated solely in terms of the number of operations on the contained objects". I think that whether you consider an allocation to be an operation on a contained object probably doesn't matter because whatever copy (or whatever) is done to the allocated memory will be accounted for in the complexity. – Michael Burr Dec 25 '11 at 22:11

A complexity of O(1), i.e. constant (time complexity, let's say), implies that the time to complete the algorithm is unrelated to the problem size.

So, looking a value up in a hashed structure is O(1) because the time required to do it is independent of how many values it contains. However, the same is not true of a linked list because we must scan through values (the number of which changes as we increase the number of elements) to find our value.

In case 3, when it copies every element it is not an O(1) operation but an O(N) one (though most of the time it is O(1), so it is usually constant). Amortization takes this into account, noting that the algorithm usually completes in O(1) time and only rarely hits this O(N) case.
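
For illustration, the same membership test against both kinds of structure. This is only a sketch, and the hash lookup is average-case O(1); a pathological hash could still degrade it to linear:

#include <algorithm>
#include <list>
#include <unordered_set>

int main() {
    std::unordered_set<int> hashed = {1, 2, 3, 4, 5};
    std::list<int>          linked = {1, 2, 3, 4, 5};

    // Hash lookup: average O(1) with respect to the number of stored values.
    bool in_hash = hashed.count(4) > 0;

    // Linked-list lookup: walks the nodes, O(N) in the number of stored values.
    bool in_list = std::find(linked.begin(), linked.end(), 4) != linked.end();

    return (in_hash && in_list) ? 0 : 1;
}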

Cameron S

Constant complexity just means that the time an operation takes to run is independent of the size of the input. It tells you nothing about runtime cost. Amortised constant complexity means that if you perform a sequence of operations, the average time for each individual operation will be independent of the size of the problem instance.

Stuart Golodetz
  • I already covered those points in my question! I'm curious about what you mean by "It tells you nothing about runtime cost", when you previously said "the time an operation takes to run is independent of the size of the input." Does 'constant' in the C++ standard refer to time or not? – Aaron McDaid Dec 25 '11 at 21:13
  • What I mean is that complexity doesn't tell you about actual running times, it tells you about order of growth. – Stuart Golodetz Dec 25 '11 at 21:16
  • Something that takes a year regardless of the size of the input would still be constant time. You can't distinguish between different types of constant time, the time taken either depends on the input or it doesn't... – Stuart Golodetz Dec 25 '11 at 21:19
  • But you can distinguish, for example, between constant *time* and constant *space* and constant *numbers-of-contained-type-moves-and-copies*. If you see the argument on the other thread, you'll see that I'm not asking an obvious question! http://stackoverflow.com/questions/8627373/what-data-structure-exactly-are-deques-in-c/8628035#8628035 – Aaron McDaid Dec 25 '11 at 21:23
  • You can certainly distinguish between constant time and constant space. Generally speaking, though, constant time doesn't necessarily mean that every operation will take the same time - there are often outside influences such as how fragmented your memory is etc. What it means is precisely that the actual running time (whatsoever it may be) is unaffected by the input size. No more, no less. – Stuart Golodetz Dec 25 '11 at 21:29

One point not yet mentioned with regard to problem complexity is that there's generally an implicit assumption that the machine's word size will be large enough to make a solution possible, but no larger. For example, if one is given an array of N numbers in the range 1..N, the machine's word size is assumed to be large enough to handle values in the range 1..N, but there is no guarantee that there will be any "spare" bits available. This is significant, because while the number of bits required to hold the array will of course be O(N lg N), there is a fundamental difference between algorithms which require O(N lg N) total space and those which require precisely O(1) + N*lg(N) [rounded up] space.

supercat
  • You're talking more about space complexity, which isn't directly relevant to the question asked. – Aaron McDaid Dec 25 '11 at 21:26
  • @Aaron: The same general principle applies to both space and time complexity, since if one were allowed words of arbitrary size, one could perform an arbitrary amount of work with them in a "constant" time. – supercat Dec 26 '11 at 16:24