68

Recently, I noticed some people mentioning that std::list::size() has a linear complexity.
According to some sources, this is in fact implementation dependent as the standard doesn't say what the complexity has to be.
The comment in this blog entry says:

Actually, it depends on which STL you are using. Microsoft Visual Studio V6 implements size() as {return (_Size); } whereas gcc (at least in versions 3.3.2 and 4.1.0) do it as { return std::distance(begin(), end()); } The first has constant speed, the second has o(N) speed

  1. So my guess is that for the VC++ crowd size() has constant complexity as Dinkumware probably won't have changed that fact since VC6. Am I right there?
  2. What does it look like currently in gcc? If it is really O(n), why did the developers choose to do so?
Deduplicator
  • 44,692
  • 7
  • 66
  • 118
foraidt
  • 5,519
  • 5
  • 52
  • 80

8 Answers8

85

In C++11 it is required that for any standard container the .size() operation must be complete in "constant" complexity (O(1)). (Table 96 — Container requirements). Previously in C++03 .size() should have constant complexity, but is not required (see Is std::string size() a O(1) operation?).

The change in standard is introduced by n2923: Specifying the complexity of size() (Revision 1).

However, the implementation of .size() in libstdc++ still uses an O(N) algorithm in gcc up to 4.8:

  /**  Returns the number of elements in the %list.  */
  size_type
  size() const _GLIBCXX_NOEXCEPT
  { return std::distance(begin(), end()); }

See also Why is std::list bigger on c++11? for detail why it is kept this way.

Update: std::list::size() is properly O(1) when using gcc 5.0 in C++11 mode (or above).


By the way, the .size() in libc++ is correctly O(1):

_LIBCPP_INLINE_VISIBILITY
size_type size() const _NOEXCEPT     {return base::__sz();}

...

__compressed_pair<size_type, __node_allocator> __size_alloc_;

_LIBCPP_INLINE_VISIBILITY
const size_type& __sz() const _NOEXCEPT
    {return __size_alloc_.first();}
Community
  • 1
  • 1
kennytm
  • 510,854
  • 105
  • 1,084
  • 1,005
53

Pre-C++11 answer

You are correct that the standard does not state what the complexity of list::size() must be - however, it does recommend that it "should have constant complexity" (Note A in Table 65).

Here's an interesting article by Howard Hinnant that explains why some people think list::size() should have O(N) complexity (basically because they believe that O(1) list::size() makes list::splice() have O(N) complexity) and why an O(1) list::size() is be a good idea (in the author's opinion):

I think the main points in the paper are:

  • there are few situations where maintaining an internal count so list::size() can be O(1) causes the splice operation to become linear
  • there are probably many more situations where someone might be unaware of the negative effects that might happen because they call an O(N) size() (such as his one example where list::size() is called while holding a lock).
  • that instead of permitting size() be O(N), in the interest of 'least surprise', the standard should require any container that implements size() to implement it in an O(1) fashion. If a container cannot do this, it should not implement size() at all. In this case, the user of the container will be made aware that size() is unavailable, and if they still want or need to get the number of elements in the container they can still use container::distance( begin(), end()) to get that value - but they will be completely aware that it's an O(N) operation.

I think I tend to agree with most of his reasoning. However, I do not like his proposed addition to the splice() overloads. Having to pass in an n that must be equal to distance( first, last) to get correct behavior seems like a recipe for hard to diagnose bugs.

I'm not sure what should or could be done moving forward, as any change would have a significant impact on existing code. But as it stands, I think that existing code is already impacted - behavior might be rather significantly different from one implementation to another for something that should have been well-defined. Maybe onebyone's comment about having the size 'cached' and marked known/unknown might work well - you get amortized O(1) behavior - the only time you get O(N) behavior is when the list is modified by some splice() operations. The nice thing about this is that it can be done by implementors today without a change to the standard (unless I'm missing something).

As far as I know, C++0x is not changing anything in this area.

Evg
  • 25,259
  • 5
  • 41
  • 83
Michael Burr
  • 333,147
  • 50
  • 533
  • 760
  • The answer is correct but the reasoning on the size of list is flowed. Your proposal is prone to inconsistent parameters and violate the principle of having the user to give every information only once. – PierreBdR Oct 23 '08 at 09:31
  • 6
    Should also be possible to keep splice O(1), but mark the size as "unknown". Then size() is still O(N) worst-case, but the worst case occurs at most once per 'unfriendly' splice. So performance of all operations is strictly superior to an always-O(N) size(). Warning: I have not thought this through. – Steve Jessop Oct 23 '08 at 12:09
  • "strictly superior" - actually that's a lie, since there are some extra checks in splice to figure out what case you're in, and arithmetic with sizes in all mutators. Told you I hadn't thought it through. But complexity is never worse, and sometimes better. – Steve Jessop Oct 23 '08 at 12:12
  • @PierreBdR - In case it's not clear, I'm not the author of the paper, I pointed to it because I thought it had some interesting points. I have edited the answer to make that more clear (as well as adding some more of my own thoughts and incorporating ideas from these comments). – Michael Burr Oct 23 '08 at 19:39
  • 15
    The latest C++0x draft requires that `size()` have constant time complexity (that change to the container requirements was made in N3000). – James McNellis Mar 16 '10 at 19:18
15

I've had to look into gcc 3.4's list::size before, so I can say this:

  1. It uses std::distance(head, tail).
  2. std::distance has two implementations: for types that satisfy RandomAccessIterator, it uses "tail-head", and for types that merely satisfy InputIterator, it uses an O(n) algorithm relying on "iterator++", counting until it hits the given tail.
  3. std::list does not satisfy RandomAccessIterator, so size is O(n).

As to the "why", I can only say that std::list is appropriate for problems that require sequential access. Storing the size as a class variable would introduce overhead on every insert, delete, etc., and that waste is a big no-no per the intent of the STL. If you really need a constant-time size(), use std::deque.

Evg
  • 25,259
  • 5
  • 41
  • 83
introp
  • 263
  • 1
  • 3
13

I personally don't see the issue with splice being O(N) as the only reason why size is permitted to be O(N). You don't pay for what you don't use is an important C++ motto. In this case, maintaining the list size requires an extra increment/decrement on every insert/erase whether you check the list's size or not. This is a small fixed overhead, but its still important to consider.

Checking the size of a list is rarely needed. Iterating from begin to end without caring the total size is infinitely more common.

Greg Rogers
  • 35,641
  • 17
  • 67
  • 94
5

I would go to the source (archive). SGI's STL page says that it is permitted to have a linear complexity. I believe that the design guideline they followed was to allow the list implementation to be as general as possible, and thus to allow more flexibility in using lists.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
Yuval F
  • 20,565
  • 5
  • 44
  • 69
1

This bug report: [C++0x] std::list::size complexity, captures in excruciating detail the fact that the implementation in GCC 4.x is linear time and how the transition to constant time for C++11 was slow in coming (available in 5.0) due to ABI compatibility concerns.

The manpage for the GCC 4.9 series still includes the following disclaimer:

Support for C++11 is still experimental, and may change in incompatible ways in future releases.


The same bug report is referenced here: Should std::list::size have constant complexity in C++11?

Community
  • 1
  • 1
Brent Bradburn
  • 51,587
  • 17
  • 154
  • 173
0

If you are correctly using lists you aren't probably noticing any difference.

Lists are good with big data structures that you want to rearrange without copying, of for data you want to keep valid pointers after insertion.

In the first case it makes no difference, in the second i would prefer the old (smaller) size() implementation.

Anyway std is more about correctness and standard behavious and 'user friendlyness' than raw speed.

Luke Givens
  • 829
  • 1
  • 11
  • 15
0

All the above answers mentioned C++11 and GCC, but not mention _GLIBCXX_USE_CXX11_ABI compile definition in GCC, this is not enough

  1. Compiled with -std=c++11 do not mean std::list::size() is O(1) in GCC, it's default O(1), but when compiled with _GLIBCXX_USE_CXX11_ABI=0, it's O(N)
  2. GCC version < 5.1, like GCC4.8 support C++11, but std::list::size() is O(N) no matter you use what compile flags. .

This can be clearly be shown in current GCC source code: size() is implement as below

  _GLIBCXX_NODISCARD
  size_type
  size() const _GLIBCXX_NOEXCEPT
  { return _M_node_count(); }

It called _M_node_count(), and _M_node_count is implemented as below:

    #if _GLIBCXX_USE_CXX11_ABI
      static size_t
      _S_distance(const_iterator __first, const_iterator __last)
      { return std::distance(__first, __last); }

      // return the stored size
      size_t
      _M_node_count() const
      { return this->_M_get_size(); }
#else
      // dummy implementations used when the size is not stored
      static size_t
      _S_distance(const_iterator, const_iterator)
      { return 0; }

      // count the number of nodes
      size_t
      _M_node_count() const
      { return std::distance(begin(), end()); }
#endif

if _GLIBCXX_USE_CXX11_ABI is set to 0, the size() is O(N), in other case is O(1). _GLIBCXX_USE_CXX11_ABI set to 0 will happen when you use a high version GCC compiled libraries and need link to a low version GCC compiled libraries, for example, your system have installed a GCC 4.8 compiled GTEST libraries, but your code now use GCC 7.3 and use C++11, and you need to link to that libraries, in this case you need to add following to your CMakeLists.txt, otherwise it will have link issue.

add_compile_definitions(_GLIBCXX_USE_CXX11_ABI=0)

In a word, in GCC,

  • When using GCC version < 5.1, std::list::size() is O(N)
  • When using GCC version >= 5.1:
    • when compiled with _GLIBCXX_USE_CXX11_ABI=0, std::list::size() is O(N)
    • when compiled without _GLIBCXX_USE_CXX11_ABI set(default is 1) or with _GLIBCXX_USE_CXX11_ABI=1,
      std::list::size() is O(1)
BruceSun
  • 144
  • 9