10

Its known that Qt widgets use implicit sharing. So I am interested if stl containers std::vector, std::string use implicit sharing too.

If no, why? Since it is very useful.

And if the answer is yes, how we can ascertain in it? I need simple C++ stl program which shows that stl containers use implicit sharing. It doesn't do deep copy when is copied.

demonplus
  • 5,613
  • 12
  • 49
  • 68
Ashot
  • 10,807
  • 14
  • 66
  • 117
  • 2
    The more common term is copy-on-write, and no the standard containers are not permitted to do it. I'll let someone else dig up the reasons why. – GManNickG Feb 05 '13 at 20:13
  • 1
    I think refcounting implementations of `string` are forbidden by the C++11 standard, because they do not behave well in multi-threading contexts. Not sure about C++03. – Andy Prowl Feb 05 '13 at 20:13
  • possible duplicate of [GNU STL string: is copy-on-write involved here?](http://stackoverflow.com/questions/4067395/gnu-stl-string-is-copy-on-write-involved-here) – Barmar Feb 05 '13 at 20:17
  • Probably not a duplicate because that question does not cover vector. – Nathan Monteleone Feb 05 '13 at 20:18

4 Answers4

12

No. They cannot. When you try to modify the contents of the container, or even calling a mutable begin() on it, it would imply a potential copy-on-write and thus invalidate all references and iterators to the container. This would be a hard to debug situation, and it is prohibited.

Although std::string is technically not a container, it is still prohibited to do copy-on-write since C++11:

References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of that basic_string object:
...
— Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.

[string.require]

... Since it is very useful.

Heh, what for? Passing by reference almost always solves all 'performance problems'. Atomic ref-counts are inherently non-scalable on multi-processors machines.

Kuba hasn't forgotten Monica
  • 95,931
  • 16
  • 151
  • 313
Yakov Galka
  • 70,775
  • 16
  • 139
  • 220
  • 2
    The *it is very useful* remark needs to put into the content of Qt - whose designers seem to have had an aversion to useful language features such as exceptions and RTTI. – marko Feb 05 '13 at 20:33
  • 2
    @Marko: Yes. Qt is still built around the same narrow minded OO concepts of early nineties, when it was created... But the fact that it is a popular GUI library today doesn't indicate that it is a good library, instead it just says that we do not have a better library yet, or that such libraries did not catch up (unfortunately this is how good stuff often dies). – Yakov Galka Feb 05 '13 at 20:39
  • 2
    +1. Look no further than QtXML for a constructors that don't always fully construct objects and where the prevailing idiom is to check an error flag every time. – marko Feb 05 '13 at 20:46
  • -1 Copy-on-write doesn't necessarily invalidate iterators, so the basic premise of your answer is false. It would do so in a naive implementation that I imagine would belong in "textbooks". It's certainly possible to have a compliant implementation of standard containers implement with COW. Qt's implementation of COW containers certainly doesn't invalidate iterators on COW. Sure, some iterators may need to be a bit larger than STL iterators, but that's a design tradeoff. You are free to choose a container that best suits your purpose. – Kuba hasn't forgotten Monica Aug 01 '14 at 13:59
  • @KubaOber: iterators -- yes, you could do that at a cost of performance (additional dereference). But you cannot do that for references and pointers, they are going to be invalidated anyway, even in Qt. – Yakov Galka Aug 01 '14 at 16:28
  • 2
    @ybungalobill You're correct. I also *ass* umed, stupidly enough, that Qt's implementation didn't suffer from the iterator problem. It does. Oh boy - to work around the problem, one has to explicitly `detach()` the copy if there are active iterators on the source. Sigh. I'll have to audit [the code in all of my answers](https://github.com/KubaO/stackoverflown) to ensure I don't unwittingly expose anyone to the problem. – Kuba hasn't forgotten Monica Aug 01 '14 at 16:37
  • Now of course there are further implementation details that can be pursued to fix it. For example by doing a deep copy on objects with active iterators - this still doesn't fix the reference or pointer problems. The truth is, I'm afraid, that COW implementations want low-overhead iterators and they simply don't handle things properly, and it's too big of a mess to deal with in real code. I wonder if Qt might suffer from this very issue itself, since it uses its own containers exclusively. I guess that's at least one point for using iterators and not pointers/references to contained data. – Kuba hasn't forgotten Monica Aug 01 '14 at 16:40
  • @KubaOber, given http://ideone.com/dk8xKb how would you implement COW so that `p1` and `p2` both remain valid after the `push_back`? The standard guarantees that no reallocation occurs in the `push_back` call and `p1` remains valid, so the only way to do COW is to mark the vector as shared during the first call to `data()` (i.e. to say it has active iterators), so that `v2` actually makes a copy right away, and there is never any sharing. All the added code (and atomic operations) that would be needed for a COW `std::vector` would only help in rare cases where noone accesses the vector. – Jonathan Wakely Aug 01 '14 at 16:54
  • @JonathanWakely Is I've admitted, none of the iterator tricks "fix the reference or pointer problems". COW cannot maintain standard container semantics, period. My comment beginning with "-1" was pretending that pointers/references weren't used to access container's element, an obviously stupid omission. – Kuba hasn't forgotten Monica Aug 01 '14 at 17:04
  • @JonathanWakely I still worry that there may be subtle bugs in Qt's code where pointers or references are taken to contained elements. I'll try and see what sort of static analysis would pick up low hanging fruit there. It's the kind of a bug all too easy to sneak into the codebase, I'm afraid. Now I see that COW coupled with an STL-compatible-in-name-only interface is a disaster waiting to happen. – Kuba hasn't forgotten Monica Aug 01 '14 at 17:06
  • @KubaOber what is this detach() method? I don't see it, at least not in the Qt 5.4 documentation for QList. As far as I understand, in the latest Qt at least, an iterator always points to the copy it was created on, and if the collection is subsequently modified it will point to a new copy but the iterator will still point to the original copy. What's not clear to me in the docs is what happens if you do `*i = 5` on some iterator; does it point to the new copy then? – Andy Jun 16 '15 at 20:44
  • @Andy The iterator points to the shared data section in use when it was created. An `*i = 5` that itself forced a deep copy becomes an invalid iterator. In practice, it might point to the previously shared data, but that data might disappear at any time. And that does in fact suck :( But if some other container instance forced a deep copy, your instance is not affected. That at least limits the scope of necessary code auditing: any modification invalidates CoW iters. The `detach()` forces a copy if the ref count is greater than 1. It's a method available on pretty much all CoW Qt containers. – Kuba hasn't forgotten Monica Jun 16 '15 at 21:15
  • -1: STL not using implicit sharing has nothing to do with invalidating references or with modifying objects. This can all be dealt with. The real reason is performance in multithreaded programs: Implicit sharing requires locking and unlocking the house-keeping data all the time and this is way more expensive than copying strings at least. – Johannes Overmann Jul 04 '18 at 15:17
  • @JohannesOvermann: This is simply not true. It requires only a support for atomic counters, nothing more. And these are extremely cheap on memory coherent architectures, like x86. – Yakov Galka Jul 12 '18 at 01:09
  • @ybungalobill: That is true. But even atomic counters impose some overhead in the memory subsystem. It is actually quite some effort to keep the memory coherent behind the scenes on multiprocessor systems. This involves purging cache lines from other cores L1 caches etc. Keep in mind that 99.999% of all string values are never shared between threads, so almost all the atomic counting is done in vain. Value semantics (deep-copy) is much easier to reason about. – Johannes Overmann Jul 18 '18 at 08:06
5

Aside from the objections raised by others to CoW behaviour in containers, here are a few more. These all fall into the category of behaviour that defies convention, and will therefore cause bizarre bugs from unsuspecting developers.

Exceptions

Allowing CoW would means that innocuous mutation operations on a container can fail with exceptions when they wouldn't otherwise. This would be a particular hazard with operator[] on either a std::vector or std::string

Threading

One might reasonable expect to be able to copy construct a container with the express purpose of handing it off to another thread without worrying about concurrency thereafter. Not so with CoW.

marko
  • 9,029
  • 4
  • 30
  • 46
  • You're right about the exceptions, but wrong about threading. Properly implemented CoW, as in Qt, is perfectly thread-safe, as long as an instance of a container is accessed from a single thread only. The fact that internally the instance might use a shared representation is immaterial. It will be safely copied when needed, preserving the iterators. – Kuba hasn't forgotten Monica Aug 01 '14 at 14:04
3

As it's noticed in similar question:

The C++ standard doesn't prohibit or mandate copy-on-write or any other implementation details for std::string. So long as the semantics and complexity requirements are met an implementation may choose whatever implementation strategy it likes.

I think, same is true for std::vector

Also, you may be interested in this topic: How is std::string implemented

Community
  • 1
  • 1
Artem Sobolev
  • 5,891
  • 1
  • 22
  • 40
  • 4
    I think in C++11 copy-on-write is prohibited. – Pubby Feb 05 '13 at 20:19
  • 2
    @KubaOber, [string.require]/5, `s[0] = 'a'` is not allowed to invalidate pointers, references or iterators into `s`, which means it can't make a copy before returning the reference to the first `char`. Also, compare that paragraph to the C++03 standard, which had different rules mentioning the first call to non-const `operator[]`, non-const `at()` etc. (which was specifically there to allow COW implementations to make a deep copy when a non-const reference "leaks" out) – Jonathan Wakely Aug 01 '14 at 17:05
2

STL containers do not use implicit sharing. They always have plain value semantics.

The reason is runtime performance: In multithreaded programs (potentially but not necessarily running on multicore hosts) the locking overhead of the housekeeping data (e.g. reference counting, locking while copying before writing) by far outweighs the overhead of a plain value copy, which has no special threading implications at all. It is expected that programs which would suffer from copying around huge std::maps implement explicit sharing to avoid the copying.

In fact in the very early days of STL std::string did use implicit sharing. But it was dropped when the first multicore CPUs came up.

Johannes Overmann
  • 4,914
  • 22
  • 38
  • Are you claiming that STL pre-dates multicore processing?? – Toby Speight Jul 04 '18 at 15:35
  • Yes! Well. Somewhat. Blurry. That is. :-) The early STL predates when multicore processing was mainstream, e.g. became cheap and a standard (as it is today). Back then also multithreaded programming was a bit exotic. (It is still rocket science today unfortunately.) – Johannes Overmann Jul 04 '18 at 16:43