21

This is mostly a language lawyer kind of question, I doubt that most implementations would bother, especially since it would probably increase compile time for every user.

That being said: If some implementation of std::set was implemented using bitset for each instance and static array of 256 values that is shared(it is safe since keys are const) would that be legal according to the (if edition matters then assume C++20) standard?

NoSenseEtAl
  • 28,205
  • 28
  • 128
  • 277
  • 1
    *it is safe since keys are const* C++17 kind of breaks that. [`insert`](https://en.cppreference.com/w/cpp/container/set/insert) and [`extract`](https://en.cppreference.com/w/cpp/container/set/extract) allow you to remove, change, and insert a key back into the associative containers. The "specialization" would have to deal with that complexity guaranteed by the standard. – NathanOliver May 04 '19 at 13:12
  • 1
    @NathanOliver I think the `node_type` could just contain a mutable `value_type` member directly. It doesn't ever need to represent a value which is also in some set, other node, etc. `insert(node)` would just need to do the same thing as `insert(node.value())`, which is already O(1). – aschepler May 04 '19 at 13:18
  • Note that you can only do this as the implementation, because otherwise (in theory) the implementation may already have the specialization. – L. F. May 04 '19 at 13:30
  • 2
    @aschepler One can obtain a reference to a stored object, then extract the node, mutate it, insert it back (to the same or different set) , and the reference still must be valid. So no it won't work. – n. m. could be an AI May 17 '19 at 12:00
  • Why would this (were it possible) increase compile time? These specializations would avoid template instantiations, and could even avoid code generation (but probably wouldn’t for inlining reasons). – Davis Herring Feb 23 '21 at 08:10
  • @DavisHerring my dumb guess was that if you have template specialization that it increases the cost of using base template since compiler needs to check what template you want... but what you say also makes sense... – NoSenseEtAl Feb 23 '21 at 16:54

3 Answers3

2

I see no constraint that would forbid you to make a specialized implementation, as long as you respect the standard specifications in section [set].

For set<char> or set<uint8_t> you'd need 32 octets to store the 256 bits representing the potential members, with the advantage of very fast set operations. For set<int> you'd consume too much memory and this would only be justified IMHO if you'd have very populated sets.

This being said, there are some chalenges to overcome:

  • you'd need to organize the array that maps values to bit positions, so that it is consistent with the provided comparator (cost at construction, unless you can share it)
  • you'll have to implement an iterator (but that's not really an issue, since the bitmap and a bit offset can do).
  • since C++17 there is an exposed assumption that the data structure uses nodes since there is an extract() member that is supposed to return a value of the (unspecified) specialized type node_type. Not sure what this requirement implies, but I think it could be solved in a similar manner than the iterator issue above.
  • You'd need to comply with the complexity requirements (see NathanOlivier's comment to your question). The difficulty comes from the ordering of the shared array. However, if you'd use two shared arrays (one to convert value into bit-offset, and one to convert bit-offset into value), or one array of pairs, you could insert anything in O(1).
Christophe
  • 68,716
  • 7
  • 72
  • 138
  • 1
    Why would `set` perform differently than `set` (assuming `CHAR_BIT==8`)? – aschepler May 04 '19 at 13:11
  • I think implementing the iterator requirements of `swap` could also be tricky. – JVApen May 04 '19 at 13:14
  • @JVApen not sure: [`set::swap`](https://en.cppreference.com/w/cpp/container/set/swap) swaps two sets with the same underlying type, so the same specialization. Just exchange the bitmaps and the pointer to the shared array. – Christophe May 04 '19 at 13:23
  • Ah, yes, that's possible. I was thinking it was a regular member. Must have read the question badly. – JVApen May 04 '19 at 13:24
  • 1
    Without checking, I think that the iterator invalidation rules for `set` would make this hard, if not impossible. – Marshall Clow May 04 '19 at 13:25
  • @MarshallClow with the proposed idea in the second bullet, the iterator would stay valid after the swap (since the bit offset would not change). – Christophe May 04 '19 at 13:33
  • 1
    @Christophe I'm more concerned about insertion and deletion. – Marshall Clow May 04 '19 at 14:57
  • @MarshallClow If you could always keep the iterator valid, as it refers to static data, do you than still have problems with invalidation? (Actually validation, as an interator can stay valid after invalidation) – JVApen May 05 '19 at 16:36
  • 1
    consider `std::set s = {0,2,4,6,8}; auto iter = s.find(4); s.insert(3);` What is `*iter` now? – Marshall Clow May 06 '19 at 14:31
  • @MarshallClow according to my second bullet it would still be valid: the raw set data would be in binary 101010101 followed by 247 times 0, and a pointer to a shared table containing the numbers 0 to 255, the number i is or not in the set depending on bit i. The iterator would be represented by something like an pointer to the raw set data and a bit offset. In the case of find(4) this offset would be 4. insert(3) would set a bit (raw data now 1011101010...0). The address to the raw data has not changed and neither the order of the bits: the iterator remains valid as required by c++17 :-) – Christophe May 06 '19 at 17:09
  • @Christophe I think I have [an edge case](https://stackoverflow.com/a/66421177/6394138) that can serve as a constraint for making a specialized implementation – Leon Mar 01 '21 at 11:22
2

EDIT: Made a mistake. Proxy iterators are not allowed in C++17.

I think it is not possible. First: That implementation will hold all complexity guarantees and will for most of them be better than the requirements.

Second: You are not allowed to create proxy-iterator when using standard container since they had to return real references at some points. (The std::bitset mentioned by @Christophe is not a container and has a proxy-reference in its definition. The std::vector < bool > is a famous example for breaking guarantees.). So it is not possible to use that implementation.

Edit: Tanks to @NicolBolas for pointing that proxy iterators are still not allowed.

user6556709
  • 1,272
  • 8
  • 13
  • @NicolBolas Sorry, my fault. That was only proposed for C++17 and drop. – user6556709 May 04 '19 at 13:53
  • As the returned value of std::set::iterator is always const, I think that if one has a constant vector/array with all possible values and return it's value by const-ref, that doesn't use proxy objects – JVApen May 05 '19 at 16:28
  • Does the standard care if you can implement is better than what's indicated as time complexity? You shouldn't do worse. – JVApen May 05 '19 at 16:30
  • @JVApen He wants store an bitmap. Which means 1 bit for each value and not a array/vector with all values. The other thing was bad formulated. It is Ok to be better. I should have clarified it after editing my original answer. – user6556709 May 05 '19 at 17:56
  • @JVApen Now, I understand what you want to do. I'm not sure if your operator++ of the iterator with a complexity of O(n) is legal. – user6556709 May 05 '19 at 18:20
  • begin() should be constant in time, although, possible if you keep extra state. (or if one could at constant time find the first one via some CPU instruction) – JVApen May 05 '19 at 18:40
  • @JVApen I don't think there is a cpu which returns you the next bit in a byte which is set. But you can trick and simply always check against all bits which would have a constand runtime. The other thing is I think you can argue that even a linear operation has a constant complexity cause it only means that you are to be able to find an absolute upper bound for the needed operations. In short: I think you are right. – user6556709 May 05 '19 at 18:51
0

Cristophe's answer contains an important point:

  • you'd need to organize the array that maps values to bit positions, so that it is consistent with the provided comparator (cost at construction, unless you can share it)

This points to a challenging problem of dealing with stateful comparators (when the ordering relation is not a fixed one, but can be controlled from the outside and dynamically changed at run-time). Although frowned upon by the general C++ folks, I couldn't find anything in the standard that prohibits stateful comparators for associative containers (but I didn't look too hard). If you manage the state of the comparator and the content of the set so that the ordering requirement is never violated, then it works in practice (at least, it has worked for me).

So my verdict is that in the most general case of an arbitrary comparator std::set cannot be specialized for small integral types using a bitset and any supporting shared static data.

Leon
  • 31,443
  • 4
  • 72
  • 97