120

C++20 introduced the std::ssize() free function, declared as follows:

template <class C>
    constexpr auto ssize(const C& c)
        -> std::common_type_t<std::ptrdiff_t,
                              std::make_signed_t<decltype(c.size())>>;

A possible implementation seems to be to use static_cast to convert the return value of the size() member function of class C into its signed counterpart.
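
A minimal sketch of one such possible implementation (not the standard's wording, just one conceivable conforming version; the name my_ssize is made up here to avoid confusion with the real std::ssize) could look like this:

    #include <cstddef>     // std::ptrdiff_t
    #include <type_traits> // std::common_type_t, std::make_signed_t

    // One conceivable implementation: take whatever c.size() returns and
    // static_cast it to a signed type at least as wide as std::ptrdiff_t.
    template <class C>
    constexpr auto my_ssize(const C& c)
        -> std::common_type_t<std::ptrdiff_t,
                              std::make_signed_t<decltype(c.size())>>
    {
        using R = std::common_type_t<std::ptrdiff_t,
                                     std::make_signed_t<decltype(c.size())>>;
        return static_cast<R>(c.size());
    }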

Since the size() member function of C always returns non-negative values, why would anyone want to store them in signed variables? In case one really wants to, it is a simple matter of a static_cast.

Why is std::ssize() introduced in C++20?

curiousguy
John Z. Li
  • Notice that the `static_cast` might be UB before C++20 (with "overflow"). – Jarod42 May 20 '19 at 08:49
  • In case you use the result to do some pointer arithmetic. – mcabreb May 20 '19 at 08:54
  • 4
    @Jarod42 Isn't it implementation defined instead of undefined? (signed overflow is undefined. but signed conversion is implementation defined) – phön May 20 '19 at 09:37
  • 1
    @phön: Indeed (but I mostly treat both of them as wrong; we cannot rely on a non-portable value). – Jarod42 May 20 '19 at 10:09
  • 13
    If only they added an `ssizeof` operator as well. – geza May 20 '19 at 10:19
  • 3
    This might be somewhat related: https://stackoverflow.com/questions/30395205/why-are-unsigned-integers-error-prone – Marco13 May 20 '19 at 13:37
  • 3
    @Marco13 The implicit conversion between signed and unsigned numbers is broken. – John Z. Li May 20 '19 at 14:03
  • 13
    @JohnZ.Li At the risk of sounding too unconstructive: I think that *the whole type system of C++ regarding the integer types* is broken. Sure, one can argue that some quirks (like not knowing how many bits a `char` has) are inherited from C and at least somewhat alleviated by `(u)intX_t`, but it's still an endless source of equally subtle *and* critical bugs. Things like `ssize` are only patches, and it will take a while (maybe "forever") until this sinks into the common "best practices guides" that people (can) follow rigorously. – Marco13 May 20 '19 at 14:47
  • 8
    @Marco13: On the other hand, the C/C++ type system (as opposed to e.g. Java's fixed types system), aside from allowing C/C++ code to work on architectures where most other languages croak, *does* allow *competent* instructors to get some important lessons into a student's head. Like, not all the world is 64bit. And no, not all the world uses 8-bit chars. It is *dead easy* to cope with these things, *and* it makes you a better developer, if only instructors would teach this *from the beginning*. (And, just to make sure, you *do* know that the `(u)intX_t` types are *optional*, do you?) – DevSolar May 21 '19 at 11:13
  • 7
    @DevSolar I hesitate to walk too far along the "language bashing" road here, but frankly: Having roughly 70 different integer types, some of them optional, and not knowing the *size* of most of them cannot be justified with "teaching students a lesson". It **is** an excess on the level of the *definition and specification of the language* (and I wouldn't take too much pride in knowing all the quirks here). Again, I'm aware of some of the historical reasons, but there's no point in sugarcoating that: It's legacy, makes the life of application developers difficult, and causes bugs. – Marco13 May 21 '19 at 12:11
  • 1
    @Marco13: But that's *exactly* the kind of language bashing that doesn't see the difference between languages that enjoy being specified in terms of a virtual machine and those specified to allow code that's both natively-optimized *and* portable. There are plenty of situations where you don't *care* about the exact width and are willing to leave that to the machine. There are situations where you need an *exact* width. That there *are* languages handwaving that away makes neither language "bad" or "legacy", merely aimed at different things. C++ isn't RAppD, but it's many things others aren't. – DevSolar May 21 '19 at 12:17
  • 1
    @DevSolar We can talk a bit further, but if so, should do this in chat. Until then, I think I see your point, but disagree with the priorities, or rather "goals of a programming language": The fact that you can use C/C++ for programming a 13-bit-microcontroller, a distributed server application or a desktop application may be considered as a *reason*, but not as a *justification* for some quirks: You simply do not write the same code in these cases - even though the code is based on the same (1300 page) spec and passed through the same compiler. – Marco13 May 21 '19 at 12:28
  • 2
    @Marco13: But you come here to SO to ask about any problems you might have, and any experienced C++ coder can help you with the code even if he/she hasn't even heard of the platform. ;-) I see your point as well, and even I refer to C++ as "the beast" when I teach it. I'd just prefer if we could refrain from "comparing" languages this way altogether. Java is good for what it does, as is C++. They just aren't for the same thing. ;-) – DevSolar May 21 '19 at 12:36
  • 2
  • At a very deep level, the system of integral types in C/C++ is broken because it somehow considers unsigned types as limited range integers and then assigns them modulo behavior. There is zero common purpose between these two concepts: a limited range (positive) integer would see benefits from UB of overflow (underflow) allowing runtime checking. A modulo type is rarely useful, but it's extremely useful to have one when you need it. A size is a positive quantity, not a modulo quantity. – curiousguy Dec 20 '19 at 04:48

2 Answers

88

The rationale is described in this paper. A quote:

When span was adopted into C++17, it used a signed integer both as an index and a size. Partly this was to allow for the use of "-1" as a sentinel value to indicate a type whose size was not known at compile time. But having an STL container whose size() function returned a signed value was problematic, so P1089 was introduced to "fix" the problem. It received majority support, but not the 2-to-1 margin needed for consensus.

This paper, P1227, was a proposal to add non-member std::ssize and member ssize() functions. The inclusion of these would make certain code much more straightforward and allow for the avoidance of unwanted unsigned-ness in size computations. The idea was that the resistance to P1089 would decrease if ssize() were made available for all containers, both through std::ssize() and as member functions.
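
To make concrete the kind of code the paper has in mind, here is a hedged sketch (the function and variable names are invented for illustration, not taken from the paper): with an unsigned size(), a bound like v.size() - 1 wraps around to a huge value when the container is empty, whereas std::ssize() keeps the arithmetic signed:

    #include <cstddef>  // std::ptrdiff_t
    #include <iterator> // std::ssize (C++20)
    #include <vector>

    void process_adjacent_pairs(const std::vector<int>& v)
    {
        // Buggy variant: if v is empty, v.size() - 1 wraps around to a huge
        // unsigned value, and the loop runs with wildly out-of-range indices.
        // for (std::size_t i = 0; i < v.size() - 1; ++i) { /* ... */ }

        // With std::ssize() the subtraction happens in a signed type, so an
        // empty vector yields -1 and the loop body simply never executes.
        for (std::ptrdiff_t i = 0; i < std::ssize(v) - 1; ++i) {
            // work with v[i] and v[i + 1] ...
        }
    }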

Nadav Har'El
  • 36
    The `for(int i = 0; i < container.ssize() - 1; ++i)` example is also fairly compelling – Caleth May 20 '19 at 08:53
  • Why does span use a signed integer as an index and a size? Is there a reason that signed values must be used? – John Z. Li May 20 '19 at 09:00
  • 9
    @John it seems to me indeed that they could do the same thing as string::npos and just use size_t(-1) as a special value. – rubenvb May 20 '19 at 09:02
  • 17
    @JohnZ.Li It has long been considered a mistake that STL size types are unsigned. Now, unfortunately, it's too late to reform it. Providing a free function is the best we can do as of now. – L. F. May 20 '19 at 09:53
  • 2
    @L.F. I don't follow. Why is it considered a mistake that STL use nonnegative size types? – John Z. Li May 20 '19 at 10:03
  • 1
    @JohnZ.Li I remember Bjarne Stroustrup said that at some point. I can't recall by now, though ... – L. F. May 20 '19 at 10:08
  • 17
    @L.F.: It was Herb Sutter in a conference (maybe Bjarne said this as well). But, he is a little bit wrong. Now, with 32-bit/64-bit computers, signed size would be better (So he's right). But in the old days (16-bit sizes), signed size would have been bad (for example, we could have allocated 32k byte arrays only). – geza May 20 '19 at 10:24
  • 13
    @L.F.: I've found Herb's mentioning this: https://www.youtube.com/watch?v=Puio5dly9N8&t=2667. When he says that "does not come up in practice very much", it is true nowadays. But it wasn't true >20 years ago (16-bit systems) at all. So, it was not that much of a mistake to use unsigned, when the STL was designed. – geza May 20 '19 at 10:47
  • 3
    @Caleth Also `for (auto i = container.ssize() - 1; i >= 0; --i)`... also known as everyone's first introduction to unsigned wraparound bugs. – Barry May 20 '19 at 13:06
  • @geza Today for desktop systems, true. But embedded systems are still the overwhelming majority. – Deduplicator May 20 '19 at 15:07
  • @Deduplicator: I'm not too familiar with embedded systems. If they are 16-bit, and C++ used for development, then yes, 16-bit unsigned `size_t` is still relevant today, and unsignedness shouldn't be considered as a mistake. – geza May 20 '19 at 15:20
  • @L.F. It's wrong to use an unsigned type for `size_t` and for anything that's related to a quantity which by definition can't be negative. **Unsigned arithmetic makes sense for modulo, period.** – curiousguy Dec 20 '19 at 04:50
  • This is wrong. So all containers should have a signed size because you never know when I want to add a negative number to the container size! Just wrong... – Toughy May 10 '23 at 22:17
60

Gratuitously stolen from Eric Niebler:

'Unsigned types signal that a negative index/size is not sane' was the prevailing wisdom when the STL was first designed. But logically, a count of things need not be positive. I may want to keep a count in a signed integer to denote the number of elements either added to or removed from a collection. Then I would want to combine that with the size of the collection. If the size of the collection is unsigned, now I'm forced to mix signed and unsigned arithmetic, which is a bug farm. Compilers warn about this, but because the design of the STL pretty much forces programmers into this situation, the warning is so common that most people turn it off. That's a shame because this hides real bugs.

Use of unsigned ints in interfaces isn't the boon many people think it is. If by accident a user passes a slightly negative number to the API, it suddenly becomes a huge positive number. Had the API taken the number as signed, then it can detect the situation by asserting the number is greater than or equal to zero.

If we restrict our use of unsigned ints to bit twiddling (e.g., masks) and use signed ints everywhere else, bugs are less likely to occur, and easier to detect when they do occur.
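
As a small hedged illustration of that "bug farm" (the names fits_after_change, delta and limit are invented here, not Eric Niebler's), mixing a signed delta with an unsigned size() silently converts the delta to an unsigned type, while std::ssize() keeps the whole computation signed and checkable:

    #include <cstddef>  // std::ptrdiff_t
    #include <iterator> // std::ssize (C++20)
    #include <vector>

    // delta is a signed net change: elements added (positive) or removed (negative).
    bool fits_after_change(const std::vector<int>& v, std::ptrdiff_t delta,
                           std::ptrdiff_t limit)
    {
        // Mixed arithmetic such as v.size() + delta would convert delta to
        // std::size_t, so delta == -1 becomes a huge positive number and the
        // comparison misfires.
        //
        // With std::ssize() everything stays signed: a negative intermediate
        // result is meaningful and can even be asserted on.
        const std::ptrdiff_t projected = std::ssize(v) + delta;
        return projected >= 0 && projected <= limit;
    }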

sp2danny
  • 7
    Swift takes this approach, even though it doesn't have the concern about negative signed numbers being reinterpreted as massive unsigned numbers (since there are no implicit casts, which are what really get you into this crazy fun house to begin with). They just take the approach that the (machine word sized) `Int` should be the common currency type of whole numbers, even where only positive numbers make sense (such as indexing an array). Any deviation from it should be well founded. It's nice to not need to worry about casts everywhere. – Alexander May 20 '19 at 14:38
  • 4
    @JohnZ.Li Indeed, ["unsigned int considered harmful for Java"](https://www.nayuki.io/page/unsigned-int-considered-harmful-for-java) – Nayuki May 21 '19 at 13:54
  • 3
    Rust (the most modern systems programming language) also uses unsigned types for indexes (because it makes sense). C++ is simply unrepairable. It is old technology, poorly designed, with almost all default behaviour incorrect/unintuitive. All that has happened to C++ in the last couple of releases is poor patching of poorly designed features. – There is nothing we can do Sep 22 '21 at 14:24
  • "If we restrict our use of unsigned ints to bit twiddling (e.g., masks) and use signed ints everywhere else, bugs are less likely to occur, and easier to detect when they do occur." This is the WAY :-) – jose.angel.jimenez Nov 13 '22 at 12:14