29

For representing a length or count variable, is it better to use signed or unsigned integers?

It seems to me that the C++ STL tends to prefer unsigned (std::size_t, as in std::vector::size()), whereas the C# BCL tends to prefer signed integers (as in ICollection.Count).

Considering that a length or a count is a non-negative integer, my intuition would choose unsigned; but I fail to understand why the .NET designers chose signed integers.

What is the best approach? What are the pros and cons of each one?

palik
  • 2,425
  • 23
  • 31
  • 1
    you have to check this out: http://stackoverflow.com/questions/3935165/why-does-net-framework-not-use-unsigned-data-types – IndieProgrammer Apr 06 '12 at 08:18
  • 10
    I suspect C# uses signed integers because [unsigned integers are not CLS-compliant](http://stackoverflow.com/questions/6325/why-are-unsigned-ints-not-cls-compliant). – Joe Apr 06 '12 at 08:24
  • 3
    @Joe You suspect correctly. It's mostly because Microsoft wanted to propagate *"inter-language cooperation"*, the key architecture point of their .NET initiative, the common language runtime. Introducing multiple flavors of integers simply wasn't acceptable because of compatibility issues between languages, so it was declared as not CLS-compliant. It's there, you get pretty little warnings if you decide to use them and want to be CLS-compliant, a *valued member of society*, so to speak. :Đ –  Apr 06 '12 at 09:07

4 Answers

29

C++ uses unsigned values because they need the full range. On a 32-bit system, the language should make it possible to have a 4 GB vector, not just a 2 GB one. (the OS might not allow you to use all 4 GB, but the language itself doesn't want to get in your way)
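
A quick way to see the difference in range on a given platform is to print the limits; this is just a small sketch, and the exact figures depend on your system's type widths:

#include <cstddef>
#include <iostream>
#include <limits>

int main() {
    // On a typical 32-bit system, both int and std::size_t are 32 bits wide,
    // so the unsigned type can count roughly twice as many elements.
    std::cout << "max int   : " << std::numeric_limits<int>::max() << '\n';
    std::cout << "max size_t: " << std::numeric_limits<std::size_t>::max() << '\n';
}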

In .NET, unsigned integers aren't CLS-compliant. You can use them (in some .NET languages), but it limits portability and compatibility. So for the base class library, they only use signed integers.

However, these are both edge cases. For most purposes, a signed int is big enough. So as long as both offer the range you need, you can use either.

One advantage that signed integers sometimes have is that they make it easier to detect underflow. Suppose you're computing an array index, and because of some bad input, or perhaps a logic error in your program, you end up trying to access index -1.

With a signed integer, that is easy to detect. With unsigned, it would wrap around and become UINT_MAX. That makes it much harder to detect the error, because you expected a positive number, and you got a positive number.
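
To make that concrete, here's a minimal sketch (the "bad input" is simulated by simply subtracting past zero):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> data(10);

    int signed_index = 0 - 1;                        // -1: obviously invalid
    std::size_t unsigned_index = std::size_t(0) - 1; // wraps around to SIZE_MAX

    // The signed result is easy to reject outright.
    if (signed_index < 0 || signed_index >= static_cast<int>(data.size()))
        std::cout << "signed: caught bad index " << signed_index << '\n';

    // The wrapped value is still "positive", so only a size check catches it,
    // and only if the size is actually known at this point.
    if (unsigned_index >= data.size())
        std::cout << "unsigned: index wrapped to " << unsigned_index << '\n';
}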

So really, it depends. C++ uses unsigned because it needs the range. .NET uses signed because it needs to work with languages which don't have unsigned.

In most cases, both will work, and sometimes, signed may enable your code to detect errors more robustly.

jalf
  • 243,077
  • 51
  • 345
  • 550
  • You meant `UINT_MAX`, of course. You can compare your unsigned index with the size of the array or `INT_MAX` to spot overflows. Greater or equal? Alarm! – Alexey Frunze Apr 06 '12 at 09:05
  • @Alex yah, fixed that. and sure, but the size of the array might not be known. Or it might not be a simple indexing operation, but some other function where *any* positive number is potentially valid, but a negative one is not. Point is, there are sometimes advantages to both – jalf Apr 06 '12 at 09:22
  • @jalf: First, thanks for the excellent answer. Is my understanding correct - from what you wrote - that using `signed` integers is _safer_ than using `unsigned`? If so, maybe the BCL uses signed integers for this reason besides CLS constraints? –  Apr 06 '12 at 14:55
  • @Mr_C64: it depends on the context. I simply pointed out *one* scenario where it might be safer. Alex showed another where it makes no difference (and where error-checking actually becomes simpler with an unsigned int). You'll have to think about this yourself, for the specific situation where *you* need it. :) – jalf Apr 06 '12 at 16:01
  • 3
    True, however using unsigned to cover the full possible range really made sense in 16-bit systems, not so much on 32-bit. As soon as you come close to billions of elements, you might as well use 64-bit types. Having unsigned in everyday code mostly attracts bugs, warnings, mess and pain. – Jem Dec 06 '12 at 15:17
  • The range of unsigned values is a minor issue compared to the fact that almost all arithmetic operations on unsigned numbers yield defined results, with the sole exceptions of division by zero, oversized shifts, and multiplication of unsigned numbers whose type is smaller than `int`, but whose product won't fit in `int` [e.g. on a machine with 64-bit `int`, (uint32_t)3037000500 * (uint32_t)3037000500]. – supercat Mar 03 '14 at 23:33
  • IMO it was a failure to use unsigned ints for container sizes. Should have been plain int. I'm sure so many people had so many bugs because they started to use size_t in code to avoid compiler warnings. – Pavel P Oct 06 '19 at 05:04
  • But std::vector::max_size returns the maximum signed int. I guess you can't have 4 GB vectors on a 32-bit machine because you wouldn't be able to represent the difference between the first and last elements as a signed value, which doesn't sound good. I think the correct answer is that the STL uses unsigned just because sizes can't be negative, which might have been a mistake considering the way unsigned works in C++. – Alkis Nov 23 '22 at 17:19
  • @Alkis your specific environment may be that, but it's not necessarily the case for every environment. For example https://stackoverflow.com/q/3813124/5980430. – apple apple Apr 18 '23 at 19:31
  • @appleapple nah, these posts are just old, both GCC and Clang have fixed this bug since then. Here's the relevant commit from libc++: https://github.com/llvm/llvm-project/commit/55b31b4e6934fb2e9dda6d7f0f2792b6c3420c05 Also, it's clearly stated in [here](https://en.cppreference.com/w/cpp/types/ptrdiff_t) that subtracting pointers that are too far is UB but that happens all the time when you try to use a vector with more than 2^9 elements on a 32-bit machine. – Alkis Apr 24 '23 at 14:23
  • @Alkis there is no restriction on `std::ptrdiff_t` to be 32 bit. – apple apple Apr 25 '23 at 13:20
  • @Alkis and [at the same page you link](https://en.cppreference.com/w/cpp/types/ptrdiff_t), it also states "Programs that use other types, such as `int`, may fail ... when the index exceeds `INT_MAX`" – apple apple Apr 25 '23 at 13:32
  • @Alkis also, there is no restriction on `difference_type` and `size_type`; the commit you link only says it respects `T::difference_type`. – apple apple Apr 26 '23 at 15:19
4

It's natural to use unsigned types for counts and sizes unless we're in some context where they can be negative and yet be meaningful. My guess is that C++ follows the same logic as its elder brother C, in which strlen() returns size_t and malloc() takes size_t.

The problem in C++ (and C) with signed and unsigned integers is that you must know how they are converted to one another when you're using a mixture of the two kinds. Some advocate using signed ints for everything integral to avoid this issue of programmers' ignorance and inattention. But I think programmers must know how to use their tools of the trade (programming languages, compilers, etc.). Sooner or later they'll be bitten by the conversion, if not in what they have written, then in what someone else has. It's unavoidable.
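
For instance, here's a minimal sketch of the most common surprise: in a mixed comparison the signed operand is converted to unsigned first (assuming, as on typical platforms, that std::size_t is at least as wide as int):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    int i = -1;

    // The usual arithmetic conversions turn i into a huge unsigned value,
    // so this condition is false even though -1 is "obviously" less than 3.
    // (Most compilers warn about this with -Wsign-compare or similar.)
    if (i < v.size())
        std::cout << "never printed\n";
    else
        std::cout << "-1 became " << static_cast<std::size_t>(i) << '\n';
}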

So, know your tools, choose what makes sense in your situation.

Alexey Frunze
  • 61,140
  • 12
  • 83
  • 180
2

There are a few aspects here:

1) Max Values: typically the maximum value of a signed number is 1/2 that of the corresponding unsigned maximum. For example in C, the max signed short value is 32767 whereas the max unsigned short value is 65535 (because half of the range isn't needed for the -ve numbers). So if you're expecting lengths or counts that are going to be large, an unsigned representation makes more sense.

2) Security: You can browse the net for integer overflow errors, but imagine code such as:

if (length <= 100)
{
  // do something with file
}

... then if 'length' is a signed value, you run the risk of 'length' being a -ve number (through malicious intent, some cast, etc.) and the code not performing as you expected. I've seen this on a previous project where a sequence was incremented for each transaction, but when the signed integer we used reached the max signed int value (2147483647), it suddenly became -ve after the next increment and our code couldn't handle it.
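
Here's a rough, runnable sketch of how that bounds check behaves with each flavour (the limit of 100 is just the example value from above):

#include <cstddef>
#include <iostream>

int main() {
    int signed_length = -1;                                     // e.g. from bad input or a careless cast
    std::size_t unsigned_length = static_cast<std::size_t>(-1); // the conversion wraps to SIZE_MAX

    // The negative signed length slips straight past the bounds check...
    if (signed_length <= 100)
        std::cout << "signed check passed with length " << signed_length << '\n';

    // ...while the unsigned equivalent is a huge value and gets rejected.
    if (unsigned_length <= 100)
        std::cout << "never printed\n";
    else
        std::cout << "unsigned check rejected length " << unsigned_length << '\n';
}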

Just some things to think about, regardless of the underlying language/API considerations.

Gary Robinson
  • 331
  • 2
  • 6
  • 2
    Another problem may be code like `while (--size >= 0) ...`. When `size` is unsigned, the condition is always true. –  Apr 06 '12 at 15:53
  • 1
    On the other hand, `while(size-- > 0)` is a reliable idiom (though mostly in C/C++, not so much C# since so much emphasis is put on using signed types everywhere that using unsigned types is more trouble than it is worth, as you need to cast essentially all the time). Signed types won't salvage bad code; they will just hide logic errors :) – Thomas Nov 14 '14 at 00:47
  • "corresponding" --> Although `size_t` and `ssize_t` have similar names, their POSIX definitions are not corresponding. `ssize_t` may be the same bit width as `size_t`, it may be wider. [ssize_t Used for a count of bytes or an error indication](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html) says nothing about `size_t`. – chux - Reinstate Monica Jan 03 '20 at 22:13
0

If you aren't designing a reusable library (in .NET terms, e.g. a VB.NET project consuming your C# class library), then pick what works for you. Of course, if you are creating any kind of DLL and it's feasible your library could be used in a project in a different language (again, VB.NET comes to mind), then you need to be mindful of the non-CLS-compliant types (unsigned).

Chris
  • 154
  • 1
  • 9