
In his blog entry "About size_t and ptrdiff_t", Andrey Karpov shows this example:

for (ptrdiff_t i = 0; i < n; i++)
  a[i] = 0;

However, I'm not sure that's right; it seems it should be

for (size_t i = 0; i < n; i++)
  a[i] = 0;

Is this correct?

I know we should probably be using something like memset, but let's set that aside entirely; I'm only asking about the type.

Evan Carroll
  • `ptrdiff_t` is a signed type, so it may not be very safe if you access an array with it. – Paul Ankman Jul 07 '18 at 21:43
  • Depends on what `n` is and what `a` is. If `a` is a built-in array, `size_t` is more appropriate. – DeiDei Jul 07 '18 at 21:44
  • @PaulAnkman that also sounds like a good reason, also reading this answer by AnT it seems like in one possible C implementation `ptrdiff_t` may actually be larger than `size_t` which would seem to make this less efficient **and** dangerous. – Evan Carroll Jul 07 '18 at 21:44
  • Not sure what `container::size_type` is, but this question is on C, not C++. (Guessing that's C++?) – Evan Carroll Jul 07 '18 at 21:45
  • @EvanCarroll Sorry, missed that. The first part of my comment still stands though. – DeiDei Jul 07 '18 at 21:46
  • The linked article is totally bogus, full of mistakes and misconceptions. Of course you're right. Forget about that article and the guy who wrote it. – rustyx Jul 07 '18 at 21:59
  • In this particular case, all values of `i` are >= 0, so `size_t` is definitely more appropriate. – jwdonahue Jul 07 '18 at 21:59
  • I just read the referenced article and it's all about using `size_t` and `ptrdiff_t` for pointer arithmetic and array indexing, though it does have one unfortunate example where it suggests `for (ptrdiff_t i = 0; i < n; i++)` without giving a type for `n`. After reading the whole article it is clear that `size_t` is perfectly safe for all unsigned pointer arithmetic, but you should always use `ptrdiff_t` for anything that involves signed entities in the expression. Not the strongest article on the subject, I am sure, but I can't provide a better example at the moment. – jwdonahue Jul 07 '18 at 22:32

3 Answers


In a blog post, I argue that you should always refrain from allocating memory blocks larger than PTRDIFF_MAX(*), because doing so will make compilers such as Clang and GCC generate nonsensical code even if you do not subtract pointers to that block in a way that causes the result to overflow.

(*) Even if malloc succeeds when you pass it a value larger than PTRDIFF_MAX. The crux of the problem is that GCC and Clang only generate code that behaves correctly when linked with a malloc that refuses such requests, but glibc provides a malloc function that does not implement this limitation.

If you follow that constraint (which I encourage you to do: that's the message of the blog post), then both types are equally correct.

That said, since only positive offsets need to be represented here, size_t would be the natural choice in your example.
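The constraint from the post can be sketched as a wrapper around `malloc`; the name `checked_malloc` is mine for illustration, not glibc's API or anything from the blog:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper (not part of any libc): refuse any allocation
   larger than PTRDIFF_MAX, so that subtracting two pointers into the
   returned block can never overflow a ptrdiff_t. */
static void *checked_malloc(size_t n)
{
    if (n > (size_t)PTRDIFF_MAX)   /* PTRDIFF_MAX comes from <stdint.h> */
        return NULL;
    return malloc(n);
}
```

With such a wrapper in place, every successfully allocated object fits in the constraint above, and indexing it with either size_t or ptrdiff_t is safe.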

too honest for this site
Pascal Cuoq
  • This is an interesting wrinkle I have been blissfully unaware of. I have never been tasked with building an application that required a single allocation as large as half the range of a size_t. I am sure they are out there, but I think they are rare. – jwdonahue Jul 07 '18 at 22:39
  • @jwdonahue: Maybe you'd be interested in [Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts Are Broken](https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html) — a post from Google about how arrays got big enough that index arithmetic really did overflow and break. – Jonathan Leffler Jul 07 '18 at 22:41
  • @JonathanLeffler, LOL, I knew somebody would dredge up a hyper-scale example ;). I will read it, but the OP's question involves an example that does not reference memory allocation or any kind of signed pointer arithmetic. – jwdonahue Jul 07 '18 at 22:44
  • @JonathanLeffler, I would point out, now I've read that article, they were very stupidly using signed integer math where it was not needed and should not have been used. Appropriately sized unsigned types (size_t in C/C++) would never have exhibited the referenced bug. To be fair, size_t wasn't around when those algorithms were first documented, but then neither was C. – jwdonahue Jul 07 '18 at 23:10
  • `PTRDIFF_MAX` is not the max. allowed size of any object, but the max. value of a `ptrdiff_t`, just as `SIZE_MAX` is the max. value a `size_t` can represent. Neither is related to the max. allocation size. That would be guaranteed by the implementation for static/automatic objects and by the stdlib for dynamically allocated objects (or whatever mechanism the environment provides, if any). Nevertheless, `size_t` is guaranteed to hold the max. allowed array index and the size of the largest possible object in bytes; `ptrdiff_t` is in fact not (also see 6.5.6p9). – too honest for this site Jul 07 '18 at 23:38
  • "but Glibc provides a malloc function that does not implement this limitation." – If that is true, this is clearly a bug in glibc. However, until proven otherwise, I rather think something else was broken in the code. You might be confusing this with the optimistic allocation done by modern OSes like Linux: they simply behave as if there is enough memory available, up to the machine's virtually addressable memory, assuming you won't use all of it. No memory is actually mapped until the first write. – too honest for this site Jul 07 '18 at 23:46
  • I would like to point out that `malloc` is not mentioned in the OP's post, only the appropriateness of using `ptrdiff_t` rather than `size_t` where it is obvious that all index values should be positive. `size_t` is in fact the correct type if `n` is not bounded by a smaller unsigned type. – jwdonahue Jul 08 '18 at 00:01
  • @Olaf Acknowledgement that glibc allows allocating more than 2GiB on 32-bit platforms, that this contradicts compiler assumptions, together with the decision not to change it: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67999#c8 – Pascal Cuoq Jul 08 '18 at 00:35
  • @PascalCuoq: This could be because such a restriction does not really make sense. IIRC, 32-bit Linux allowed up to 3GiB of (logical) application address space. Going back to the C standard: there is no requirement for `ptrdiff_t` to be able to hold all sizes of objects, not even the difference of two pointers into the same array; see 6.5.6p9 in the standard. I agree, though, it is unfortunate. Nevertheless it was reasonable for small systems (16/24/32) in the past and it will be again once we reach the 64-bit `size_t` limit. Nevertheless both sides are correct. Forget portability. – too honest for this site Jul 08 '18 at 10:16
  • @Olaf Have you read the blog post, or the bug report? They show that GCC and Clang generate **wrong code** for **defined programs**. The bug report contains a discussion, involving GCC maintainers and a libc maintainer, of what should be fixed between the compilers and the libc. Your remark is not adding anything to the discussion. – Pascal Cuoq Jul 08 '18 at 11:12
  • @Olaf letting 32-bit programs have 3GiB of virtual address space does not mean that `malloc` needs to let them have it in one block. Programs can still be allowed to allocate three times 1GiB. – Pascal Cuoq Jul 08 '18 at 11:14
  • Ok, I see now. Yeah, that seems to stem from wrongly scaling down the result after the operation. I can't reproduce this currently, as I don't have an x86-32 environment (and for 64-bit the 48-bit virtual address-range limit protects from these problems). I should test this for other architectures if I find the time some day. But I don't see an easy way to get this right indeed, so I understand the gcc people not fixing this, especially as it is of decreasing relevance. Nevertheless: thanks for your time. – too honest for this site Jul 08 '18 at 11:28

Use of ptrdiff_t is ok since a[i] is translated as *(a + i).

If you take two pointers, p1 and p2, it is encouraged that you use:

ptrdiff_t d = p2 - p1; // Assuming p2 - p1 is valid.

Given that, p2 == p1 + d; i.e., <ptr type> + ptrdiff_t is a valid expression. Whether ptrdiff_t or size_t is better as the index type is a matter of opinion and/or the coding style used in a team.
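That round-trip property can be checked with a minimal sketch; the helper name `round_trips` is illustrative, not from the answer:

```c
#include <stddef.h>

/* If d = p2 - p1 for two pointers into the same array, then p1 + d
   must compare equal to p2: pointer + ptrdiff_t recovers the pointer. */
static int round_trips(int *p1, int *p2)
{
    ptrdiff_t d = p2 - p1;   /* defined behavior: same array */
    return p1 + d == p2;
}
```

Note that the subtraction yields a distance in elements, not bytes, and is only defined while both pointers point into (or one past) the same array object.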

R Sahu
  • I don't believe this proof is sound, and it's confusing to present types and variables together. The question is about types: of course `#=#-#` so `#+#=#`. But in our case the types are bounded. For example, above you start with `ptrdiff_t = uintptr_t - uintptr_t`. That's **not** guaranteed to work, though, not even on the same object. `ptrdiff_t + uintptr_t = uintptr_t` is even less likely to work. On the matter of values rather than types, you're right that if any one construction works the others are likely to work too (afaik), but in the question we're creating the memory accessor. – Evan Carroll Jul 08 '18 at 20:46
  • re: `a[i] = 0;` — it's my contention that because `i` can be two types (in the question, `size_t` and `ptrdiff_t`), and `ptrdiff_t` can be larger than `size_t` and negative, it's less type-safe to go outside of `size_t`. But that doesn't sound to me like coding style or opinion. – Evan Carroll Jul 08 '18 at 20:48
  • @EvanCarroll, as long as the array size is smaller than `PTRDIFF_MAX`, you can use either `ptrdiff_t` or `size_t` without any difference in the behavior of the program. When your array size exceeds `PTRDIFF_MAX`, use of `size_t` is your only choice. It sounds like that's what you are trying to say. For such extreme cases, I can see that coding style or opinion does not matter. – R Sahu Jul 09 '18 at 01:26

The answer to the OP's question is yes: size_t is the most appropriate type for the example code, where no pointer values are subtracted from each other and there are no cross-compiler/library compatibility issues around malloc behaviors. Regardless of differences in heap managers, in C an array can be SIZE_MAX bytes in length, and that requires a size_t to represent it. Nothing in the standard requires a heap manager to be able to allocate all of a process's memory space on the heap, or to allocate up to SIZE_MAX bytes for that matter, but an array can be SIZE_MAX in length, hence size_t is appropriate.

Even if n is signed, using a ptrdiff_t for i isn't going to help: if n is negative, the initial i < n test fails anyway, because i is initialized to zero. There is no index into a that a size_t index cannot access. The only place ptrdiff_t is needed is where you subtract one pointer value from another, and the OP isn't asking about that.
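A small sketch of that claim; the helper `zero_first_n` is my own naming, not the OP's code:

```c
#include <stddef.h>

/* Zeroes the first n elements of a and reports how many loop
   iterations ran.  With a ptrdiff_t index and a negative n, the
   initial 0 < n test fails, so the body never executes. */
static size_t zero_first_n(int *a, ptrdiff_t n)
{
    size_t iterations = 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        a[i] = 0;
        iterations++;
    }
    return iterations;
}
```

Contrast this with comparing a size_t index against a signed n directly: the usual arithmetic conversions would turn a negative n into a huge unsigned value, which is a reason to keep the count and the index the same (unsigned) type.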

Jonathan Leffler
jwdonahue