
So, as I learned from Michael Burr's comments to this answer, the C standard doesn't support subtracting an integer from a pointer if that takes it before the first element of its array (which I suppose includes any allocated block of memory).

From section 6.5.6 of the combined C99 + TC1 + TC2 (pdf):

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

I love pointer arithmetic, but this has never been something I've worried about before. I've always assumed that given:

 int a[1];
 int * b = a - 3;
 int * c = b + 3;

c == a would hold.

So while I believe I've done that sort of thing before without getting bitten, it must have been thanks to the kindness of the various compilers I've worked with: they've gone above and beyond what the standard requires to make pointer arithmetic work the way I thought it did.

So my question is: how common is that? Are there commonly used compilers that don't do that kindness for me? Is proper pointer arithmetic beyond the bounds of an array a de facto standard?

rampion
  • it's not so much a question of compilers as of CPU architecture. There are some obscure memory models out there, and you can't in general assume plain linear memory on all systems. Just don't think of pointers as memory addresses. They're not. They have a separate set of limitations. – jalf Apr 24 '09 at 07:59

4 Answers


MS-DOS FAR pointers had problems like this, which were usually covered over by "clever" use of the overlap of the segment register with the offset register in real mode. The effect there was that the 16-bit segment was shifted left 4 bits and added to the 16-bit offset, which gave a 20-bit physical address that could address 1 MB, which was plenty because everyone knew that no one would ever need as much as 640 KB of RAM. ;-)
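To make that arithmetic concrete, here's a small sketch of the real-mode address calculation (my own illustration, not anything from the original post; the example values are just the classic text-mode video segment):

 #include <stdio.h>
 #include <stdint.h>

 /* Real-mode address arithmetic as described above:
    physical = segment * 16 + offset, giving a 20-bit result. */
 static uint32_t physical(uint16_t segment, uint16_t offset)
 {
     return ((uint32_t)segment << 4) + offset;
 }

 int main(void)
 {
     /* Many different segment:offset pairs alias the same physical byte. */
     printf("%05X\n", (unsigned)physical(0xB800, 0x0000)); /* B8000 */
     printf("%05X\n", (unsigned)physical(0xB000, 0x8000)); /* B8000 */
     return 0;
 }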

In protected mode, the segment register was actually an index into a table of memory descriptors. A typical DOS extender runtime would usually arrange things so that many segments could be treated just like they would have been in real mode, which made porting code from real mode easy. But it had some defects. Primarily, the segment before an allocation was not part of the allocation, and so its descriptor might not even be valid.

On the 80286 in protected mode, merely loading a segment register with a selector that referred to an invalid descriptor raised an exception, whether or not the descriptor was actually used to refer to memory.

A similar issue potentially occurs at one byte past the allocation. The last ++ on the pointer might have carried over to the segment register, causing it to load a new descriptor. In this case, it is reasonable to expect that the memory allocator could arrange for one safe descriptor past the end of the allocated range, but it would be unreasonable to expect it to arrange for any more than that.
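Here's a toy model of that carry (my own sketch; the far_ptr type and the 0x1000 segment step are illustrative assumptions, since real compilers used huge-pointer normalization in real mode and a selector increment in protected mode):

 #include <stdio.h>
 #include <stdint.h>

 /* Toy "huge" far pointer: a carry out of the 16-bit offset is folded
    into the segment. Purely illustrative, not any real ABI. */
 typedef struct { uint16_t segment; uint16_t offset; } far_ptr;

 static far_ptr far_inc(far_ptr p)
 {
     if (++p.offset == 0)      /* offset wrapped past 0xFFFF... */
         p.segment += 0x1000;  /* ...carry 64 KB into the segment (real mode) */
     return p;
 }

 int main(void)
 {
     far_ptr end  = { 0x1234, 0xFFFF };  /* last byte of an allocation */
     far_ptr past = far_inc(end);        /* the ++ spills into the segment */
     printf("%04X:%04X -> %04X:%04X\n",
            (unsigned)end.segment, (unsigned)end.offset,
            (unsigned)past.segment, (unsigned)past.offset);
     /* In protected mode, loading past.segment into a segment register
        could fault if that selector has no valid descriptor. */
     return 0;
 }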

RBerteig
  • Thanks for that... you will notice that I didn't quote the phrase or blame Bill for it. It may not really have been said by anyone in seriousness, but the attitude was not unfamiliar to anyone who had to *pay* for a large block of memory.... – RBerteig Apr 24 '09 at 07:25

This is not "implementation-defined" by the Standard; it is "undefined" by the Standard. That means you can't count on a compiler supporting it, and you can't say "well, this code is safe on compiler X". By invoking undefined behavior, your program's behavior is undefined.

The practical answer isn't "how (where, when, on what compiler) can I get away with this"; the practical answer is "don't do this".
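For example, a sketch of a conforming way to get the effect the question wants (my own example, not from the original answer): carry the adjustment as an integer, and only form pointers that stay inside the array or one past its end.

 #include <assert.h>
 #include <stddef.h>

 int main(void)
 {
     int a[1];
     ptrdiff_t off = -3;        /* keep the "- 3" as an integer, not as a pointer */
     int *c = a + (off + 3);    /* only form the final, in-bounds pointer (a + 0) */
     assert(c == a);
     return 0;
 }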

tpdi
  • I think the OP is wondering about *why* this is true, as much as anything. If you've never experienced the "joy" of developing an application for Windows 3.0 it is understandable that you might not appreciate how easy we have it today ;-) – RBerteig Apr 24 '09 at 07:31
  • In fact I did program for Windows 3.0. Back then, the File Manager only allowed a file type to be associated with one program. I wrote a handler that allowed the user to add multiple programs per file type; the user then associated files with that program, which on right click let the user choose from his custom list of programs for that file type. – tpdi Apr 24 '09 at 07:35
  • I just remember having to drop back to DOS to compile (and run a decent editor) because the MSC compiler couldn't be run in a DOS box reliably. Besides, after any bug at all, the chances were that exiting Windows required a three-finger salute and the DOS prompt was the first stop after that... The real joy was designing dialog box layouts on paper and in a text editor and having to sit through a compile to see what they looked like... – RBerteig Apr 24 '09 at 07:58
  • Incidentally tpdi, I didn't mean my comment to be dismissive of you in particular. Writing about the segment registers just brought up some latent tendency towards feeling like a cranky old geezer trotting out the "walked 5 miles uphill both ways in the snow" stories tonight. ;-) – RBerteig Apr 24 '09 at 08:02
  • Not taken that way at all, and sorry if my response seemed too "oh yes I did!"; in fact I'd literally forgotten I ever wrote that thing, until your comment prompted me to recall it. Even now, I can't recall /when/ I wrote it. Presumably before '95? I know I learned 8086 asm (to write silly TSRs) long before I learned C. So my comment was more out of self-surprise than anything else. And yeah, I remember worrying about segmented memory and offsets. Things /were/ more complicated then. – tpdi Apr 24 '09 at 08:29
  • IIRC, Win95 actually shipped in 1995 because they accidentally stuck to their schedule. I remember playing with the early betas of NT and running NT 3.1 on my home PC before '95 shipped. It was so liberating to get back to a flat 32-bit address space, a self-hosted compiler, and applications that couldn't crash each other or the kernel. So I naturally started writing NT drivers for the sense of danger.... – RBerteig Apr 24 '09 at 08:50

Another reason is that there are optional conservative garbage collectors (like the Boehm-Demers-Weiser GC) that assume a live pointer always points inside its allocated block; if it doesn't, they are allowed to free the memory at any time.
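As a sketch of the point (assuming the Boehm collector is installed as libgc; whether the block is actually reclaimed depends on the collector's configuration, so this only illustrates what it is *allowed* to do):

 #include <gc.h>
 #include <stdio.h>

 int main(void)
 {
     GC_INIT();
     int *a = GC_MALLOC(sizeof *a);   /* collector-managed allocation */
     int *b = a - 3;                  /* undefined in C, and it points outside the
                                         block, so a conservative collector need not
                                         treat it as a reference to the allocation */
     a = NULL;                        /* drop the only in-bounds pointer */
     GC_gcollect();                   /* the block may now legitimately be reclaimed */
     printf("%p\n", (void *)b);       /* b still exists, but what it "pointed to" may be gone */
     return 0;
 }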

There is one popular, commercial-quality library in wide use that does break this assumption: the Judy array library from HP, which uses pointer algorithms to implement a very complex hash-like structure.

Lothar

ZETA-C for the TI Explorer implemented pointers as array-plus-index pairs or displaced arrays, IIRC, so your example probably wouldn't work. Start from zcprim>pointer-subtract in zcprim.lisp to figure out what the behavior would be. No idea whether this was correct per the standard, but I get the impression that it was.

Julian Squires