
I hope this is not a duplicate, if it is I apologize.

Edit: this is a duplicate, in the sense that a similar question has been asked before. The answers to that question don't quite fit. To get a better sense of the subtlety I am trying to get at, read my comment about uninitialized memory. End of edit.

My question is about whether the C code below is in some sense legal. Even if it were, I wouldn't use it, since it generates a warning and warnings annoy me. Still, I'm curious. But enough of the rambling introduction, the question itself is rambling enough as it is. The setup is as follows.

Suppose you want to sort an array of integers. Instead of implementing your own sort, you decide to use qsort, which wants a comparison function which takes two void*. Inside the function, you cast the void* into int*, and then compare them.
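To make the setup concrete, here is the standard, fully portable idiom being described: the comparator accepts two const void* exactly as qsort's prototype demands, and converts them back to const int* inside the body (the helper name sort_ints is just for illustration):

```c
#include <stdlib.h>

/* The portable idiom: the comparator takes two const void* (as
   qsort's prototype demands) and converts them back to const int*
   inside the function, where we know what they really point at. */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Convenience wrapper so callers don't repeat the size arithmetic. */
void sort_ints(int *arr, size_t n)
{
    qsort(arr, n, sizeof arr[0], cmp_int);
}
```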

In an ideal world, I'd like to pass a function that takes two int* instead. The compiler warns if you try this, for obvious reasons: for all it knows, qsort will try to call your function with a short* and a double*, and havoc will ensue. If the C type system were more expressive, then the declaration of qsort could give the compiler enough information to deduce that this will never happen.
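The variant the question asks about looks like this sketch. The function-pointer cast silences the type-mismatch warning, but C11 6.3.2.3p8 says that calling a function through a pointer of incompatible type is undefined behavior, so the qsort call is shown only as a comment:

```c
#include <stdlib.h>

/* The "natural" comparator the question wishes it could pass:
   it takes int* directly, with no casts in the body. */
int cmp_int_direct(const int *a, const int *b)
{
    return (*a > *b) - (*a < *b);
}

typedef int (*generic_cmp)(const void *, const void *);

/* Passing it to qsort requires a function-pointer cast.  The cast
   itself is legal, but calling through the converted pointer is
   undefined behavior per C11 6.3.2.3p8, so the call below is left
   as a comment rather than executed:

       int arr[] = { 5, 1, 4 };
       qsort(arr, 3, sizeof arr[0], (generic_cmp)cmp_int_direct);
*/
```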

But even with the type system we have, a Sufficiently Smart Compiler (tm) knows exactly how qsort works, and thus it knows that your comparison function will always get pointers to integers. Unfortunately, even then, the compiler must warn, because if the compiler is truly portable, it may have to support platforms where the size of a void* is not the same as the size of an int*.

But such platforms are very rare, bordering on non-existent. Far more common are platforms where an int is only 2 bytes, and on those platforms the expression 123*456 is undefined behavior. But most platforms today have a larger int, and on those the expression is well defined. If a compiler with 4 byte integers were to assume that 123*456 is unreachable, then that compiler is not standard compliant. In other words, once the implementation makes some promise (large int), the standard forbids it from doing things that are in general allowed (treating some expressions as if they overflowed).
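The point about the implementation's promise can be made mechanical: INT_MAX is exactly that promise, and a program can consult it at compile time (the function name here is illustrative):

```c
#include <limits.h>

/* 123 * 456 == 56088.  Whether evaluating it as int is defined
   depends on the promise the implementation documents via INT_MAX:
   with 16-bit int (INT_MAX == 32767) it overflows and is undefined;
   with 32-bit int it is fully defined, and a conforming compiler
   must compute it. */
long multiply_safely(void)
{
#if INT_MAX >= 56088
    return 123 * 456;       /* plain int arithmetic is defined here */
#else
    return 123L * 456L;     /* force long arithmetic on small-int targets */
#endif
}
```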

Let's take another example. In most operating systems, dereferencing a null pointer will crash your program. But even if you don't mind crashing, you must still check the return value of malloc, because dereferencing null is undefined behavior according to the standard, and very few compilers make stronger guarantees in this case.
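In code, the discipline being described looks like this (the helper name is hypothetical, not part of any real API):

```c
#include <stdio.h>
#include <stdlib.h>

/* Portable malloc discipline: even on systems where dereferencing
   NULL reliably crashes, the standard only calls it undefined, so
   we check the result instead of relying on the crash. */
int *make_int_buffer(size_t n)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);   /* fail deliberately, not via UB */
    }
    return p;
}
```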

My question is thus: where does the function pointer cast fall on this continuum? I'm pretty sure that most platforms in common use today do in fact guarantee that all pointers have the same representation, are passed using the same calling convention, etc. If I only care about those platforms, can I then pass a comparison function that takes int* instead of void*, just like I can fearlessly multiply 3-digit numbers, or is it more like the situation with dereferencing null, where this particular behavior is undefined unless the documentation explicitly says otherwise?

Mark VY
  • I admit this may be a duplicate of some question, but unless I'm missing something, it's definitely not a duplicate of that question. – Mark VY Sep 23 '15 at 06:02
  • Alex Lop: no. RuffleWind: yes, thank you, that's exactly it. Unfortunately, I already saw that question before posting here. The reason is that the answers don't say much I don't already know: the behavior is undefined, but will work in practice on most platforms. What I want to know is: are there platforms where the C standard guarantees that this will work, after taking into account all the other guarantees the compiler already makes. But again: thanks!! – Mark VY Sep 23 '15 at 06:08
  • Undefined behavior is undefined behavior. It is not implementation-defined behavior; there are no guarantees whatsoever about what will happen, and it may not even be predictable. If this is unclear, you may want to refer to http://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior –  Sep 23 '15 at 06:11
  • Put in another way, undefined behavior means the compiler may compile your code into complete nonsense. – Rufflewind Sep 23 '15 at 06:13
  • The C standard does not make any guarantees about things it does not define ... – M.M Sep 23 '15 at 06:16
  • Just bite the bullet and write correct code – M.M Sep 23 '15 at 06:17
  • Yes, undefined is not defined, but the rules are subtle. For instance, accessing uninitialized memory is classic undefined behavior. Normally, this is fine, since you have no idea what's in that memory anyway. But there are cool tricks that involve using uninitialized memory, like this one: http://research.swtch.com/sparse. By diligent study of the answers and comments at http://stackoverflow.com/questions/11962457, it becomes clear that, surprisingly, this particular use of uninitialized memory is guaranteed by the standard to be well defined on any platform you're likely to meet. – Mark VY Sep 23 '15 at 06:36
  • @M.M: Historically, a major purpose of making certain actions invoke Undefined Behavior was to allow platforms freedom to supply *useful* behaviors. For some applications, having a platform allow `int1+=uchar1; if (int1 < 0) int1 = INT_MAX;` was helpful. For others, being able to say `int1+=uchar1;` and know that the program wouldn't continue if an overflow occurred was helpful. Leaving overflow as Undefined Behavior meant that code targeting platforms where overflow worked one way or the other could make use of that behavior. – supercat Sep 23 '15 at 17:00
  • @M.M: The C Family of Dialects has thrived through the 1990s because most dialects supported certain consistent or semi-consistent behaviors beyond what the Standard required [e.g. on a 32-bit machine, `1<<225` might arbitrarily yield 2 or zero, and it might take nine times as long as `1<<25`, but it wouldn't do anything "wacky"]. I don't think C would ever have been popular if the kinds of hacks which are broken by hyper-modern compilers had never worked. – supercat Sep 23 '15 at 17:05
  • This question would be much less "rambling" if you'd just replaced a bunch of prose with illustrative code snippets... – Oliver Charlesworth Sep 23 '15 at 21:52
  • I suppose I should, though it's not clear (to me) which pieces of prose are translatable to code without losing something important. – Mark VY Sep 23 '15 at 21:55
  • In practice, using `qsort(arr_strings, arr_count, sizeof(*arr_strings), (int(*)(const void *, const void *))strcmp);` works in most cases, but that doesn't mean you're not still in the realm of UB. To be as well-defined as possible, you'd wrap `strcmp` with a function matching the signature required by `qsort`. –  Sep 23 '15 at 22:33
  • @supercat This is all just unverifiable speculation, but I think the "semi-consistent behaviours" actually held back development precisely because they broke in some cases. The Linux Kernel comes to mind. For a long time you had to compile the kernel with exactly gcc 2.7.2.1 or something, because it depended on a lot of undocumented, coincidental behaviour of that compiler. It was a major undertaking to rewrite the kernel in more portable code and thereby get it out to work on more devices. – M.M Sep 23 '15 at 23:05
  • Chrono: I once had a nasty bug that resulted from doing essentially what you said. This is horribly wrong because you get the level of indirection mixed up. Remember: strcmp expects 2 C-strings (char*). On the other hand, qsort will hand you 2 pointers to strings (char**). So, no, this will not work in most cases, or even in some cases. It will result in strcmp trying to interpret your array of char* as if it contained valid string data :) – Mark VY Sep 23 '15 at 23:54
  • @M.M: A major problem with C over the last few years has been a lack of coordination between programmers and certain compiler writers with regard to what "coincidental" behaviors have sufficient advantage over any other standard behaviors that they should be considered normative. For example, on many compilers, if `y` is 32, `x<<y` might yield either `x` or 0, and in most cases... – supercat Sep 24 '15 at 15:29
  • ...either behavior will be equally good. Since some processors would naturally regard the result as being `x` and would require extra code to make the expression yield 0, and others would naturally regard the result as being 0 and would require extra code to make the expression yield `x`, being able to write the code as simply `x<<y` was useful. – supercat Sep 24 '15 at 15:32
  • @M.M: The solution is not simply to say that programmers should never have used platform-specific behaviors, but rather to recognize the need for a means by which programs could indicate their requirements, platforms could be required to reject programs whose requirements they cannot meet, and problems which will work with all compilers that are allowed to compile them could be defined as being standards-conformant subject to requirements which are expressed in standard fashion. Such compliant programs would then be free to use useful platform-specific behaviors *if they declare that... – supercat Sep 24 '15 at 15:38
  • ...they do so*. Declaring things in standard format would then make it possible for compiler writers to know what combinations of behaviors they should try to support. If the only way to make a lot of programs work on a particular compiler is to disable a large family of optimizations, even though only one of those optimization would be problematic, that would be a sign that the compiler could benefit from the ability to disable the problematic optimization separately. – supercat Sep 24 '15 at 15:42
  • @supercat I agree that it would be nice if shifting by greater than the width were *unspecified* or even "implementation-defined with i-d signal". On the other hand it would be bad if this meant that inefficient code for every single shift (where the shift size was not known at compile-time) on some platform. – M.M Sep 24 '15 at 21:14
  • @M.M: A big problem with C is that it's not so much a language as a family of dialects. IMHO the usefulness of the family would be enormously enhanced if there were a standardized "meta-language" that would allow a category of programs that compilers are allowed to reject, but which if accepted must encapsulate their requirements sufficiently that any compiler that legitimately accepts one of the programs must produce an executable meeting its requirements. That would allow compilers that would be unable to offer common behaviors to be used to run programs that don't need them, while... – supercat Sep 24 '15 at 21:39
  • @supercat it seems to me that that would just fracture C into "C-x86" and "weirdo C" , and 99% of people would code for "C-x86" . Life would become very difficult for anyone on the platforms currently supported by C but not by the new dialect. – M.M Sep 24 '15 at 21:43
  • ...allowing programs the ability to *safely* make use of behaviors which can be guaranteed by the platform upon which they're running. Existing compilers could add support for a workable implementation of the meta-language merely by bundling an additional header file, such that programs which use the meta-language to demand guarantees the compiler naturally makes would compile, and those which demand guarantees it cannot make would be rejected. Adding support to the core logic of the compiler itself would make it possible for a compiler to adjust code generation according to... – supercat Sep 24 '15 at 21:44
  • ...programs' requirements, but compilers wouldn't have to do that unless or until it was clear that such logic would add value. I suspect people who use some obscure platforms might feel abandoned if programs started explicitly refusing to compile on those platforms, but would it be better to have 75% of code refuse to compile on a given platform and have the remaining 25% compile and work, or to have 33% compile and fail outright, 33% compile and mostly work, and 34% compile and work perfectly? I would think the former would be better, especially if compiler diagnostics could... – supercat Sep 24 '15 at 21:49
  • ...identify the parts of the program that needed to be changed. IMHO, if something is a proper feature of a language, it should have essentially the same meaning on all implementations that accept it. To my mind, the fact that some platforms are required to evaluate `(uint32_t)1-(uint32_t)2` as 4294967295u and others are required to evaluate it as -1 means that `uint32_t` is not really a language feature, but rather a feature of some dialects that interpret as a type that supports direct arithmetic, and also a feature of some other dialects that regard it as a type that needs promotion. – supercat Sep 24 '15 at 21:53
  • BTW, I've coded for platforms where `char` was 16 bits, and for platforms where unrelated pointer comparisons were problematic, and various other weird things, and would suggest that the status quo is needlessly rough for users of such platforms. In many cases, compilers for such platforms could without too much difficulty generate code which would emulate more "mainstream" behaviors at a cost which would often be greater than the cost of algorithms that were designed not to require them, but less than the cost of straightforwardly-adapted code that relies upon their existence. – supercat Sep 24 '15 at 22:02
  • If a program e.g. requires a type that behaves in the way that `union {uint8_t bb[4]; uint32_t w;}` behaves on x86 or ARM, it should be possible for a compiler for almost any platform to implement a type which behaves the same way except that the address of `w` couldn't be taken. Had such a feature existed, it would have made it easier to port a TCP stack to the machine with 16-bit `char` rather than having to implement the whole thing from scratch. – supercat Sep 24 '15 at 22:11
  • Sorry I'm late to the tangential discussion that popped up around my question, but I don't see how the compiler could implement uint8_t if char is 16 bits. Isn't char supposed to be the smallest type? Even bool can't be smaller than char. – Mark VY Sep 25 '15 at 11:52
  • @MarkVY: I said "a type that behaves in the way that [the indicated type] behaves". C doesn't have a way of declaring such a type on a processor where `char` is 16 bits, but if such means were added, then `theUnion.w = 0x12345678` would store the value using the lowest eight bits out of each of four `char` values, regardless of whether `char` was 8 bits, or 16, or any other value. – supercat May 05 '16 at 19:00
  • I'm having trouble parsing what you wrote. I think I agree, but I'm not sure. – Mark VY May 06 '16 at 03:51
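The correct wrapper discussed in the comments above can be sketched as follows: qsort hands the comparator pointers *to the elements*, so for an array of char* each void* argument is really a char**, and strcmp must be called on the dereferenced pointers. Casting strcmp itself gets this level of indirection wrong.

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the elements.  For an array of char*,
   each void* argument is really a char* const*, so we dereference
   once before handing the actual strings to strcmp. */
int cmp_cstring(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}
```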

0 Answers