
Recently I had code (in C) where I passed the address of an int to a function expecting a pointer to unsigned char. Is this not valid? Is this undefined behavior (UB), or something else?

For example:

void f(unsigned char *p)
{
    // do something
}

// Call it somewhere
int x = 0; // actually it was uint32_t, if that makes a difference
f(&x);

I did get a warning, though. It was compiled in Xcode.

Joel
  • it's not permitted without an explicit cast. `int *` is not implicitly convertible to `unsigned char *`. – The Paramagnetic Croissant Jul 22 '14 at 11:53
  • @TheParamagneticCroissant: is this UB, or why is it not permitted? –  Jul 22 '14 at 11:54
  • it's not permitted because `int *` is not implicitly convertible to `unsigned char *`. – The Paramagnetic Croissant Jul 22 '14 at 11:55
  • Think about what happens if you cast a negative int to a char... – EdgeCaseBerg Jul 22 '14 at 11:56
  • @EJEHardenberg: I don't know what you mean, but it worked for me; I was just doing a memcpy inside. I asked because it isn't permitted, yet it compiles: so is it UB, or something else? –  Jul 22 '14 at 11:57
  • @EJEHardenberg how is that relevant here? – The Paramagnetic Croissant Jul 22 '14 at 11:57
  • typecast it explicitly `f((unsigned char)&x);` – Sathish Jul 22 '14 at 11:58
  • @Sathish: you meant `f((unsigned char*)&x);` but the question is what happens if used without cast? –  Jul 22 '14 at 11:59
  • @dmcr_code if you omit the cast, then the code is ill-formed. – The Paramagnetic Croissant Jul 22 '14 at 12:02
  • @dmcr_code If you use it without the cast, the compiler generates a warning, but it will accept the value (lower 8 bits) that you are passing! – Sathish Jul 22 '14 at 12:03
  • @TheParamagneticCroissant: I know these terms: http://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior/. But "ill formed" is something I don't understand ... –  Jul 22 '14 at 12:04
  • @dmcr_code it's the official term for "semantically invalid". – The Paramagnetic Croissant Jul 22 '14 at 12:05
  • @TheParamagneticCroissant: but it compiles; so I am confused now –  Jul 22 '14 at 12:06
  • @dmcr_code Don't confuse "semantically" with "syntactically". Perhaps your compiler can make sense of the code (despite the C standard not requiring it to do so), and most probably it treats the code as if there was a cast. However, if an ill-formed program compiles, it automatically has undefined behavior. Also, as soon as you enable a compiler flag which treats warnings as errors, for instance `-Werror` for GCC, the code won't compile. Whether some particular piece of code compiles with some particular compiler and flags is irrelevant and is not a measure of code correctness. – The Paramagnetic Croissant Jul 22 '14 at 12:10
  • @dmcr_code: A warning from gcc can mean that the code is not strictly conforming. The standard allows an implementation to compile as many invalid programs as it wants to, as long as it compiles all valid programs (and gives “diagnostic messages” (for gcc, e.g., a warning _is_ a diagnostic) in certain cases, e.g. for your code). So a compiler is _allowed_ to refuse to compile your code and is _required_ to give a diagnostic. – mafso Jul 22 '14 at 12:16
  • @TheParamagneticCroissant I think it's pretty relevant. It's already ill-defined behavior for the reasons you've listed, but I'd be more concerned about _why_ you would pass an int to something that expects an unsigned char. My first thought is something like EOF, where you hold it in an integer. But since the function expects an unsigned value, that doesn't make sense. So not only does the code not compile (if you're using the right flags), but it might be semantically wrong, which is far worse than a warning. – EdgeCaseBerg Jul 22 '14 at 12:22
  • @EJEHardenberg: You'd want it because, for instance, you want to do a memcpy inside the function, or something else. Anyway, I am a bit confused that I can't get an authoritative answer to such a question (e.g., whether it is UB or something else) –  Jul 22 '14 at 12:24
  • @dmcr_code `memcpy` takes a `void*`, not an `unsigned char*`. – Fred Foo Jul 22 '14 at 12:25
  • @larsmans: I know; I may ask that in another question (I was curious whether you need casts with memcpy, because with printf I think you need a cast to void* if you use the %p specifier), but let's focus on the question at hand for now –  Jul 22 '14 at 12:28
  • @EJEHardenberg No, it's not relevant. We're not talking about casting the objects pointed to by the pointers. We're talking about casting the pointers themselves, i. e., reinterpreting a certain address. That does not (necessarily and generally) result in the same values (after dereferencing the said pointers) as a conversion between the pointed objects themselves. The reason behind this particular type of pointer aliasing may be that OP wants to inspect the byte-wise representation of the `int` object. – The Paramagnetic Croissant Jul 22 '14 at 12:28
  • @TheParamagneticCroissant: what you said about UB seems to be incorrect: there is now an answer, but there is also conflicting information from two users ... –  Jul 22 '14 at 13:25
  • @dmcr_code "what you said about UB seems not to be correct" - why not? the two answers below do not conflict, it's just that they talk about different things. – The Paramagnetic Croissant Jul 22 '14 at 14:05
  • @TheParamagneticCroissant: OK, it's still a grey area for me; I'll just use the cast –  Jul 22 '14 at 14:11
  • Please don't abuse trivial edits to "bump" your question. If this continues, we'll be forced to lock it from edits. – Brad Larson Jul 22 '14 at 14:22

4 Answers


int * and unsigned char * are not compatible types, so passing one where the other is expected without a cast requires a diagnostic. However, the standard does allow explicit casts between different object pointer types, subject to these rules (C11 section 6.3.2.3, paragraph 7):

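  1. Converting a type "pointer to A" to type "pointer to B" and back to "pointer to A" shall result in the same original pointer, provided the intermediate pointer is correctly aligned for B (otherwise the behavior is undefined). (i.e., if p is of type int * and is suitably aligned for double, then (int *)(double *)p will compare equal to p)
  2. Converting a pointer to an object to a char * (or any other character pointer type, including unsigned char *) yields a pointer to the lowest-addressed byte of the object; successive increments, up to the size of the object, yield pointers to its remaining bytes.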

So, in your case, an explicit (unsigned char *) cast will yield a conforming program without any undefined behavior.
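A minimal sketch of both rules applied to the question's situation, assuming a hosted C11 compiler; the helper names round_trips and points_to_first_byte are hypothetical. Because unsigned char has the weakest alignment requirement, the int * to unsigned char * direction can never produce a misaligned pointer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Rule 1: int * -> unsigned char * -> int * compares equal to the
   original. unsigned char has alignment 1, so the intermediate
   pointer is always correctly aligned. */
static bool round_trips(int *p)
{
    assert(p != NULL);
    unsigned char *bytes = (unsigned char *)p; /* explicit cast: no diagnostic */
    int *back = (int *)bytes;                  /* convert back to int * */
    return back == p;
}

/* Rule 2: the converted pointer addresses the lowest-addressed byte
   of the object, i.e. the address at which the object itself starts. */
static bool points_to_first_byte(int *p)
{
    assert(p != NULL);
    unsigned char *bytes = (unsigned char *)p;
    return (void *)bytes == (void *)p;
}
```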

Drew McGowen

C11, §6.5.2.2:

2 Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.

§6.5.16.1 describes assignment in terms of a list of constraints, including

the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right

int and unsigned char are not compatible types, so the program is not well-formed and the Standard doesn't even guarantee that it will compile.
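Another item in the same 6.5.16.1 list allows assignment between a pointer to an object type and a pointer to (qualified or unqualified) void, which is why functions taking void *, such as memcpy, accept &x without a cast. A minimal sketch, where sum_bytes is a hypothetical helper and not part of any library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds up the bytes of any object. The void *
   parameter accepts a pointer to any object type without a cast,
   because int * -> const void * satisfies the 6.5.16.1 constraints. */
static unsigned sum_bytes(const void *obj, size_t n)
{
    assert(obj != NULL);
    const unsigned char *p = obj; /* const void * -> const unsigned char *: no cast */
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

The conversion from const void * to const unsigned char * inside the function also needs no cast, for the same reason.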

Fred Foo
  • “shall” in a constraints section means: Give a diagnostic (and the implementation is allowed to stop compiling), not UB. – mafso Jul 22 '14 at 12:51
  • @mafso: so is it UB or not UB? (it's confusing when all answers differ on a page! :)) –  Jul 22 '14 at 13:01

Although some would say "it is undefined behavior according to the standard", here is what happens in practice (answering by example):


Safe:

void f(char* p)
{
    char r, w = 0;
    r = p[0]; // read access
    p[0] = w; // write access
}

...

int x = 0;
f((char*)&x); // the cast is only there to avoid the compilation warning

This code is safe as long as you access memory with p[i], where 0 <= i < sizeof(int).
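A runnable variant of the safe case, assuming the goal is to write every byte of an int through a char pointer; clear_int_bytes is a hypothetical name. Writing through a character lvalue is permitted by the aliasing rules, and C11 guarantees that an all-bits-zero representation is the value zero for any integer type:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of the safe f(): writes every byte of an int
   through a char pointer, staying within 0 <= i < sizeof(int). */
static void clear_int_bytes(char *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < sizeof(int); i++)
        p[i] = 0; /* write access to byte i of the int */
}
```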


Unsafe:

void f(int* p)
{
    int r, w = 0;
    r = p[0]; // read access
    p[0] = w; // write access
}

...

char x[sizeof(int)] = {0};
f((int*)&x); // the cast is only there to avoid the compilation warning

This code is unsafe because, although the allocated variable is large enough to accommodate an int, its address in memory is not necessarily suitably aligned for an int (typically a multiple of sizeof(int)). As a result, unless the compiler (as well as the underlying HW architecture) supports unaligned load/store operations, a memory access violation can occur at runtime if the address of this variable is indeed not properly aligned.
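If the goal is to store an int in, or load it from, a char buffer, copying with memcpy sidesteps the alignment problem, since memcpy places no alignment requirement on its arguments, and it also avoids the strict-aliasing issue raised in the comments. A minimal sketch with hypothetical helper names store_int and load_int:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers: copy an int's object representation into and
   out of a byte buffer with memcpy instead of *(int *)buf, so buf may
   start at any address. */
static void store_int(unsigned char *buf, int value)
{
    assert(buf != NULL);
    memcpy(buf, &value, sizeof value);
}

static int load_int(const unsigned char *buf)
{
    assert(buf != NULL);
    int value;
    memcpy(&value, buf, sizeof value);
    return value;
}
```

The bytes in buf are in the host's native byte order, so this alone does not make the buffer portable between machines of different endianness.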

barak manos
  • Technically, the second example is undefined behavior ("If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined." - C11 6.3.2.3 clause 7) – Drew McGowen Jul 22 '14 at 13:31
  • @DrewMcGowen: OK, that's why I started off with "some would say...". – barak manos Jul 22 '14 at 13:32
  • @barakmanos: What about the first example: is it UB? That is what I am interested in; it is odd to accept an answer where some say one thing and others say another –  Jul 22 '14 at 13:33
  • @dmcr_code: should be fine as long as you do not attempt to access `p[i]` with `i >= sizeof(int)`. – barak manos Jul 22 '14 at 13:34
  • @DrewMcGowen: But in your answer you said it is UB?? –  Jul 22 '14 at 13:37
  • @DrewMcGowen: last line of your answer. So even without explicit cast it is *NOT* UB? –  Jul 22 '14 at 13:38
  • I'm not 100% sure about omitting the cast, but I imagine it's technically UB (since GCC issues a warning). However, the explicit cast avoids UB. – Drew McGowen Jul 22 '14 at 13:41
  • @barakmanos: my question - *without* explicit cast, is it UB? –  Jul 22 '14 at 13:42
  • @DrewMcGowen: ok if I don't get further info on it, I will at least use a cast :| –  Jul 22 '14 at 13:43
  • @dmcr_code: I answered what would happen de-facto. If you want to know the definition then you should probably read the ANSI-C standard. To the best of my knowledge, pointer casting does not yield any additional assembly code, so it should have no effect on runtime behavior whatsoever... to the best of my knowledge... – barak manos Jul 22 '14 at 13:44
  • @barakmanos: so to be on the safe side I use a cast? (as in your first example)? –  Jul 22 '14 at 13:47
  • @barakmanos: ok so you say in order to understand it I'd have to read the standard; I don't get why 2nd case is UB, but thanks for the answer anyway –  Jul 22 '14 at 13:51
  • @dmcr_code: Please read carefully. The second case is not only UB, it is explicitly **unsafe**!!! In simple words, when you read or write an `int`, its address in memory must be properly aligned. When you allocate a `char` array, its address in memory will not necessarily be aligned to `sizeof(int)` bytes. When you simply declare `int x` for example, it is not a problem, the compiler will allocate it properly. But when you declare `char x` or `char x[...]`, then you cannot safely use it with `int` read/write operations. – barak manos Jul 22 '14 at 13:53
  • @barakmanos: are you saying something like this is not allowed?? `char x[4]={0}; int y=9; memcpy(x,&y,4);`? –  Jul 22 '14 at 13:56
  • @dmcr_code: No. `memcpy` behaves as if it copies one byte at a time, which is perfectly safe. But `*(int*)x = y`, on the other hand, is unsafe because the compiler compiles the assignment into an int-store operation and not a byte-store operation. The second piece of code in my answer also demonstrates the same kind of danger. – barak manos Jul 22 '14 at 14:00
  • @barakmanos: OK, fine, if memcpy is at least OK to use this way. But I've seen people use things like `*(int*)x=y` too, saying to watch out *only* for endianness, etc. –  Jul 22 '14 at 14:02
  • @dmcr_code: If you don't ensure that `&x` is divisible by `sizeof(y)`, then you might get a runtime exception (or worse, a result in `x` which you have not anticipated). Endianness is indeed yet another problem in this case. – barak manos Jul 22 '14 at 14:07
  • @dmcr_code whomever said that endianness is the only issue with `*(int *)x = y;` was **wrong.** See [this question of mine](http://stackoverflow.com/questions/24598335/is-the-strict-aliasing-rule-really-a-two-way-street). – The Paramagnetic Croissant Jul 22 '14 at 14:07
  • @barakmanos: but in this case sizeof(x)=4 and is divisible by sizeof(int)?? –  Jul 22 '14 at 14:12
  • @TheParamagneticCroissant: good that you are giving some links; otherwise it gets confusing, with so many rules –  Jul 22 '14 at 14:12
  • @dmcr_code: Please read the comment carefully. It says "**`&x` divisible by `sizeof(y)`**". – barak manos Jul 22 '14 at 14:14
  • @dmcr_code You're welcome. Actually, there's only **one** rule and one exception. In the general case, you can't do `*(T *)p = foo;` if `T` and the type of `*p` are different. The only exception is when `T` is a signed or unsigned `char`. – The Paramagnetic Croissant Jul 22 '14 at 14:14
  • @barakmanos: see the answer here (about assigning pointers etc.): http://stackoverflow.com/questions/18910947/embed-int-string-in-byte-array-storing and here: http://stackoverflow.com/questions/18977199/embed-multiple-ints-in-byte-array –  Jul 22 '14 at 14:23
  • @TheParamagneticCroissant: see my above comment please –  Jul 22 '14 at 14:24
  • @dmcr_code the first part of the answer to your first question is incorrect. the second question and answer are not related to the issue being discussed here. – The Paramagnetic Croissant Jul 22 '14 at 14:27
  • @TheParamagneticCroissant: What's incorrect about it? (before you say "UB", please read the opening statement right above it). – barak manos Jul 22 '14 at 14:29
  • @dmcr_code It's incorrect for the same reason I've mentioned in my comment there: it violates the strict aliasing rule by casting a pointer to `char` to a pointer to `int` then dereferencing it. More on this topic [here](http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule). – The Paramagnetic Croissant Jul 22 '14 at 14:31
  • @TheParamagneticCroissant: In the first part of the answer, the casting is from an `int` pointer to a `char` pointer, and **not** the other way round (as you state in your comment)!!! – barak manos Jul 22 '14 at 14:33
  • @barakmanos I don't know what you are referring to (we are **not** discussing your answer but the one linked by dmcr_code!), but I was talking about `*(int*)(sendBuffer + pos) = some_int;`. Since `sendBuffer` is an array of `unsigned char`, so `sendBuffer + pos` is a pointer to `unsigned char`, which is being cast to `int *` and then dereferenced. That is not allowed by any means. – The Paramagnetic Croissant Jul 22 '14 at 14:35
  • @TheParamagneticCroissant: OK, then I'll use memcpy instead of that assignment if I need to serialize an int into chars (provided there are no endianness issues) –  Jul 22 '14 at 14:49
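Regarding the endianness caveat: one common way to make the byte order explicit, independent of the host, is to build the bytes with shifts instead of copying the object representation. A minimal sketch assuming a little-endian wire format and 32-bit values; put_u32_le and get_u32_le are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes v into buf[0..3], least significant
   byte first, regardless of the host's byte order. */
static void put_u32_le(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (unsigned char)(v & 0xFFu);
    buf[1] = (unsigned char)((v >> 8) & 0xFFu);
    buf[2] = (unsigned char)((v >> 16) & 0xFFu);
    buf[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Hypothetical helper: reads back a value written by put_u32_le. */
static uint32_t get_u32_le(const unsigned char *buf)
{
    assert(buf != NULL);
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

Any fixed byte order works, as long as the writer and the reader agree on it.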

The cast is required, see C11 (n1570) 6.5.2.2 p.2:

[…] Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.

This refers to the rules for assignment, the relevant part is (ibid. 6.5.16.1 p.1)

One of the following shall hold:

[…]

  • the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right.

[…]

And unsigned char isn’t compatible with int.

These rules both appear in a “constraint” section, where “shall” means that the compiler has to give a “diagnostic message” (cf. C11 5.1.1.3) and may stop compiling (or whatever, everything beyond that diagnostic is, strictly speaking, out of the scope of the C standard). Your code is an example of a constraint violation.

Other examples of constraint violations are calling a (prototyped and non-variadic) function with the wrong number of arguments, using bitwise operators on doubles, or redeclaring an identifier with an incompatible type in the same scope, ibid. 5.1.1.3 p.2:

Example

An implementation shall issue a diagnostic for the translation unit:

    char i;
    int i;

because in those cases where wording in this International Standard describes the behavior for a construct as being both a constraint error and resulting in undefined behavior, the constraint error shall be diagnosed.

Syntax violations are treated the same way.

So, strictly speaking, your program is as invalid as

int foo(int);
int main() {
    It's my birthday!
    foo(0.5 ^ 42, 12);
}

which a conforming implementation may very well compile, maybe to a program having undefined behavior, as long as it gives at least one diagnostic (e.g. a warning).

For gcc, for example, a warning is a diagnostic (you can turn syntax and constraint violations into errors with -pedantic-errors).

The term ill-formed may be used to refer to either a syntax or a constraint violation. The C standard doesn't use this term, but cf. C++11 (n3242):

1.3.9

ill-formed program

program that is not well formed

1.3.26

well-formed program

C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.

The language-lawyer attitude aside, your code will probably either not compile at all (which should be reason enough to add the cast) or show the expected behavior.
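For completeness, a sketch of the question's code with the cast added, assuming f() copies the object's bytes with memcpy as the asker mentions in the comments; the extra out and n parameters and the copy_via_f helper are hypothetical, added only so the example is self-contained:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical body for the question's f(): copies n bytes of the
   object at p into out. */
static void f(unsigned char *p, unsigned char *out, size_t n)
{
    assert(p != NULL && out != NULL);
    memcpy(out, p, n); /* copy the object representation byte-wise */
}

/* Call site: the explicit cast replaces the constraint violation with
   a defined pointer conversion (C11 6.3.2.3p7). */
static uint32_t copy_via_f(uint32_t x)
{
    unsigned char out[sizeof x];
    uint32_t y;
    f((unsigned char *)&x, out, sizeof x);
    memcpy(&y, out, sizeof y);
    return y;
}
```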

mafso