
I have this character array

char array[] = {1,2,3};     //FL, FH, Size

and I am trying to access it using pointers in such a way that I get FLFH value together, stored in an integer variable.

I did this

int val =0;
val = *(int*)array;
printf("value of p is %d\n",val);

I was expecting the result to be 12, but it was some 8-digit number, which I think may be the address of the value or something. Could anyone tell me what I am doing wrong here?

SandBag_1996
  • For starters, if `sizeof(int) >= 4`, you're running off the end. – chris Jan 07 '13 at 02:54
  • okay, suppose array[]={1,2,3,4,5,6,7,8} – SandBag_1996 Jan 07 '13 at 02:56
  • Well, if it's just the two, `10 * array[0] + array[1]` would give you 12. I don't know how flexible you want it to be. – chris Jan 07 '13 at 02:58
  • The end result is not a concern here. How I am getting to it is the point. I want to do it using pointers, and exploiting the fact that an integer takes 2/4 bytes depending on the machine. – SandBag_1996 Jan 07 '13 at 02:59
  • 1
    'I was expecting the result to be 12' -- why in the heck would you expect that? – Jim Balter Jan 07 '13 at 03:09
  • @Jim dude chill.. there's always a first time for everything. Trying out new things doesn't hurt. The answer would only lead me to a better understanding of pointers, right? – SandBag_1996 Jan 07 '13 at 03:17
  • The answer to this will ultimately lead to near-*zero* knowledge dump about pointers, though it may be educational on the topics of endian format, data alignment, and possibly BCD (ok that's a stretch). – WhozCraig Jan 07 '13 at 03:26

4 Answers

4

It's never going to give you twelve.

You're taking two one-byte values and looking at them as if they comprise a single two-byte entity. When you look at them together, you're re-interpreting their value. The integer 1 looks like this, as bits: 00000001, and 2 like this: 00000010. The compiler knows how big they are and thus will allow you to access them as individuals, but they're laid out in order in your array, right next to each other in memory.

Inspect the two bytes together as if they were a single two-byte integer and (ignoring byte order) you have: 00000001 00000010, whose value is not 12; it's 258.

For your further reading, what you're doing is a kind of "type punning".
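
For instance, here is a minimal sketch that computes both possible readings of those two bytes; which one you actually see depends on your machine's byte order, but neither is ever 12:

#include <stdio.h>

int main(void)
{
    // The array's first two bytes, 00000001 and 00000010, packed into
    // one 16-bit pattern in each of the two possible byte orders.
    unsigned big_endian    = (1 << 8) | 2;   // 0x0102 = 258
    unsigned little_endian = (2 << 8) | 1;   // 0x0201 = 513

    printf("read big-endian:    %u\n", big_endian);
    printf("read little-endian: %u\n", little_endian);
    return 0;
}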

jscs
  • I don't understand why you think they are in a "two-byte container". What container would that be? What possible system could give you the result that you claim you'll get, and which is different from what OP reports? – Mark Dominus Jan 07 '13 at 03:08
  • If you look at the two consecutive bytes of the `char` array as if they comprise a single two-byte entity, you end up with those bits (big-endian). – jscs Jan 07 '13 at 03:10
  • The actual result (in decimal) is 513, which means the value in the container goes like '2' first, and then '1'. Shouldn't it be the other way around, Josh? – SandBag_1996 Jan 07 '13 at 03:10
  • No, what I meant is, shouldn't '1' go in first, and then '2' go in the container? Instead what is happening is that '2' is going in first, followed by '1'. – SandBag_1996 Jan 07 '13 at 03:13
  • @UnderDog: No, that's the order they're in in your array, so that's the order they're in in memory, so that's the order they are when you read them at a different "bite size". (And I'm ignoring endianness, but MJD's repeated admonitions about it are important.) – jscs Jan 07 '13 at 03:15
  • @MJD: Thanks for helping me clarify! Sometimes one doesn't know how others will interpret words that make perfect sense in one's own head. – jscs Jan 07 '13 at 03:15
  • @UnderDog Your machine uses [little-endian byte order](https://en.wikipedia.org/wiki/Little-endian), which means that the bytes that make up an `int` value are stored backwards from the way you expect. This is yet another reason why this is never going to work. – Mark Dominus Jan 07 '13 at 03:16
  • Great! Thanks. This answer really explained in much better way how things are working. Thanks a lot Josh – SandBag_1996 Jan 07 '13 at 03:18
  • @UnderDog: Sure thing! Also, I just linked a Wikipedia article that you may be interested to read. – jscs Jan 07 '13 at 03:26
3

This is a bad idea, for readability and for portability. Use array[0] * 10 + array[1] if that's what you mean, and trust the peephole optimizer to speed it up. If you must access the value via a pointer, use a char pointer and write p[0] * 10 + p[1], which is easy to understand, perfectly legal, and portable.

There are many reasons why your code might not work, which strongly suggests that what you're trying to do is dumb, or at least that you are in over your head.

The first one is that bytes range from 0–255, not from 0–9, and so if you do use this technique, and it works, you are going to get 1*256+2 = 258, not 1*10+2 = 12. You are never, ever going to get 12. The computer does not work that way. This is why you have to call a function like atoi() to convert a string like "12" to the number 12 before you can do arithmetic on it.

If you have four-byte ints, that would also be why you were getting some big number out: you think you're getting 1*256+2, but you're actually getting ((1*256+2)*256+3)*256+???. Also, as chris says in the comments, in that case your array is too small anyway, whence the ??? in the previous formula.

You could try using a short instead of an int and see if that works better; a short is likely to be a 2-byte integer. But this isn't guaranteed, so it still won't be portable to systems with shorts longer than two bytes. Better practice is to find out the actual type on your system that corresponds to a two-byte integer (perhaps short, perhaps something else), and then typedef something like INT2 to be that type, and use INT2 in place of int.
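
For instance, a minimal sketch of that typedef, assuming <stdint.h> is available (int16_t is guaranteed to be exactly 16 bits wherever it exists):

#include <stdint.h>
#include <stdio.h>

// One way to pin down a two-byte integer type: let <stdint.h> pick it.
// Without <stdint.h>, you would choose the underlying type by hand,
// e.g. "typedef short INT2;", per platform.
typedef int16_t INT2;

int main(void)
{
    printf("sizeof(INT2) = %zu\n", sizeof(INT2));   // 2 on typical (8-bit-byte) systems
    return 0;
}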

Another potential problem is that your system might use a little-endian byte order, in which case the bytes in your array are in the wrong order to represent the two-byte machine integer you want, and your trick is never going to work. Even if it can be made to work on your machine, it will break if you ever have to run the code on a little-endian machine. (Your comments elsewhere in this thread suggest that this is exactly what is going on.)

So just do it the easy way.
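
For instance, a minimal sketch of the easy way, covering both readings of the question (decimal digits versus byte values); plain arithmetic like this does not depend on byte order:

#include <stdio.h>

int main(void)
{
    char array[] = {1, 2, 3};     // FL, FH, Size

    // Treat the first two bytes as decimal digits: gives 12.
    int as_digits = array[0] * 10 + array[1];

    // Treat them as the high and low byte of a two-byte value: gives 258.
    int as_bytes  = array[0] * 256 + array[1];

    printf("as digits: %d, as bytes: %d\n", as_digits, as_bytes);
    return 0;
}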

Mark Dominus
  • Because of the OP's misconceptions and because there is no explanation of *why* this is being done, we don't know whether 1*10+2 is desired or 1*256+2, or whether the OP just wants to put those two chars into an int regardless of how they are transformed in the process (I suspect the latter). – Jim Balter Jan 07 '13 at 03:24
  • OP says "I was expecting the result [of `printf("value of p is %d\n",val)`] to be 12". That seems quite clear. – Mark Dominus Jan 07 '13 at 03:25
  • But that expectation was the result of a misunderstanding. From the OP's other comments here, it seems unlikely to be a *requirement*. – Jim Balter Jan 07 '13 at 03:29
3

You didn't get the result you expected because bytes are laid out on 8-bit boundaries, so adjacent binary byte values are scaled by 256, not 10.

Also, if you cast the type of something and dereference it, the compiler will compile it but technically your program is nonconforming.*

One problem is alignment. If you change the type with a cast in open code, the object may not begin at the right boundary for the new type, and it may not be big enough. You can fix all that with a union. It's still nonconforming and the result is not specified, but in practice it is reliable and even somewhat portable, if you don't mind different results depending on byte order and int size.

union a {
  char  c[3];
  int   i;
  short s;
} a;
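
For illustration, a minimal usage sketch of this union (the exact value you see still depends on your platform):

#include <stdio.h>

union a {
  char  c[3];
  int   i;
  short s;
} a;                // file scope, so it starts out zero-filled

int main(void)
{
    a.c[0] = 1;     // FL
    a.c[1] = 2;     // FH
    a.c[2] = 3;     // Size

    // Reading a member other than the one last written is the type
    // punning described above; the value depends on byte order and
    // on sizeof(short): 258 (big-endian) or 513 (little-endian) in practice.
    printf("as short: %d\n", (int)a.s);
    return 0;
}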

This might also be a good application for <stdint.h>, but that's a topic for another question.


*You might wonder, then, why casts exist ... it's because (A) despite being banned by the standard, type punning is widely used in existing C programs and particularly in operating-system software, and (B) there are mostly-conforming uses that define generic interfaces but always (or, ahem, usually) cast the thing back to the original type before dereferencing it.

DigitalRoss
  • Your second and last paragraphs are not correct. It is permitted to cast a `char *` pointer to any other type, as was done here, and to dereference it if the object to which it points has the correct alignment for the dereferenced type. This is why `malloc` works. As a practical matter, it will work if the array is aligned properly (which `malloc` is careful to do), and it will abort the program with a bus error if not. Since that's not what's happening here, we can suppose that the alignment is correct. – Mark Dominus Jan 07 '13 at 03:19
  • Actually, it's the other way around, you can convert to `char *` from any other type and the result is specified. All the standard says about alignment is that you can temporarily store the pointer value and then get it back. Anyway, I agree that it's more complex than I went into. (This, btw, is the reason that gcc's alias analysis only works when casting to `char *` without `-fno-strict-aliasing`.) – DigitalRoss Jan 07 '13 at 03:26
  • You can convert both ways. – Mark Dominus Jan 07 '13 at 03:28
  • I'd be interested in a citation of the C standard that says that you can "dereference it if the object to which it points has the correct alignment for the dereferenced type". – Jim Balter Jan 07 '13 at 03:34
  • Sure. 6.3.2.3#7 ("Pointers") says: "A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer." Both `char *` and `int *` are object types. The pointer conversion is legal, as long as the resulting pointers are correctly aligned, and if so the conversion is required to be reversible. – Mark Dominus Jan 07 '13 at 03:36
  • I would too, Jim. ISO "c99" **6.3.2.3 (7)** says, specifically: *A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive ... yield pointers to the remaining bytes of the object.* – DigitalRoss Jan 07 '13 at 03:36
  • So, it doesn't say anything about dereferencing it, except for chars. It only notes that the pointer can be converted back. (It is funny that we are both citing the same language to make opposite points.) – DigitalRoss Jan 07 '13 at 03:38
  • If you have a pointer to an object of the correct type, you can dereference it. If it is equal to another pointer of the correct type then it points to the same object. The whole point of the "shall compare equal" language of that section is that the round-trip is safe. – Mark Dominus Jan 07 '13 at 03:39
  • @MJD: That's certainly what I was trying to say. It doesn't seem to be what you were saying, but perhaps I just misunderstood your point. – DigitalRoss Jan 07 '13 at 03:40
  • Ok, the language does imply that you can dereference a `char*` cast to another type if the pointer is properly aligned. But it does *not* say "It is permitted to cast a char * pointer to any other type" -- that part is false; the pointer must be properly aligned even if you don't dereference it, else the behavior is undefined. (This makes it possible to have an implementation in which only aligned pointers have representations as other pointer types.) – Jim Balter Jan 07 '13 at 03:47
  • Well, I'm no longer sure. I think you're probably right that a conforming program may not take a `char *`, cast it to a new object type, and dereference it, *unless* the `char *` was itself originally cast from a valid pointer of the new type, alignment notwithstanding. (I think this is what you were getting at in your first comment.) But on the other hand, 6.5.3.2#4 says "If the operand [of unary `*`]… points to an object, the result is an lvalue designating the object." This leaves open whether an `int` pointer converted from a `char[]` can be said to "point to an object". I don't know. – Mark Dominus Jan 07 '13 at 03:52
  • Ah, ok, I was wrong that "the language does imply" ... DigitalRoss was right about the round trip only being guaranteed to work in one direction (by 6.3.2.3; 6.5.3.2#4 seems to imply that it works the other way, but it's questionable whether that was intended). – Jim Balter Jan 07 '13 at 04:01
  • Relevant: [1](http://stackoverflow.com/a/9060885/1277934) [2](http://stackoverflow.com/a/4318446/1277934). Both support DigitalRoss's interpretation, and refute mine. – Mark Dominus Jan 07 '13 at 04:13
-1

maybe try :-

char array[] = {1,2,3};     //FL, FH, Size
int val =0;
val = *(short*)array;
printf("value of p is %X\n",val);

This is assuming you are trying to reinterpret your char array as an int, though on different platforms you may get different results than expected.
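
Alternatively, a minimal sketch of the same reinterpretation using memcpy and a fixed-width uint16_t, which sidesteps the alignment and aliasing questions discussed in the comments (the result still depends on byte order):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    char array[] = {1, 2, 3};     // FL, FH, Size

    uint16_t val;
    memcpy(&val, array, sizeof val);   // copy just the first two bytes

    printf("value of p is %X\n", (unsigned)val);   // 102 or 201, by byte order
    return 0;
}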

Keith Nicholas
  • Didn't get the expected value. Can you tell me the significance of %x here? – SandBag_1996 Jan 07 '13 at 02:59
  • prints as hex instead of decimal – Keith Nicholas Jan 07 '13 at 03:01
  • I changed it to "short" which is 16bits – Keith Nicholas Jan 07 '13 at 03:01
  • which seems to work for me... except, in hex, I get 201, which is what I'd expect – Keith Nicholas Jan 07 '13 at 03:03
  • but he is getting his FLFH stored together in an int – Keith Nicholas Jan 07 '13 at 03:06
  • though, not in order due to Endianness – Keith Nicholas Jan 07 '13 at 03:07
  • Worst case misunderstanding of the question (and the asker's misconceptions). – Jim Balter Jan 07 '13 at 03:11
  • Um, he said he expected the result to be 12. And your code, like his, produces undefined behavior, possibly destroying the universe in the process. – Jim Balter Jan 07 '13 at 03:15
  • It doesn't produce undefined behavior actually. It produces the result it should produce (like what Josh mentioned in his answer). +1 for that – SandBag_1996 Jan 07 '13 at 03:23
  • @UnderDog You don't understand what undefined behavior is -- it's a technical term specified in the C standard. – Jim Balter Jan 07 '13 at 03:26
  • @UnderDog undefined behavior basically just means it's not specified... things often work as expected even if it is undefined, and in many cases you'd be hard pressed to find compilers that break things. – Keith Nicholas Jan 07 '13 at 03:32
  • @KeithNicholas You would not be hard pressed to find implementations where casting a `char*` to a `short*` and then dereferencing it sometimes results in a fault. Also, Underdog doesn't seem to understand that there are two different results that are common among implementations. – Jim Balter Jan 07 '13 at 03:36
  • Actually it's not as "undefined" as you say it is, Jim. Check out this page which I found on Google (it is by a prof who has a PhD from Stanford), and check out the slide with the title "Pointers to the rescue". It says the same thing. I guess your definition of undefined is a little too far from the C standard :) http://medesign.seas.upenn.edu/uploads/Courses/510-11C-L11.1.pdf – SandBag_1996 Jan 07 '13 at 03:49
  • Keith, I've got the scars to prove it. @Underdog I served on X3J11, the C standards committee ... it is indeed undefined, and the language that says so is quoted (twice) in the comments to DigitalRoss's answer. But since you already know it all, I won't respond further. – Jim Balter Jan 07 '13 at 03:53