Confusion about static and dynamic arrays in C

Question

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct mystruct{
  int a;
  char arr[10];
  char *str;
}mystruct;


void f(void *data, int offset){
  char *s = (char *)(data + offset);
  printf("%s", s);
}

void g(void *data, int offset){
  char *s = *(char **)(data+offset);
  printf("%s", s);
}

int main(){
  mystruct test;
  test.a = 2;
  test.str = malloc(100);
  strcpy(test.arr, "Hello ");
  strcpy(test.str, "World!");
  f(&test, offsetof(mystruct,arr));
  g(&test, offsetof(mystruct,str));
  return 0;
}

I am wondering why I need two different ways to print the strings. In function f, what is (data + offset) actually pointing at? Is it not pointing to arr which is a char pointer to the first element of the string? But in function g, (data + offset) is pointing to a char pointer too. So why two different approaches must be used to do the same task?

This code performs arithmetic on a `void *` type which isn't allowed in C. — davmac, Jun 12 '13 at 16:10
Well, first of all, your code is already invalid. The `data + offset` expression is already not C, since the C language does not support pointer arithmetic on `void *` pointers. You are using non-standard compiler extensions provided by some specific compiler. — AnT stands with Russia, Jun 12 '13 at 16:20
@davmac: Yet many world's most brilliant C developers use it as a compiler extension as if the type was `char *`. :) — , Jun 12 '13 at 16:28
I'd argue that the better programmers just cast to `char *` before performing the arithmetic. Equivalent *and* standards compliant! :) — davmac, Jun 12 '13 at 16:30
@Vlad Lazarenko: No, a C developer would actually explicitly convert it to `char *` even if it were not required by the compiler. It is something C developers do automatically. I'm sure the brilliant developers you are referring to are indeed brilliant, but they are certainly not C developers, unless you adopt a very loose definition of the latter. — AnT stands with Russia, Jun 12 '13 at 16:37
@pinkpanther: "no casting is needed from void * to..." - what does it mean in the context of the question? — AnT stands with Russia, Jun 12 '13 at 16:47
@AndreyT: Oh, really? I am sorry. I must go tell Linux kernel developers that they have been doing it wrong for decades. And lots of other folks, too. — , Jun 12 '13 at 16:47
@Vlad Lazarenko: Er... Wrong? I never said anything about "wrong" with regards to them. It is you who are wrong. Firstly, Linux kernel developers do not work in C. Such low-level code as an OS kernel cannot possibly be implemented in modern C. The language is too high-level for that. They work in some platform-specific C-like pseudo-language that just borrows most of C syntax and features, and piggybacks for that reason on some C compiler implementation. — AnT stands with Russia, Jun 12 '13 at 16:54
@AndreyT nothing... actually :( I've always used explicit casting for void * to other pointer types in my college... but on SO many people comment that it's not required and indeed shouldn't be used. I couldn't believe it... and now here you say that a real C developer uses it..... ) I'm kind of confused now about what to follow and whom to follow.... For clarity's sake, do you mean that a real C developer uses `char *c=(char *)malloc(1)`?... I would be really delighted if you can point me in the right direction... — pinkpanther, Jun 12 '13 at 16:58
@AndreyT I'm actually very confused by your claim. I'm pretty sure the Linux kernel... is written in C. It has some platform specific assembly sure, but the majority of the code is C. Here's a discussion of this topic on SO: http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc — rliu, Jun 12 '13 at 17:18
@Vlad Lazarenko: I haven't really worked in Linux kernel development - let's put it that way :) But I work a lot in areas that occupy the same "strata" with regard to its complexity. "OS kernel development" ranks at pretty average complexity in the modern development areas (or even below that). — AnT stands with Russia, Jun 12 '13 at 18:04
@Vlad Lazarenko: The fascination of young developers with "OS kernel development" is like a starry-eyed child dreaming of becoming a fighter pilot, seeing this occupation as a pinnacle of human achievement. As the child grows up, he discovers that his former childhood affections are much more mundane than they used to appear, and that there are things that reach far beyond that. — AnT stands with Russia, Jun 12 '13 at 18:04
@AndreyT: What this has to do with pointer arithmetics on `void*`? Or do you think I am a child/young developer? I might take that part as a compliment! — , Jun 12 '13 at 18:11
@roliu: It is important to understand the amount of artistic license that goes into such assertions. We all know that "Linux kernel is written in C", we all know what exactly is meant by that, and we all know to which degree it is actually true. In this particular discussion I'm intentionally using the term "C" in a formal and pedantic fashion. For a pedantic understanding of the term the claim that "Linux kernel is written in C" is patently false. But frankly, I don't understand why we are arguing about it here. Why did this issue even arise? — AnT stands with Russia, Jun 12 '13 at 18:12
@roliu: (I followed your link and I don't immediately see how it is relevant. Are you sure this is the right link?) — AnT stands with Russia, Jun 12 '13 at 18:13
@Vlad Lazarenko: "What this has to do with pointer arithmetic on `void*`?" is a question I would sure like to ask myself. I believe that that sudden rapid transition from the very narrow and specific matter of `void *` arithmetic to the issue of "brilliant developers" that apparently work in "Linux kernel development" took place in your comments specifically. I was quite surprised by it. So, when you ask *me* about what this has to do with pointer arithmetic, it makes me even more surprised. — AnT stands with Russia, Jun 12 '13 at 18:16
@roliu: One could have written a '70s-era OS kernel in the very nascent version of C. In fact, this is how the C language came into existence and how Unix was written. But if you read about that nascent language, you'll see that its features were deliberately tailored to that specific task. It was indeed a "higher-level assembly" back then. As C developed into a much more universal and abstract application-independent language, many of those features disappeared or acquired a completely different meaning. C is no longer a "higher-level assembly". — AnT stands with Russia, Jun 12 '13 at 18:26
@roliu: And this is exactly why if you want to write an OS kernel in C, you will have to go far beyond the limits outlined by the definition of the language. And this is exactly what will make your code something quite different than C. Yet, if you claim that such code is C, people typically won't argue simply because they understand perfectly well what it *really* means. — AnT stands with Russia, Jun 12 '13 at 18:29
@AndreyT The misunderstanding stemmed from me reading pinkpanther's comment without reading the code. I thought you were saying that it was a `void *` because he didn't explicitly cast the return value of `malloc` (which is obviously wrong even ignoring `void *` promotion). My fault for reading the comments blindly. I am not sure what you are talking about with respect to the Linux kernel. Sure... C has changed. The Linux kernel is still mostly C. The claim "it's written in C" is vague and technically incorrect. But "Linux kernel developer do not work in C" is also wrong. — rliu, Jun 12 '13 at 19:10
Please take all discussions to Chat. If you have clarifying information for the question or an answer, please edit the post to include that information. As it stands, these comments are drawing flags, which will inevitably lead to purging the discussion. — George Stocker, Jun 12 '13 at 19:57

score 3 · Answer 1 · answered Jun 12 '13 at 16:08

In both cases data+offset points to a member of the struct.

But let's look at the layout of the struct. It consists of

+-----+--------------------------------------------+
| a   | sizeof(int) probably 4 or 8 bytes          |
+-----+--------------------------------------------+
| possible padding of unknown size (probably zero) |
+-----+--------------------------------------------+
| arr | 10 bytes                                   |
+-----+--------------------------------------------+
| possible padding of unknown size (maybe 2 bytes) |
+-----+--------------------------------------------+
| str | sizeof(char*) probably 4 or 8 bytes        |
+-----+--------------------------------------------+

and elsewhere in memory is a block of 100 bytes allocated with malloc.

Notice that the data for test.arr is stored in the memory allocated for test, but the thing stored in test for test.str is the address of another block of memory.

score 1 · Answer 2 · edited Nov 25 '13 at 16:00

You have to give the compiler information about the type the pointer points to in functions f and g; you can't do pointer arithmetic on void pointers.

Cast data to a char pointer and the printf calls will work:

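void f(void *data, int offset){
  /* arr is stored inline, so the member's address is the string itself */
  char *s = (char *)data + offset;
  printf("%s", s);
}

void g(void *data, int offset){
  /* str holds an address, so read the stored char * and follow it */
  char *s = *(char **)((char *)data + offset);
  printf("%s", s);
}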

edited Nov 25 '13 at 16:00

BenMorel

answered Jun 12 '13 at 16:05

You are correct about the necessity to cast to `char *` in formal C language. However, the OP is probably using a heavily extended compiler (like GCC), which supports pointer arithmetic on `void *` pointers. His code is already working, he has no problems with that. – AnT stands with Russia Jun 12 '13 at 16:50

AnT stands with Russia · Answer 3 · 2013-06-12T16:47:42.497

In function f, what is (data + offset) actually pointing at? Is it not pointing to arr which is a char pointer to the first element of the string?

This is your primary source of confusion. arr is not a pointer, it is an array. Arrays are not pointers. Meanwhile, str is a pointer. "Arrays" and "pointers" are two absolutely different things. They have virtually nothing in common. And this is exactly why you have to work with them differently.

Arrays and pointers can behave very similarly in so-called value contexts (i.e. when used as rvalues), but this is a purely superficial similarity. They immediately reveal their major differences in so-called object contexts (i.e. when used as lvalues). In your specific example your member-accessing code is an example of object context, which is why you have to carefully observe the difference between arr and str.

The matter of differences between arrays and pointers has been covered many times already. I don't see any reason to repeat it here. Just do a search on "array pointer difference" on SO. Or read the essential FAQ

http://c-faq.com/aryptr/index.html

P.S. Also note that the C language does not support pointer arithmetic on void * pointers. All of your (data + offset) expressions are invalid. What you wanted to do is really ((char *) data + offset). The conversion of data from void * to char * is required in order to be able to perform pointer arithmetic with byte offsets.

score 0 · Answer 4 · answered Jun 12 '13 at 16:16

In your struct, at (presumably) offset 4, is a run of 10 bytes one after another, containing the letters that make up 'Hello '. So (data+4) is a pointer to char and needs to be decoded accordingly (i.e. as a char *).

However, after those 10 bytes (and any padding) come a few bytes that make up the address of a buffer somewhere, i.e. those bytes are a 'char *' (you defined them so), so data+offset there is a pointer to a char pointer, i.e. a char **.

What is probably confusing is that both

strcpy(test.arr, "Hello ");
strcpy(test.str, "World!");

work.

This is a confusing (but useful) feature of C/C++. The name of an array, when used in a place that requires a pointer to the type of the array element, will be treated by the compiler as if it were a pointer to the first element of the array.

So test.str is clearly a pointer to char (because you defined it that way). test.arr can be used as a pointer to the first element of test.arr, if the situation suggests it.

When you write strcpy(test.arr, "Hello "); what the compiler assumes you mean is strcpy(&test.arr[0], "Hello ");

score 0 · Answer 5 · answered Jun 12 '13 at 16:19

In function f, what is (data + offset) actually pointing at?

It is pointing at the 'arr' member of the structure object (if you assume GCC's semantics for the void pointer arithmetic).

Is it not pointing to arr which is a char pointer to the first element of the string?

arr is a char array, not a char pointer.

But in function g, (data + offset) is pointing to a char pointer too.

In this case, it is pointing to a char pointer, yes.

So why two different approaches must be used to do the same task?

The pointers are to two different things: one to a char pointer (which itself points at a char array) and the other to a char array. There is one more level of indirection in the first case (the char pointer, which is what g handles).
