Can C arrays be expected to have the same performance as structs?

Question

Case A:

typedef struct {
    int x;
    int y;
    int z;
} v3;

v3 foo = {1,2,3};
/* code using v3 and foo */

Case B:

int foo[3] = {1,2,3};
/* code using arrays and foo */

Can A and B be considered equivalent, performance-wise, in general? What about arrays with statically unknown size, such as int foo[n]? Is there a disadvantage, then?

Does this help:- http://stackoverflow.com/questions/4560142/what-is-faster-in-c-structs-or-arrays — Rahul Tripathi, Nov 23 '13 at 19:17
you simply need to look at the compiler output to understand the answer to this question. No need to google. And what this has to do with performance is also a mystery. Are you interested in compile time performance? What measurements have you made thus far? — old_timer, Nov 23 '13 at 19:17
@Viclib Yes, we do. Start with the Wikipedia article on Neumann architecture, for example. — , Nov 23 '13 at 19:18
"You simply need to look at the compiler output" what if I don't understand assembly? You are pretty much arguing my question is invalid because I don't have enough knowledge to infer its answer myself, and that I should work to acquire that knowledge and figure it on my own just to solve this very particular problem. Which is exactly the situation this site is meant to solve. — MaiaVictor, Nov 23 '13 at 19:22
@Viclib, "this site" is meant to solve specific programming questions, not teach you how computers work. — Paul Tomblin, Nov 23 '13 at 19:23
Read again and you should notice I'm nowhere asking how computers work - wonder where you read that. I'm asking what is the runtime performance expectations for a particular situation in certain programming language. I guess I'm experienced enough on SO to say, in my opinion, this kind of question is perfectly valid and in-topic. Please consider explaining why you think otherwise. — MaiaVictor, Nov 23 '13 at 19:25
@Viclib: For the record, I think the answer to your question is "no", with two caveats: (1) I don't have an example handy (so I'm not posting an answer yet), and (2) your code has to be very high-performance for you to notice it. — user541686, Nov 23 '13 at 19:28
@PaulTomblin: "This site" is popular because of questions and answers [like this](http://stackoverflow.com/questions/11227809), that teach you how computers work. — user541686, Nov 23 '13 at 19:29
@Viclib you don't specifically ask how computers work, you just ask an incredibly vague question that the only way to answer it would be to explain how computers work. Your guess as to what is on-topic is at odds with the FAQ, and the consensus of those of us who've been here since the beginning. — Paul Tomblin, Nov 23 '13 at 19:30
@Mehrdad, you see what's different between the question you link and this one? I'd give you a hint: "code". — Paul Tomblin, Nov 23 '13 at 19:31
@PaulTomblin: Sure, he could've provided code and that would've been better. But that couldn't possibly mean the difference between this question being on-topic and off-topic... providing more code doesn't change the topic!! — user541686, Nov 23 '13 at 19:32
@Mehrdad, it changes it completely. It changes it from "what happens in this situation" to "how do compilers work"? — Paul Tomblin, Nov 23 '13 at 19:35
I don't think it is a consensus considering the votes are split. I also believe this argument right here makes it unlikely the question itself is as straighforward as some say. I, too, don't think it is such a broad question to answer. It is up to an answerer to decide how deep in details he will go on his explanation, but the answer itself is a simple "yes" or "no". — MaiaVictor, Nov 23 '13 at 19:35
@Viclib: I think you're right but some people really have a deep passion for closing questions on this site... — user541686, Nov 23 '13 at 19:38
@PaulTomblin: You're not making any sense but I'm not going to argue... — user541686, Nov 23 '13 at 19:38
By the way, in the end, the answers were great, my issue was solved and it all added up to the community's resources. Your prediction was wrong this time, but that happens. Regardless, I'd like to thanks all participants. — MaiaVictor, Nov 23 '13 at 19:50
@LtWorf: You should read a bit of the link I posted in my answer. — user541686, Nov 23 '13 at 20:07

Sergey Kalinichenko · Answer 1 · 2013-11-23T19:49:17.107

3

The answer depends on the usage of your array or struct. If you use your array with compile-time constant expressions as indexes, there would be no difference: the compiler would be able to compute the required offset at compile time. This is the only situation where the comparison is apples-to-apples, because there is no way to "index" a struct to get a particular field. Accesses

foo.y = 123;

and

foo[1] = 123;

would likely translate to an identical set of machine instructions.

When the offset is not a compile-time expression, however, there is an additional step required to compute the address in memory: in order to execute the code below

i = 1; // Assume that i is known only at runtime
foo[i] = 123;

CPU would need to add a variable offset from variable i to the address of foo before storing 123 at that address.

Theoretically, this step introduces an additional overhead. However, modern CPUs compute the offset in hardware, so the overhead is negligible.

edited Nov 23 '13 at 19:49

answered Nov 23 '13 at 19:23

Sergey Kalinichenko

714,442
84
1,110
1,523

I'd guess even the last case would probably be optimized by the compiler into the same instructions - but possibly you're right in a similar situation, such as when "i" itself is only known on runtime. – MaiaVictor Nov 23 '13 at 19:47
@Viclib You are right, if the compiler knows that `i` is one at compile time, it will optimize the code into the same set of instructions. – Sergey Kalinichenko Nov 23 '13 at 19:51
Interesting. Great answer, thanks! (Same for the answers below) – MaiaVictor Nov 23 '13 at 19:52

AnT stands with Russia · Answer 2 · 2013-11-23T19:52:42.947

There's no performance difference between a struct object and a compile-time sized array object, as long as array access is performed through a compile-time index. v3.x and foo[0] will typically generate the same code, as well as v2.y and foo[1] and so on.

The array has an added benefit of supporting run-time element selection, i.e. access through foo[i] where i is a run-time index. This might be slower, but this is not relevant in the context of the question since struct types do not support run-time member selection at all.

A run-time sized array (VLA) will generally be slower than compile-time sized array, since VLA is internally implemented as a pointer instead of an immediate array object, although the difference will not be detectable in overwhelming majority of cases. However, I don't understand why you are asking about VLAs here. C language does not have run-time sized structs, which is why VLAs have no meaningful relevance to the primary question.

score 2 · Answer 3 · answered Nov 23 '13 at 19:57

Usually, yes. However, when you're writing very performance-sensitive code, using scalars instead of structs can help the optimizer perform better

For example, if you have:

// atom forces
typedef struct float3 {
   float x;
   float y;
   float z;
} float3;

the code here on Intel's site suggests changing:

// position is an array of float3
float3* position; 
for (int k=0; k<dis; k++){

    jpos = position[ neighList[j*dis + k + maxNeighbors * i] ];

    float delx = ipos.x - jpos.x;
    float dely = ipos.y - jpos.y;
    float delz = ipos.z - jpos.z;
    float r2inv = delx*delx + dely*dely + delz*delz;

to

for (int k=0;k<dis;k++){ 

     jposx = position[ neighList[j*dis + k + maxNeighbors * i] ].x;
     jposy = position[ neighList[j*dis + k + maxNeighbors * i] ].y;
     jposz = position[ neighList[j*dis + k + maxNeighbors * i] ].z;

     float delx = iposx - jposx;
     float dely = iposy - jposy;
     float delz = iposz - jposz;
     float r2inv = delx*delx + dely*dely + delz*delz;

... because it helps the compiler efficiently vectorize the memory accesses:

To help the compiler generate better vector code, sometimes it helps to decompose complex data structures to allow the compiler to understand the available parallelism and vectorize the code.

So yes, scalars can be faster because the optimizer can optimize them better. However, note that you should only worry about this if this is a bottleneck in your code, and you should actually test it to see if it's the case in your situation; this is not meant to be a "rule of thumb" of any sort.

This clears up what you stated on the comments. Interesting how some simple changes can make such a huge difference in hot spots. Thanks. — MaiaVictor, Nov 23 '13 at 20:00
@Viclib: Yup. I hope everyone who thought the answer was an obvious "yes" reads the link, because there are way too many people here who think the obvious answer (based on the C standard and common sense) is also the correct answer (based on what happens in reality). — user541686, Nov 23 '13 at 20:04

ArtemB · Answer 4 · 2013-11-25T20:53:14.063

0

In general, yes, all three ints will end up back-to-back in memory and should result in about the same code generated by compiler when you access them.

You could even have array within the structure and things would still end up being about the same.

typedef struct {
   int xyz[3];
} v3;

v3 foo = {{1,2,3}};

int foo[n] is supported in C99 and is a gcc extension in C++ and older modes. It would allocate space on stack. Performance-wise there should be no difference, but you should consider explicitly allocating memory for dynamically sized arrays.

edited Nov 25 '13 at 20:53

answered Nov 23 '13 at 19:18

ArtemB

3,496
17
18

4

The VLA is not gcc extension, it is standard C (C99 and C11). – hyde Nov 23 '13 at 19:22
Indeed. Still, GCC docs says `"... are allowed in ISO C99, and as an extension GCC accepts them in C90 mode and in C++"` – ArtemB Nov 25 '13 at 20:49

Can C arrays be expected to have the same performance as structs?

4 Answers4