76

I'm not trying to replicate the usual question about C not being able to return arrays but to dig a bit more deeply into it.

We cannot do this:

char f(void)[8] {
    char ret[8];
    // ...fill...
    return ret;
}

int main(int argc, char ** argv) {
    char obj_a[10];
    obj_a = f();
}

But we can do:

struct s { char arr[10]; };

struct s f(void) {
    struct s ret;
    // ...fill...
    return ret;
}

int main(int argc, char ** argv) {
    struct s obj_a;
    obj_a = f();
}

So, I was skimming the ASM code generated by gcc -S, and it seems to be working with the stack, addressing -x(%rbp) as with any other C function return.

What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.

Extra data: I am using Linux and gcc on an x64 Intel machine.

Dario Rodriguez
  • 802
  • 6
  • 15
  • 19
    Once upon a very long time ago, around the time that the first edition of K&R was published (1978), you couldn't return structures from functions or pass them to functions; you had to use pointers. You still can't assign plain arrays, though you can assign structures containing arrays. It is mostly the way C is designed and no-one has fought to get the necessary changes standardized. – Jonathan Leffler Jun 12 '18 at 03:29
  • 18
    Indeed, section 6.2 of K&R 1st Edn says: _There are a number of restrictions on C structures. The essential rules are that the only operations that you can perform on a structure are take its address with `&` and access one of its members. This implies that structures may not be assigned to or copied as a unit, and that they can not be passed to or returned from functions. (These restrictions will be removed in forth-coming versions.)_ … And, mercifully, they were removed. The restriction on array assignment has not changed yet, though. – Jonathan Leffler Jun 12 '18 at 03:33
  • 26
    Note that `char f[8](void)` is an array of functions. A function returning an array looks like `char f(void)[8]`. – melpomene Jun 12 '18 at 03:35
  • 4
    If you increase the size of your structure to more than 16 bytes, you will notice that the compiler actually passes an additional hidden argument which is the address of the struct to return. – caf Jun 12 '18 at 04:16
  • 2
    Possible duplicate of [C's aversion to arrays](https://stackoverflow.com/questions/35597019/cs-aversion-to-arrays) – Bo Persson Jun 12 '18 at 09:01
  • 1
    @BoPersson that question looks similar, but this one is much clearer and more precise, and has attracted clear answers, so I would suggest keeping this one open, and linking the closed one to here. – dcorking Jun 12 '18 at 09:41
  • Another old question, with answers exploring other aspects of the issue, is [Why can't arrays of same type and size be assigned?](https://stackoverflow.com/questions/14826952/). – Steve Summit Jun 12 '18 at 10:09
  • @caf I would like to see what you mean by that hidden argument. As far as I saw (I cannot access my machine right now, so I cannot test), it passes an address inside the stack frame created for the function. Now... if in any case it creates a copy of the array, then it is even more expensive than actually returning the array. Pretty bad solution for low-memory machines in the early 70's. – Dario Rodriguez Jun 12 '18 at 15:29
  • @DarioRodriguez For a bit more information, see the old [C FAQ list](http://c-faq.com/), questions [2.7](http://c-faq.com/struct/firstclass.html) and [2.9](http://c-faq.com/struct/passret.html). – Steve Summit Jun 12 '18 at 22:26
  • I did, but the ASM I get does not seem to match. See, something like the 2nd sample code but: `int main (void) { struct s m; m.arr[0]=3; m = f(); if (m.arr[0]==3) return 1; return 0; }` Then the ASM goes like: `mov BYTE PTR [rbp-512], 3 lea rax, [rbp-512] mov rdi, rax call f movzx eax, BYTE PTR [rbp-512] cmp al, 3` – Dario Rodriguez Jun 12 '18 at 23:06
  • @DarioRodriguez: The `lea rax, [rbp-512]` is loading the address of the struct `m` into `%rax`, and the `mov rdi, rax` is then moving that address into `%rdi`, which is where the first scalar function argument is passed on x86-64. – caf Jun 13 '18 at 01:16
  • So it is actually duplicating the struct. In that case it would have been a better option to be unable to return structs. This is in many ways a nasty thing about writing C (gcc/linux/x64). I would rather stick to pointers. – Dario Rodriguez Jun 13 '18 at 14:14
  • @DarioRodriguez: No - in this case the code produced by the compiler is the same as if you'd passed a pointer to `m` (eg. `f(&m);` instead of `m = f()`, obviously with the corresponding change to `f()`). – caf Jun 16 '18 at 14:49
  • Yes, now I checked it all, thank you! – Dario Rodriguez Jun 16 '18 at 21:10

5 Answers

114

First of all, yes, you can encapsulate an array in a structure, and then do anything you want with that structure (assign it, return it from a function, etc.).

Second of all, as you've discovered, the compiler has little difficulty emitting code to return (or assign) structures. So that's not the reason you can't return arrays, either.

The fundamental reason you cannot do this is that, bluntly stated, arrays are second-class data structures in C. All other data structures are first-class. What are the definitions of "first-class" and "second-class" in this sense? Simply that second-class types cannot be assigned.

(Your next question might be, "Other than arrays, are there any other second-class data types?", and I think the answer is "Not really, unless you count functions".)

Intimately tied up with the fact that you can't return (or assign) arrays is that there are no values of array type, either. There are objects (variables) of array type, but whenever you try to take the value of one, you immediately get a pointer to the array's first element. [Footnote: more formally, there are no rvalues of array type, although an object of array type can be thought of as an lvalue, albeit a non-assignable one.]

So, quite aside from the fact that you can't assign to an array, you can't even generate a value to try to assign. If you say

char a[10], b[10];
a = b;

it's as if you had written

a = &b[0];

So we've got an array on the left, but a pointer on the right, and we'd have a massive type mismatch even if arrays somehow were assignable. Similarly (from your example) if we try to write

a = f();

and somewhere inside the definition of function f() we have

char ret[10];
/* ... fill ... */
return ret;

it's as if that last line said

return &ret[0];

and, again, we have no array value to return and assign to a, merely a pointer.
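The conversion described above can be observed in standard C today. The following is only a small sketch; the function names are made up for illustration and are not part of the question.

```c
#include <stddef.h>

/* Returns 1 if the array expression `b` yields the address of b[0]. */
int decays_to_first_element(void) {
    char b[10] = {0};
    char *p = b;            /* same as: char *p = &b[0]; */
    return p == &b[0];
}

/* Size of the array object itself: sizeof is one of the contexts
   where no conversion to a pointer takes place. */
size_t size_of_array_object(void) {
    char b[10];
    return sizeof b;        /* 10 */
}

/* Size of what you get once the array is used in an ordinary
   expression: b + 0 has type char *, so this is the size of a pointer. */
size_t size_of_decayed_value(void) {
    char b[10];
    return sizeof (b + 0);  /* sizeof(char *) */
}
```

The second and third functions differ only in whether `b` is used "as itself" or as part of a larger expression, which is exactly the distinction that leaves nothing of array type to assign or return.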

(In the function call example, we've also got the very significant issue that ret is a local array, perilous to try to return in C. More on this point later.)

Now, part of your question is probably "Why is it this way?", and also "If you can't assign arrays, why can you assign structures containing arrays?"

What follows is my interpretation and my opinion, but it's consistent with what Dennis Ritchie describes in his paper The Development of the C Language.

The non-assignability of arrays arises from three facts:

  1. C is intended to be syntactically and semantically close to the machine hardware. An elementary operation in C should compile down to one or a handful of machine instructions taking one or a handful of processor cycles.

  2. Arrays have always been special, especially in the way they relate to pointers; this special relationship evolved from and was heavily influenced by the treatment of arrays in C's predecessor language B.

  3. Structures weren't initially in C.

Due to point 2, it's impossible to assign arrays, and due to point 1, it shouldn't be possible anyway, because a single assignment operator = shouldn't expand to code that might take N thousand cycles to copy an N thousand element array.
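In practice, the copy that `=` refuses to perform is spelled out with `memcpy`, which makes the per-element cost visible at the call site. A minimal sketch (the function name is hypothetical):

```c
#include <string.h>

/* Returns 1 if, after an explicit copy, a holds the same bytes as b. */
int copy_with_memcpy(void) {
    char a[10];
    char b[10] = "Hello";

    /* a = b;   would not compile: arrays are not assignable */
    memcpy(a, b, sizeof a);   /* the N-byte copy is written out explicitly */

    return memcmp(a, b, sizeof a) == 0;
}
```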

And then we get to point 3, which really ends up leading to a contradiction.

When C got structures, they initially weren't fully first-class either, in that you couldn't assign or return them. But the reason you couldn't was simply that the first compiler wasn't smart enough, at first, to generate the code. There was no syntactic or semantic roadblock, as there was for arrays.

And the goal all along was for structures to be first-class, and this was achieved relatively early on. The compiler caught up, and learned how to emit code to assign and return structures, right around the time that the first edition of K&R was going to print.

But the question remains, if an elementary operation is supposed to compile down to a small number of instructions and cycles, why doesn't that argument disallow structure assignment? And the answer is, yes, it's a contradiction.

I believe (though this is more speculation on my part) that the thinking was something like this: "First-class types are good, second-class types are unfortunate. We're stuck with second-class status for arrays, but we can do better with structs. The no-expensive-code rule isn't really a rule, it's more of a guideline. Arrays will often be large, but structs will usually be small, tens or hundreds of bytes, so assigning them won't usually be too expensive."

So a consistent application of the no-expensive-code rule fell by the wayside. C has never been perfectly regular or consistent, anyway. (Nor, for that matter, are the vast majority of successful languages, human as well as artificial.)

With all of this said, it may be worth asking, "What if C did support assigning and returning arrays? How might that work?" And the answer will have to involve some way of turning off the default behavior of arrays in expressions, namely that they tend to turn into pointers to their first element.

Sometime back in the '90's, IIRC, there was a fairly well-thought-out proposal to do exactly this. I think it involved enclosing an array expression in [ ] or [[ ]] or something. Today I can't seem to find any mention of that proposal (though I'd be grateful if someone can provide a reference). At any rate, I believe we could extend C to allow array assignment by taking the following three steps:

  1. Remove the prohibition of using an array on the left-hand side of an assignment operator.

  2. Remove the prohibition of declaring array-valued functions. Going back to the original question, make char f(void)[8] { ... } legal.

  3. (This is the biggie.) Have a way of mentioning an array in an expression and ending up with a true, assignable value (an rvalue) of array type. For the sake of argument I'll posit a new operator or pseudofunction called arrayval( ... ).

[Side note: Today we have a "key definition" of array/pointer correspondence, namely that:

A reference to an object of array type which appears in an expression decays (with three exceptions) into a pointer to its first element.

The three exceptions are when the array is the operand of a sizeof operator, or a & operator, or is a string literal initializer for a character array. Under the hypothetical modifications I'm discussing here, there would be a fourth exception, namely when the array was an operand of this new arrayval operator.]
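The three existing exceptions can each be checked with a few lines of standard C. This is just an illustrative sketch with made-up function names:

```c
#include <stddef.h>

/* 1. Operand of sizeof: the result is the size of the whole array. */
size_t whole_array_size(void) {
    int a[4];
    return sizeof a;                 /* 4 * sizeof(int), not sizeof(int *) */
}

/* 2. Operand of unary &: the result has type int (*)[4], a pointer to the
      whole array, so adding 1 steps past all four elements. */
ptrdiff_t step_of_pointer_to_array(void) {
    int a[4];
    int (*pa)[4] = &a;
    return (char *)(pa + 1) - (char *)pa;   /* 4 * sizeof(int) */
}

/* 3. String literal used to initialize a character array: the characters
      are copied into the array instead of yielding a pointer. */
size_t initialized_from_literal(void) {
    char s[] = "Hello";
    return sizeof s;                 /* 6: five letters plus the '\0' */
}
```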

Anyway, with these modifications in place, we could write things like

char a[8], b[8] = "Hello";
a = arrayval(b);

(Obviously we would also have to decide what to do if a and b were not the same size.)

Given the function prototype

char f(void)[8];

we could also do

a = f();

Let's look at f's hypothetical definition. We might have something like

char f(void)[8] {
    char ret[8];
    /* ... fill ... */
    return arrayval(ret);
}

Note that (with the exception of the hypothetical new arrayval() operator) this is just about what Dario Rodriguez originally posted. Also note that — in the hypothetical world where array assignment was legal, and something like arrayval() existed — this would actually work! In particular, it would not suffer the problem of returning a soon-to-be-invalid pointer to the local array ret. It would return a copy of the array, so there would be no problem at all — it would be just about perfectly analogous to the obviously-legal

int g(void) {
    int ret;
    /* ... compute ... */
    return ret;
}
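For comparison, the closest thing available in real C today is the struct wrapper from the question. The sketch below (with a hypothetical `struct char8` and made-up function names) shows that returning and assigning the struct really does copy the array inside it, so no pointer to a dead local is involved:

```c
#include <string.h>

struct char8 { char arr[8]; };   /* hypothetical wrapper type */

/* Returns the array by value, wrapped in a struct. */
struct char8 f_wrapped(void) {
    struct char8 ret;
    memset(ret.arr, 0, sizeof ret.arr);
    strcpy(ret.arr, "Hello");
    return ret;                  /* the whole struct, array included, is copied */
}

/* Returns 1 if the caller ends up with its own independent copy. */
int caller_gets_copy(void) {
    struct char8 a = f_wrapped();
    struct char8 b = a;          /* struct assignment copies the array too */
    b.arr[0] = 'J';              /* changes b only */
    return strcmp(a.arr, "Hello") == 0 && strcmp(b.arr, "Jello") == 0;
}
```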

Finally, returning to the side question of "Are there any other second-class types?", I think it's more than a coincidence that functions, like arrays, automatically have their address taken when they are not being used as themselves (that is, as functions or arrays), and that there are similarly no rvalues of function type. But this is mostly an idle musing, because I don't think I've ever heard functions referred to as "second-class" types in C. (Perhaps they have, and I've forgotten.)


Footnote: Because the compiler is willing to assign structures, and typically knows how to emit efficient code for doing so, it used to be a somewhat popular trick to co-opt the compiler's struct-copying machinery in order to copy arbitrary bytes from point a to point b. In particular, you could write this somewhat strange-looking macro:

#define MEMCPY(b, a, n) (*(struct foo { char x[n]; } *)(b) = \
                         *(struct foo *)(a))

that behaved more or less exactly like an optimized in-line version of memcpy(). (And in fact, this trick still compiles and works under modern compilers today.)

Steve Summit
  • 45,437
  • 7
  • 70
  • 103
  • I don't like or agree with your characterization that "there are no values of array type", nor do I think it bears scrutiny under the standard. For the most part, you cannot directly *handle* array values, but that's not quite the same thing. – John Bollinger Jun 12 '18 at 03:39
  • 4
    @JohnBollinger there are lvalues of array type but no rvalues. – n. m. could be an AI Jun 12 '18 at 03:51
  • 2
    @JohnBollinger the C standard uses the term "value" to mean what might be called "rvalue" – M.M Jun 12 '18 at 04:00
  • 2
    @n.m.: Even that's not true - with the second example in the question, `f().arr` is an rvalue array. – caf Jun 12 '18 at 04:02
  • 5
    +1 for effectively explaining why there is no way to expand the language with this capability without conflicting with existing constraints. – R.. GitHub STOP HELPING ICE Jun 12 '18 at 04:07
  • 2
    @JohnBollinger It is indeed a bit of a simplification, but one I find useful. What I meant more specifically, as n.m. and M.M (are you two related? :-) ) pointed out and I have now clarified, is that there are no rvalues of array type. (And of course it's a rather famously harder question to say whether arrays can be lvalues.) – Steve Summit Jun 12 '18 at 04:07
  • Well, @M.M and n.m., although the standard sometimes uses "value" in that sense, in fact it *defines* the term more broadly as "precise meaning of the contents of an object when interpreted as having a specific type" ([C2011, 3.19/1](http://port70.net/~nsz/c/c11/n1570.html#3.19p1)). In this sense -- which I myself consider foundational -- there are values of *every* type that an object can have, and objects certainly can have array types. – John Bollinger Jun 12 '18 at 04:49
  • @hmm you are right, you can't take its address though it decays to a pointer in rvalue contexts. – n. m. could be an AI Jun 12 '18 at 05:29
  • @JohnBollinger that is describing rvalues. The *contents* of an array cannot be interpreted as having array type – M.M Jun 12 '18 at 09:02
  • I think this answer is as clear as it can be, even though you assumed that there is a side question (which is not true for me, but that is OK). The real side question would have been (perhaps): Is it wise to rely on calling malloc and memcpy (or passing pointers) instead of returning a stack buffer? – Dario Rodriguez Jun 12 '18 at 18:09
  • 1
    @DarioRodriguez I should have touched on that point. Relying on `malloc` and `memcpy` is generally fine (as long as it's well-documented, and reasonable for your caller, to receive a pointer to malloc'ed memory that it's then someone else's responsibility to free). And doing *something* other than returning a pointer to a stack buffer is *mandatory*, because returning a pointer to a stack buffer is just about guaranteed not to work. – Steve Summit Jun 12 '18 at 19:59
  • One of the best explanations I have ever come across! – aaroh Jun 13 '18 at 06:51
  • @SteveSummit What a wonderful answer ! Thank you so much. – Ankur Agarwal Jul 04 '20 at 01:54
  • Actually, supporting first class arrays just requires changing the automatic conversion of arrays to pointers in all rvalue contexts to an implicit conversion that only happens when needed (similar to the implicit conversion of `int`->`double` in things like `3/2.0`). No need for any "special" explicit construct to say when it happens. – Chris Dodd Jan 12 '23 at 22:51
23

What is it with returning arrays directly? I mean, not in terms of optimization or computational complexity but in terms of the actual capability of doing so without the struct layer.

It has nothing to do with capability per se. Other languages do provide the ability to return arrays, and you already know that in C you can return a struct with an array member. On the other hand, yet other languages have the same limitation that C does, and even more so. Java, for instance, cannot return arrays, nor indeed objects of any type, from methods. It can return only primitives and references to objects.

No, it is simply a question of language design. As with most other things to do with arrays, the design points here revolve around C's provision that expressions of array type are automatically converted to pointers in almost all contexts. The value provided in a return statement is no exception, so C has no way of even expressing the return of an array itself. A different choice could have been made, but it simply wasn't.
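Since an array cannot appear as a return value, the usual way to express the same intent is to have the caller supply the storage. The following is a sketch only, with made-up names; it uses a pointer-to-array parameter so that the element count stays in the type:

```c
#include <string.h>

/* Fills the caller's 8-element array in place. The parameter type
   char (*)[8] keeps the element count as part of the type. */
void fill8(char (*out)[8]) {
    memset(*out, 0, sizeof *out);   /* sizeof *out is 8 */
    memcpy(*out, "Hello", 5);
}

/* Returns 1 if the caller's array was filled. */
int fill8_demo(void) {
    char a[8];
    fill8(&a);                      /* &a has type char (*)[8] */
    return strcmp(a, "Hello") == 0;
}
```

A plain `char *` parameter plus a separate length argument is the more common spelling; the pointer-to-array form is chosen here only because it lets the compiler reject arrays of the wrong size.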

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • 3
    Having raised a comparison with Java, I note for the record that what Java means by the term "object" differs from what C means by that term. – John Bollinger Jun 12 '18 at 04:21
  • 2
    _"expressions of array type are automatically converted to pointers in almost all contexts."_ That is the real answer, IMHO. From a compiler POV, it'd be _hard_ to figure out whether you are referring to an array as a whole or just as a pointer to the first element: there ought have been something like `return (array) b` or `return b` to explicitly distinguish. And clearly, `struct`s cannot have this ambiguity. – edmz Jun 12 '18 at 08:55
  • "A different choice could have been made, but it simply wasn't." - To me, this is it. – Dario Rodriguez Jun 12 '18 at 22:52
5

For arrays to be first-class objects, you would expect at least to be able to assign them. But that requires knowledge of the size, and the C type system is not powerful enough to attach sizes to any types. C++ could do it, but doesn't due to legacy concerns—it has references to arrays of particular size (typedef char (&some_chars)[32]), but plain arrays are still implicitly converted to pointers as in C. C++ has std::array instead, which is basically the aforementioned array-within-struct plus some syntactic sugar.
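Where the size survives and where it is dropped can be seen in a short sketch (the function names are hypothetical). A parameter written as `char x[42]` is adjusted to `char *x`; it is spelled as a pointer below to make that adjusted type explicit:

```c
#include <stddef.h>

/* What a function declared with a `char x[42]` parameter actually
   receives: a plain pointer, with no element count attached. */
size_t size_seen_through_pointer(char *x) {
    return sizeof x;                /* sizeof(char *) */
}

/* A pointer to an array keeps the element count in its type. */
size_t size_seen_through_array_pointer(char (*x)[42]) {
    return sizeof *x;               /* 42 */
}

/* Returns 1 if both observations hold for the same 42-char array. */
int sizes_demo(void) {
    char buf[42] = {0};
    return size_seen_through_pointer(buf) == sizeof(char *)
        && size_seen_through_array_pointer(&buf) == 42;
}
```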

Roman Odaisky
  • 2,811
  • 22
  • 26
  • 3
    If the C type system were not powerful enough to attach sizes to types, then how could the `sizeof` operator work? But it *does* work, including with array types. – John Bollinger Jun 12 '18 at 13:35
  • @JohnBollinger sizeof is calculated at compile time, while the array size is known by the compiler. – Simon B Jun 12 '18 at 15:48
  • Yes, @SimonB (except for VLAs). And? – John Bollinger Jun 12 '18 at 18:16
  • 1
    As C is a compiled language, compile time is the only time at which types are really an issue of any kind. There will be no run-time type system. So the C type system is indeed attaching sizes to types. – Dario Rodriguez Jun 12 '18 at 19:14
  • @JohnBollinger Even sizeof only _kinda_ works. Try `size_t f(char x[42]) { return sizeof(x); }`. The C type system drops size information at earliest opportunity. – Roman Odaisky Jun 12 '18 at 19:19
  • 1
    `sizeof` works fine, @RomanOdaisky. It is the meaning of that function's parameter list that apparently you find surprising. Per the standard, parameter `x` is declared as a `char *`, not an array, bracketed size notwithstanding. That is wholly consistent with the fact that there is no way to pass an array as a function argument in the first place (because in the argument list for a function call, as in most other places, expressions having array type are converted to pointers). – John Bollinger Jun 12 '18 at 20:23
  • That’s exactly what I was saying. The decay of char[] into char* is, to my mind, very much a part of the type system, and the natural consequence of this is inability to pass arrays around. – Roman Odaisky Jun 12 '18 at 23:55
  • I don’t understand this - if the size information is lost how can pointer arithmetic work? It *must* preserve the size of an array element. – Gaius Jun 13 '18 at 10:03
  • @Gaius pointer arithmetic works by ignoring any size implication. YOU as the programmer have to make sure you're in bounds, or C will happily run off way beyond the length of the array. If the array has decayed to a pointer, you can't even retrieve that information, if it wasn't passed as an extra parameter. Unless you mean the size of underlying elements of the array, which are simply known to the C compiler - `sizeof(double)` always works. – GeckoGeorge Jun 13 '18 at 13:27
  • If I increment a pointer to/in an array I get to the next element in the array, no? Or as you say off the end. But it still knows the size of each element of the array. – Gaius Jun 13 '18 at 13:42
  • 1
    Yes, @Gaius, you are correct. You have presented another counterexample to Roman's assertion that "the C type system is not powerful enough to attach sizes to any types." And in fact, your counterexample covers even sizes of array types when you have pointers to arrays (example: `int (*p)[3]; printf("%zu\n", sizeof(*p));`). – John Bollinger Jun 13 '18 at 21:53
0

Bounty hunting.

The authors of C did not aspire to be language or type system designers. They were tool designers. C was a tool to make system programming easier. ref: B. Kernighan on Pascal; Ritchie on C

There was no compelling case for C to do anything unexpected, especially as UNIX and C were ushering in the era of least surprise. Copying arrays, and creating complex syntax to do so when it was the metaphorical equivalent of having a setting to burn the toast, did not fit the C model.

Everything in C, the language, is effectively constant time, constant size. C, the standard, seems preoccupied with doing away with this core feature which made C so popular; so expect the, uh, standard C/2023.feb07 to feature a punctuation nightmare that enables arrays as r-values.

The decision of the C authors makes eminent sense if you view the programming world pragmatically. If you view it as a pulpit for treasured beliefs, then get onboard for C/2023.feb07 before C/2023.feb08 nullifies it.

mevets
  • 10,070
  • 1
  • 21
  • 33
-2

I'm afraid that in my mind it's not so much a debate about first- or second-class objects as a religious discussion of good practice and applicable practice for deep embedded applications.

Returning a structure means either that a root structure is changed by stealth in the depths of the call sequence, or that data is duplicated and large chunks of duplicated data are passed around. The main applications of C are still largely concentrated in deep embedded systems. In these domains you have small processors that don't need to be passing large blocks of data. You also have engineering practice that requires operating without dynamic RAM allocation, with minimal stack and often no heap. It could be argued that returning a structure is the same as modification via pointer, but abstracted in syntax... I'm afraid I'd argue that this is not in the C philosophy of "what you see is what you get" in the way that a pointer to a type is.

Personally, I would argue you have found a loophole, whether standard-approved or not. C is designed in such a way that allocation is explicit. As a matter of good practice you pass address-bus-sized objects, ideally in one cycle, referring to memory that has been allocated explicitly at a controlled time within the developer's ken. This makes sense in terms of code efficiency and cycle efficiency, and offers the most control and clarity of purpose. I'm afraid that in code inspection I'd throw out a function returning a structure as bad practice. C does not enforce many rules; in many ways it's a language for professional engineers, as it relies upon users enforcing their own discipline. Just because you can doesn't mean you should... It does offer some pretty bulletproof ways to handle data of very complex size and type, utilising compile-time rigour and minimising dynamic variations of footprint at runtime.

Rob
  • 141
  • 5
  • 3
    I think this is a long way off topic. It seems more an opinion than an answer. It also twists the main topic, because even if most C applications were "deep embedded applications", that doesn't turn the discussion into one of "good practice and applicable practice for deep embedded applications". The discussion is indeed oriented in a very different way. – Dario Rodriguez Jun 12 '18 at 21:35
  • I'll see your "C is still largely concentrated around the deep embedded applications... without dynamic RAM allocation, and with minimal stack and often no heap", and raise you "C is a general-purpose computer programming language used for operating systems, libraries, games and other high performance work" -- as quoted from [SO's own tag info](https://stackoverflow.com/tags/c/info). – Steve Summit Jun 17 '18 at 10:10
  • You're absolutely right that a struct-returning function might be forcefully denied -- and rightfully so -- by a code review or style guide governing tightly constrained, embedded work. (Style guides are always disallowing things for various reasons.) – Steve Summit Jun 17 '18 at 10:18
  • As I tried to explore in [my answer](https://stackoverflow.com/questions/50808782/what-does-impossibility-to-return-arrays-actually-mean-in-c/50808867#50808867), struct assignment and array assignment ended up falling on opposite sides of the legality line, due to inconsistent application of the conflicting goals of not having "expensive" code, versus having a clean, consistent, expressive language. – Steve Summit Jun 17 '18 at 10:20