
This is mainly a follow-up to "Should definition and declaration match?"

Question

Is it legal in C to have (for example) `int a[10];` in one compilation unit and `extern int a[4];` in another one?

(You can find a working example in my answer to the referenced question.)


Disclaimers:

  • I know it is dangerous and would not do it in production code
  • I know that if you have both in the same compilation unit (typically through inclusion of a .h in the file containing the definition), compilers detect an error
  • I have already read Jonathan Leffler's excellent answer to "How do I use extern to share variables between source files?" but could not find the answer to this specific point there, even though Jonathan showed even worse usages

Even though several comments on the referenced post flagged this as UB, I could not find any authoritative reference for it. So I would say that there is no UB here and that the second compilation unit will have access to the beginning of the array, but I would really like a confirmation, or else a reference explaining why it is UB.

Serge Ballesta
  • I just spent ages trying to work out what UB actually means! Blame it on being a Friday afternoon at work! This didn't help - http://www.abbreviations.com/UB ... From my understanding, using your example, as long as the types are the same, it should access the first four elements of the array, with no undefined behavior. However, I wonder if the time spent pondering this could have been better spent doing something, let's say, more constructive. Just my opinion, Serge. :-) – Neil Jul 17 '15 at 13:28
  • Typically when you say `extern int a[N]` the `N` is ignored, since it's information that the compiler doesn't need and has no use for. Not sure what a compiler that tried to do array bounds checking might do with it, however. – Steve Summit Jul 17 '15 at 13:34
  • @SteveSummit: I intentionally put a bigger size in the definition than in the extern declaration. So if the compiler tries to do bounds checking, it should only limit accesses to a size smaller than the allocated one, so it should be harmless – Serge Ballesta Jul 17 '15 at 14:16

2 Answers


It is undefined behavior.

Section 6.2.7.2 of C99 states:

All declarations that refer to the same object or function shall have compatible type; otherwise, the behavior is undefined.

NOTE: As mentioned in the comments below, the important part here is "[...] that refer to the same object [...]", which is further defined in 6.2.2:

In the set of translation units and libraries that constitutes an entire program, each declaration of a particular identifier with external linkage denotes the same object or function.

As for the compatibility rules for array types, section 6.7.5.2.6 of C99 spells out what it means for two array types to be compatible:

For two array types to be compatible, both shall have compatible element types, and if both size specifiers are present, and are integer constant expressions, then both size specifiers shall have the same constant value. If the two array types are used in a context which requires them to be compatible, it is undefined behavior if the two size specifiers evaluate to unequal values.

(Emphasis mine)

In practice, as long as you stick to 1D arrays, it is probably harmless, because typical implementations do no bounds checking and the address of the first element is the same regardless of the size specifier. Note, however, that the `sizeof` operator will return different values in each source file (opening a wonderful opportunity to write buggy code).

Things start to get really ugly if you extrapolate from this example and declare multidimensional arrays with different dimension sizes, because the computed offset of each element will no longer match the real dimensions.

Filipe Gonçalves
  • Your quote refers to array _types_. The example is about arrays, not types. – Paul Ogilvie Jul 17 '15 at 13:47
  • @PaulOgilvie Huh? The quote makes it very clear that declarations like `int a[10];` and `int b[4];` yield **incompatible** types. The type of `a` is *array of 10 `int`*. The type of `b` is *array of 4 `int`*. Both of them are array types, but they are incompatible. – Filipe Gonçalves Jul 17 '15 at 13:52
  • 6.2.7.2 seems relevant, and would undoubtedly apply if both declarations were in the same translation unit. My problem is that there is an explicit provision about different translation units **but only for structures, unions or enumerated types**: *Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy*. And all the examples in 6.7.5.2 are for a single translation unit. – Serge Ballesta Jul 17 '15 at 14:13
  • Hmm, 6.2.7.2 of the standard made me think of an [article](http://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201503-asplos2015-cheri-cmachine.pdf) cited by Hans Passant in another question. I'm still waiting for other comments or answers, but I think you are right: it is formally UB, but it behaves predictably with most (if not all) current compilers and linkers. – Serge Ballesta Jul 17 '15 at 14:41
  • @SergeBallesta It's a very interesting question, and I'm also waiting for some comments on this. I think 6.2.7.2 should be applied regardless of the number of translation units. I'd say "all declarations" is a synonym for every declaration in the entire program, which is composed of a set of translation units. I tried to seek further clarification, but I didn't find it. Common sense tells me that it should be interpreted as applying in every translation unit. – Filipe Gonçalves Jul 17 '15 at 15:28
  • @SergeBallesta But yeah, I agree that while it may be UB formally, it's mostly ok for nearly every compiler and linker. – Filipe Gonçalves Jul 17 '15 at 15:29
  • Worth noting that if the declaration had simply omitted the size, the two array types would be compatible and there would be no problem (as long as the code which used the array had some way of avoiding an invalid index). Of course, in that case you could not use the `sizeof` operator because it can't be applied to an incomplete type. – rici Jul 17 '15 at 15:52
  • @FilipeGonçalves, the important part is not "all declarations" (because that would include declarations in different scopes or with no linkage) but "all declarations **that refer to the same object**", which is defined by 6.2.2 _"In the set of translation units and libraries that constitutes an entire program, each declaration of a particular identifier with external linkage denotes the same object or function."_ Therefore the declarations refer to the same object, and 6.2.7.2 applies. – Jonathan Wakely Jul 18 '15 at 01:46
  • @JonathanWakely You nailed it. I'm adding this to my answer. Thanks! – Filipe Gonçalves Jul 18 '15 at 12:18

Yes, it is legal. The language allows it.

In your specific case there will be no undefined behavior as the extern declared array is smaller than the actually allocated array.

It can be used when the defining module uses the "unpublished" array elements for, e.g., housekeeping in its algorithms (abstraction hiding).

Paul Ogilvie
  • I'm sorry, because it was what I initially thought too, but you are wrong. It is definitely undefined behaviour. See the other answer. – Serge Ballesta Jul 18 '15 at 08:57
  • @serge-ballesta, there may be a difference between the standard not defining a behavior for the case, and whether the program will behave in an undefined (unpredictable) way. The latter will not be the case. The compiler will just compile the two modules in the normal way, as it doesn't know about the different declarations, and the linker will fix up the external reference to defined memory, which will be there. All accesses will be correct, hence no unpredictable behavior. – Paul Ogilvie Jul 18 '15 at 09:06
  • You should look at [my answer to the referenced question](http://stackoverflow.com/a/31471930/3545273). I know that most **current compilers** process it correctly, and I explain why there. But I also cite some research on future compilers that could break it. – Serge Ballesta Jul 18 '15 at 09:19
  • @serge-ballesta, thanks for the insight. I would expect that even the memory-safe implementation would gracefully accept this case, as no memory outside the guarded region will be addressed. The research is interesting, though I think that if implemented, it will raise new challenges for the standard to define behavior not yet [needed to be] defined, i.e. currently undefined. – Paul Ogilvie Jul 18 '15 at 10:50