0

I've been programming for a few years but, embarrassingly, there are one or two things I'm still not fully clear about.

In the basic example code below, when the compiler encounters myFunc(), where will str1 and str2 get stored?

They are pointers to string literals, so I assume the string literals will get stored in read-only memory, but what is the difference in this case between one pointer being static local and the other one not? Also, I thought local variables get stored on the stack and are not allocated until the function is called? This is confusing.

In the case of the integers, var1 is non-static, but var2 is static. Will the compiler place var2 in the data segment at compilation time? I've read in another post, "When do function-level static variables get allocated/initialized?", that local static variables get created and initialized the first time they are used and not during compilation. So in that case, what if the function is never called?

Thanks in advance for experienced knowledge.

EDITED to call myFunc() from main(). It was a typo: myFunc() was never even called.

int myFunc()
{
    static char* str1 = "Hello";
    char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;

}

int main()
{

    return myFunc();
}
Engineer999
  • 16
    As written, the compiler will optimize away the whole body of the function. – Jonathan Leffler Sep 15 '16 at 16:21
  • Retag the question with the specific compiler you are interested in and you'll be a bit less likely to have to deal with these olafesque comments. – Hans Passant Sep 15 '16 at 16:31
  • 1
    the function `myFunc()` is never called/invoked, so the compiler will (if optimization or dead code removal is enabled) completely remove that function. In general, other than where the literals are located and where the pointers and `int` variables are located, it makes no difference to the person programming the application – user3629249 Sep 15 '16 at 16:32
  • Any optimizing compiler worth its salt will just completely optimize away all your unused code. – Jesper Juhl Sep 15 '16 at 16:58
  • Rolled back: the answers make more sense with it posed in the original way. – Bathsheba Sep 15 '16 at 18:07
  • Post the platform used, else the post is unclear and/or too broad. – chux - Reinstate Monica Sep 15 '16 at 18:13
  • @Engineer999 Your edit might get rolled back; some people feel it is too late to clarify the context of the question after several answers have been written addressed to the "typo." However, if you are going to try to satisfy the pedant, I suggest adding a return value to `myFunc` (to avoid undefined behavior) and adding a side effect to `myFunc` (like printing the variables to the screen) since a compiler with optimize flags turned on may still optimize away all 4 lines of `myFunc` as you have it now. – caps Sep 15 '16 at 19:24
  • @Engineer999 - pls add `return (int)str1+(int)str2+var1+var2;` to `myFunc` so that people stop complaining that everything is optimized away. – rustyx Sep 15 '16 at 19:29
  • Sadly now the behaviour of your program is undefined. the compiler can do anything, including eating your cat. – Bathsheba Sep 15 '16 at 21:49

6 Answers

11

EDIT:

The other answer and comments are correct - as is, your variables will be optimized out because they aren't even used. But let's have a little fun and actually use them to see what happens.

I compiled the OP's program as-is with gcc -S trial.c, and although myFunc was never called, nothing else about this answer changes.

I've slightly modified your program to actually use those variables so we can learn a little more about what the compiler and linker will do. Here it is:

#include <stdio.h>

int myFunc()
{
    static const char* str1 = "Hello";
    const char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;
    printf("%s %s %d %d\n", str1, str2, var1, var2);
    return 0;
}

int main()
{
    return myFunc();
}

I compiled with gcc -S trial.c and got the following assembly file:

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "%s %s %d %d\12\0"
    .text
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $64, %rsp
    .seh_stackalloc 64
    .seh_endprologue
    leaq    .LC0(%rip), %rax
    movq    %rax, -8(%rbp)
    movl    $1, -12(%rbp)
    movl    var2.3086(%rip), %edx
    movq    str1.3083(%rip), %rax
    movl    -12(%rbp), %r8d
    movq    -8(%rbp), %rcx
    movl    %edx, 32(%rsp)
    movl    %r8d, %r9d
    movq    %rcx, %r8
    movq    %rax, %rdx
    leaq    .LC1(%rip), %rcx
    call    printf
    movl    $0, %eax
    addq    $64, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .def    __main; .scl    2;  .type   32; .endef
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $32, %rsp
    .seh_stackalloc 32
    .seh_endprologue
    call    __main
    call    myFunc
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .data
    .align 4
var2.3086:
    .long   8
    .section .rdata,"dr"
.LC2:
    .ascii "Hello\0"
    .data
    .align 8
str1.3083:
    .quad   .LC2
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

var1 has no label of its own in the assembly file. It's just the immediate constant 1 that gets stored into a stack slot (movl $1, -12(%rbp)).

At the top of the assembly file, we see "World" (str2) in the .rdata section. Lower down in the assembly file, the string "Hello" is in the .rdata section, but the label for str1 (which contains the label, or address, for "Hello") is in the .data section. var2 is also in the .data section.

Here's a stackoverflow question that delves a little deeper into why this happens.

Another stackoverflow question points out that the .rdata section is the read-only section of .data and explains the different sections.

Hope this helps.


EDIT:

I decided to try this with the -O3 compiler flag (high optimizations). Here's the assembly file that I got:

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "Hello\0"
.LC2:
    .ascii "%s %s %d %d\12\0"
    .section    .text.unlikely,"x"
.LCOLDB3:
    .text
.LHOTB3:
    .p2align 4,,15
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    subq    $56, %rsp
    .seh_stackalloc 56
    .seh_endprologue
    leaq    .LC0(%rip), %r8
    leaq    .LC1(%rip), %rdx
    leaq    .LC2(%rip), %rcx
    movl    $8, 32(%rsp)
    movl    $1, %r9d
    call    printf
    nop
    addq    $56, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE3:
    .text
.LHOTE3:
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.unlikely,"x"
.LCOLDB4:
    .section    .text.startup,"x"
.LHOTB4:
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    subq    $40, %rsp
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
    xorl    %eax, %eax
    addq    $40, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE4:
    .section    .text.startup,"x"
.LHOTE4:
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

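var1 is now just a constant 1 that is placed in a register (r9d). var2 is also just a constant, but it's placed on the stack (movl $8, 32(%rsp)) as the fifth printf argument. Also, the addresses of the strings "Hello" and "World" are now loaded directly into registers instead of being read back from pointer variables in memory.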

So, I decided that I wanted to try something slightly different:

#include <stdio.h>

void myFunc()
{
    static const char* str1 = "Hello";
    const char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;
    printf("%s %s %d %d\n", str1, str2, var1, var2);

    var1++;
    var2++;
    printf("%d %d", var1, var2);
}

int main()
{
    myFunc();
    myFunc();
    return 0;
}

And the associated assembly, using gcc -O3 -S trial.c:

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "Hello\0"
.LC2:
    .ascii "%s %s %d %d\12\0"
.LC3:
    .ascii "%d %d\0"
    .section    .text.unlikely,"x"
.LCOLDB4:
    .text
.LHOTB4:
    .p2align 4,,15
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    subq    $56, %rsp
    .seh_stackalloc 56
    .seh_endprologue
    movl    var2.3086(%rip), %eax
    leaq    .LC0(%rip), %r8
    leaq    .LC1(%rip), %rdx
    leaq    .LC2(%rip), %rcx
    movl    $1, %r9d
    movl    %eax, 32(%rsp)
    call    printf
    movl    var2.3086(%rip), %eax
    leaq    .LC3(%rip), %rcx
    movl    $2, %edx
    leal    1(%rax), %r8d
    movl    %r8d, var2.3086(%rip)
    addq    $56, %rsp
    jmp printf
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE4:
    .text
.LHOTE4:
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.unlikely,"x"
.LCOLDB5:
    .section    .text.startup,"x"
.LHOTB5:
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    subq    $40, %rsp
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
    call    myFunc
    call    myFunc
    xorl    %eax, %eax
    addq    $40, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE5:
    .section    .text.startup,"x"
.LHOTE5:
    .data
    .align 4
var2.3086:
    .long   8
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

This is looking a little more like the original. var1 is still optimized to just constants (1 and 2), but var2 is now in the .data section again, because it is modified and its value has to survive between calls. "Hello" and "World" are still in the .rdata section because they are constant.

One of the comments points out that this would be different on different platforms with different compilers. I encourage you to try it out.

mgarey
  • 1
    Note that this assembly will vary by compiler, compiler flags, platform, etc. Still, great answer. – caps Sep 15 '16 at 17:03
  • @caps Good point. It would probably be useful (and fun) to compile this with different compiler flags and compilers to see what happens. Just so everyone knows, ```gcc --version``` gives me ```gcc.exe (Rev1, Built by MSYS2 project) 5.4.0``` - I'm using mingw on Windows 10. – mgarey Sep 15 '16 at 17:06
  • Variables are not optimized out by default. They are optimized out only if you request it. – Jean-Baptiste Yunès Sep 15 '16 at 17:07
  • @Jean-BaptisteYunès You're correct. I'll update my answer. – mgarey Sep 15 '16 at 17:10
3
static const char* str1 = "Hello";

str1 is a static local pointer to a string literal which will be stored in read-only memory.

const char* str2 = "World";

str2 is a local, "stack-allocated" pointer to a string literal which will be stored in read-only memory.

The values of str1 and str2 are the respective addresses of the string literals they point to.

int var1 = 1;
static int var2 = 8;

If these lines of code are never reached, var2 will never be initialized. I don't know if the compiler sets aside a block of memory for it somewhere else at compile time or not.

caps
2

What the compiler does must be based (assuming a correctly working compiler) on the semantics of the code, so that's what I'll discuss.

First, a fairly minor point. By declaring a function with (), you specify that it takes a fixed but unspecified number and type(s) of arguments. That's an obsolescent form of declaration/definition, and there's rarely if ever a good reason to use it. (Empty parentheses have a different meaning in C++, but you're asking about C.) To specify that a function has no parameters, use (void) rather than () (especially for main, since it's not 100% clear that int main() must be accepted by a conforming compiler).

With that change:

int myFunc(void)
{
    static char* str1 = "Hello";
    char* str2 = "World";
    int var1 = 1;
    static int var2 = 8;
}

int main(void)
{
    return myFunc();
}

This program does nothing; it produces no output, and has no side effects. A compiler is permitted to compile it down to nearly nothing. But let's ignore that and assume that nothing is discarded.

There are two important concepts to consider: scope and lifetime (also known as storage duration). The scope of an identifier is the region of program text in which it is visible. It's purely a compile-time concept. The lifetime of an object is the duration during execution in which that object exists. It's purely a run-time concept. The two are often confused, particularly when you use the words "local" and "global".

An object with automatic storage duration is created on entry to the block in which it's defined, and (logically) destroyed on exit from that block. In your program, the relevant block is enclosed by the { and } in the definition of myFunc().

An object with static storage duration exists during the entire run time of the program.

static char* str1 = "Hello";

"Hello" is a string literal. It specifies a static array of type char[6]; that array (at least logically) exists during the entire execution of the program. You are not allowed to modify the contents of that array -- but for historical reasons, it's not const, and a compiler isn't required to warn you if you try to modify it. String literals are commonly stored in read-only memory (probably not physical ROM, but virtual memory that's marked as read-only).

The pointer object str1 also has static storage duration, though its name is visible only within the enclosing block ("block scope"). It's initialized to point to the initial character of "Hello". This initialization logically occurs before entry to main. Since a string literal is effectively read-only, it would have been better to use const to avoid the risk of accidentally trying to modify it:

static const char *str1 = "Hello";

Next:

char* str2 = "World";

The name of the pointer object str2 has the same kind of block scope as str1, but the pointer object itself has automatic storage duration. It is created on entry to the enclosing block and destroyed on exit. It's initialized to point to the initial character of "World"; that initialization takes place when execution reaches the declaration. Again, it would be better to add a const to the declaration.

int var1 = 1;
static int var2 = 8;

var1 has block scope and automatic storage duration. It's initialized to 1 when its declaration is reached at run time. var2 has block scope and static storage duration. The object exists for the entire execution of the program, and it's initialized to 8 before entry to main().

Now we run into a bit of a problem. You've defined myFunc() to return an int result, but you don't actually return anything. As it happens, this isn't invalid by itself, but if the result is used by a caller (as it is by your main() function), the behavior is undefined. The fix is simple: add a return 0; before the closing }.

Assuming you've added that, main calls myFunc. During execution of myFunc, str2 and var1 are allocated somehow and are initialized as I've described. (Nothing happens to str1 or var2 because they're static.) On return from the function, the storage allocated for str2 and var1 is released, effectively destroying the objects.


But the question you asked was: What will the compiler do? And the answer to that is: It will generate whatever code is necessary to implement the semantics I've just described. That's really all the C standard requires.

In practice, most compilers generate code that allocates variables with automatic storage duration on the "stack". The "stack" is usually a contiguous region of memory, starting from some fixed base address, that grows in one direction as items are added to it and shrinks in the other direction as items are removed. It's typically managed via a CPU register, the "stack pointer". (Some CPUs also have a "frame pointer".) But in fact all that the C standard requires is that such objects are allocated and deallocated in a first-in last-out manner -- and the actual allocation and deallocation needn't take place when you'd expect, as long as the resulting behavior is the same. For example, if you define a local object inside a loop, it might be allocated and deallocated on each iteration, or its allocation might be folded into the surrounding scope. The C standard doesn't care (and, in most cases, neither should you). There are even some compilers that don't use a contiguous stack at all; rather the storage for each function call is allocated from a heap. A contiguous stack is the best solution 90+% of the time, but it's not required.

Objects with static storage duration are typically allocated on program startup, before main is called. Most systems store the initial contents of any initialized static objects in the executable file, so it can be loaded into memory. (That's likely to include string literals.) For static objects whose initial value is zero, the executable might just contain information about how much zeroed memory to allocate.

As for the generated instructions that operate on this data, that is entirely dependent on the CPU being targeted, and probably on the system ABI.

Keith Thompson
1

Your code cannot be compiled without at least a warning, as the function never returns anything, which contradicts its declared return type.

Anyway, on my machine it generates code. Without optimization, code is emitted for the function to allocate the local str2, while str1 and var2 are allocated in the data section and hold their respective initial values. With optimization, only trivial code is emitted, and the unused local variables disappear, as do the unused static ones.

To observe this you can at least examine the object code with nm:

$ gcc -o p p.c
$ nm p
0000000100000f90 T _main
0000000100000f70 T _myFunc
0000000100001000 d _myFunc.str1
0000000100001008 d _myFunc.var2
$ gcc -O3 -o p2 p.c
$ nm p2
0000000100000fb0 T _main
0000000100000fa0 T _myFunc

If you want more details, then generate assembler code with -S and observe what happens.

Jean-Baptiste Yunès
0

The compiler will produce a program that takes no input, does nothing, then emits no output.

All of those declarations are completely irrelevant as they do not contribute anything to the [non-existent] result of the program. You might say they "get optimised out", though the reality is that they literally have no analogue in your resulting compiled executable.

Lightness Races in Orbit
  • 4
    Your "answer" is not helpful in the least. – GreatAndPowerfulOz Sep 15 '16 at 16:46
  • Way to ignore the entire body of the question because of the last 4 (irrelevant!) lines of code... – caps Sep 15 '16 at 16:49
  • 2
    Why is this being downvoted? It's the correct answer to "what will the compiler do". Any sensible optimising compiler will leave the unused function out altogether. Note that data segments are implementation concepts, not C and C++ standard concepts. Check the assembly generated by some common compilers and see for yourself. – Bathsheba Sep 15 '16 at 16:51
  • 1
    @Bathsheba it's downvoted because he's not answering the question! – GreatAndPowerfulOz Sep 15 '16 at 16:55
  • 3
    @Great.And.Powerful.Oz but he *is*. The answer is "the compiler will just delete the code since it is unused". Which is the essence of the answer. – Jesper Juhl Sep 15 '16 at 17:00
  • 2
    @JesperJuhl no he's not. It's obvious what the OP is asking. This guy is just being an unhelpful ass and so are you – GreatAndPowerfulOz Sep 15 '16 at 17:02
  • 1
    @Great.And.Powerful.Oz OP asked "what will the compiler do" - the answer *is* "delete the code". Perhaps OP should ask a better question with a better example. – Jesper Juhl Sep 15 '16 at 17:04
  • 1
    I have proposed a simple edit that clarifies the question. – caps Sep 15 '16 at 17:08
  • No, that answer is wrong! Of course the compiler emits code, and of course that code does mostly nothing externally observable (or observable only with great difficulty). But something is compiled and things happen internally. Try it. – Jean-Baptiste Yunès Sep 15 '16 at 17:13
  • @Bathsheba No, a reasonable compiler emits code for what you requested, even if it may seem stupid. It optimizes only if you request it. – Jean-Baptiste Yunès Sep 15 '16 at 17:14
  • Actually, @caps, your edit would queue the question for locking and deletion because it removed the "Complete" from [mcve]. A better edit would be to put in a side effect that prevents the compiler from discarding the code like mgarey did. I'm putting this question down as an X-Y. OP wanted one question and accidentally asked another. Happens sometimes. – user4581301 Sep 15 '16 at 17:35
  • Fair enough: round two. Second edit enqueued per your suggestion. – caps Sep 15 '16 at 17:37
  • oh. . Why didn't I just put in the function call to myFunc() – Engineer999 Sep 15 '16 at 19:07
  • 2
    I've edited my post now to include the function call. Surely most people realise it was a typo from my side going by how I asked the question – Engineer999 Sep 15 '16 at 19:19
  • It would be better if criticism on Stack Overflow amounted to more than "you're an unhelpful ass", @Great.And.Powerful.Oz. Re-read [the site rules](http://stackoverflow.com/help/be-nice), please, before commenting again. – Lightness Races in Orbit Sep 15 '16 at 23:02
  • The point is that the questioner, much like many other C++ newcomers, doesn't comprehend yet the difference between the lines you write in your source code, and what the computer executes. I tried my best to teach the OP that thing, especially as it formed the basis of the question. It's a shame that voters and commenters are also too inexperienced (and, paradoxically, sure of themselves) to understand that. Oh well. – Lightness Races in Orbit Sep 15 '16 at 23:03
  • @caps: This has nothing to do with "the last four lines of code" (which I didn't even read). This is the answer to the question as posed. This is what the compiler will do. If you think the compiler will do something else instead, provide evidence to the contrary. But a compiler won't "store" something somewhere out of sh!ts and giggles. – Lightness Races in Orbit Sep 15 '16 at 23:39
  • @LightnessRacesinOrbit, thanks for reminding me of the rules. I'll also remind you that you violated rules 1 and 2. Your answer was rude and condescending, it was also unwelcoming and impatient. – GreatAndPowerfulOz Sep 16 '16 at 16:29
  • @Great I could not disagree more. – Lightness Races in Orbit Sep 16 '16 at 16:45
  • @LightnessRacesinOrbit, lol, of course, you disagree. You would have to humble yourself to find any fault in what you did. – GreatAndPowerfulOz Sep 16 '16 at 17:24
  • @Great Please say one more time for the record that you think _I'm_ the rude one here. Just so that I can be certain not to misinterpret your claims before we take this further. See, you're the one whose greatest contribution to this post is to call people "an unhelpful ass", then spit out four insulting words in a row, then launch into a complete, unwarranted and disgraceful personal attack. So I'm just making sure you still believe you're the non-rude one. I await your response! – Lightness Races in Orbit Sep 16 '16 at 17:57
  • @LightnessRacesinOrbit, apparently a lot of other people agree with me. – GreatAndPowerfulOz Sep 16 '16 at 21:53
  • @Great: If you click on the voting score, you'll see that just slightly less than half agree with you; slightly more than half agree with me. But I see no evidence that _anyone_ condones your behaviour. – Lightness Races in Orbit Sep 17 '16 at 18:33
  • @LightnessRacesinOrbit, believe what you like. TTYL. – GreatAndPowerfulOz Sep 17 '16 at 23:15
0

Static variables, even those declared inside a function, are stored alongside globals (in static storage) rather than on the stack. In C++, a static variable inside a function or scope with a non-constant initializer gets initialized only the first time that function or scope is entered; in C, the initializer must be a constant and the variable already holds its initial value before main starts. Non-static variables are allocated on the stack by most compilers when the function scope is entered, and initialized when execution reaches their declaration. Some compilers store local variables elsewhere.

GreatAndPowerfulOz
  • 1
    Stack is not a c or c++ standard concept. You can't guarantee the compiler will do this. – Bathsheba Sep 15 '16 at 16:55
  • @Bathsheba, actually you can because it's specified in the "C" and "C++" standard what happens with static variables and scoping rules. – GreatAndPowerfulOz Sep 15 '16 at 16:56
  • Where in the C standard is stack mentioned? Granted, there are a couple of std C++ functions that mention it, but nothing else. – Bathsheba Sep 15 '16 at 16:58
  • @Bathsheba, name me one commercial or widely used open-source "C" or "C++" compiler that doesn't use a stack for local variables. – GreatAndPowerfulOz Sep 15 '16 at 17:00
  • A widely used c compiler on embedded systems doesn't have a stack. Granted, it doesn't support function recursion so isn't therefore strictly a c compiler. But don't you see my point: using implementation concepts only weakens an answer. Why not use static and automatic storage duration instead, like the standards do? The answer to this question remains the same: the compiler needs to produce code that returns zero from main. How it achieves that is up to it. – Bathsheba Sep 15 '16 at 17:03
  • @Bathsheba, I see your point but that doesn't really change the answer, especially since the "C" compiler you reference is non-conforming. – GreatAndPowerfulOz Sep 15 '16 at 17:06