1

I am trying to understand the loader of the c++/g++ compilers and the convention it uses.

I have four source files:

Hello.h
Hello.cpp
Hello1.cpp
main.cpp

Hello.h

#include <iostream>

class Hello1
{
public:
    int a;
    void sayHello();
};

Hello.cpp

#include "Hello.h"

void Hello1::sayHello()
{
    std::cout << this->a;
}

Hello1.cpp

#include "Hello.h"

void Hello1::sayHello()
{
    std::cout << "Hello";
}

main.cpp

#include "Hello.h"

int main()
{
    Hello1 hello;
    hello.a = 5;
    hello.sayHello();
    return 0;
}

Preprocessing, compiling and assembling succeed for each file individually, and

c++ -c main.cpp

also produces a main.o. But the linking step that should produce an executable, i.e. c++ main.o, fails with an error saying the function definition cannot be found:

main.o: In function `main':
main.cpp:(.text+0x19): undefined reference to `Hello1::sayHello()'
collect2: ld returned 1 exit status

I know that if I name the class Hello and provide a corresponding Hello.cpp, the loader will find the function definition and execute the member function. But if I change the name of the class inside the header file Hello.h from Hello to Hello1, the object file is still created without a problem: the compiler knows that a class Hello1 exists and allocates memory for it (I am guessing this from the success of the c++ -c command), but the loader can't find the function body of sayHello(). It seems that it is not looking into Hello.cpp or Hello1.cpp because Hello.h contains a class other than Hello.

So how does the loader find the function definition even in the normal case? Does it take the file name Hello.h and look for a Hello.cpp, or does it take the class name Hello1 and look for a Hello1.cpp? Or does it check whether the .h file and the class have the same name, and only then look for a .cpp of that name, ignoring any other classes in the header file?

It would be great if some C++ guru could give me some insight into the basis on which the loader picks up the definitions for declarations pulled in via #include in a normal C++ file. Also, in this case, how can I reference the definition of sayHello() while keeping the different names? Is it possible at all, or can a header file only hold the interface of a class with the same name?

  • 1
    Could you post the command line that is currently used for the *linking* step? – Medinoc Apr 07 '16 at 12:25
  • 1
    "if i change the name of the class inside the header file Hello.h from Hello to Hello1". Your Hello.h file already declares a Hello1 class, which is referenced from your main(). Your question contradicts itself, and makes no sense. – Sam Varshavchik Apr 07 '16 at 12:34
  • Wow. I've clearly referenced the standard implementation right before your one line cut and paste , from which I'm deviating . The line clearly is a reference to the standard implementation and not to the current modification which has brought this doubt up ... – Deepak Nair Apr 11 '16 at 09:46
  • @Medinoc just a simple 'c++ main.o' , since the input is an object file gcc/c++ recognizes that and calls ld [...] main.o – Deepak Nair Apr 11 '16 at 10:41
  • Shouldn't it be `c++ main.o hello.o`? – Medinoc Apr 12 '16 at 11:51

2 Answers

3

Short version: You provide a set of files that provide a list of symbols. You (or the build system) are responsible for providing the "right" list of symbols (and their definitions) by specifying the correct files. It doesn't matter whether those files are called Hello, Hello1, foo or bar (plus the appropriate suffix).


Let's take a look at the result of c++ -c main.cpp via objdump -t -C main.o

SYMBOL TABLE:
00000000 l df *ABS* 00000000 main.cpp
00000000 l d .text 00000000 .text
00000000 l d .data 00000000 .data
00000000 l d .bss 00000000 .bss
00000000 l O .bss 00000001 std::__ioinit
00000050 l F .text 00000042 __static_initialization_and_destruction_0(int, int)
00000092 l F .text 0000001a _GLOBAL__sub_I_main
00000000 l d .init_array 00000000 .init_array
00000000 l d .note.GNU-stack 00000000 .note.GNU-stack
00000000 l d .eh_frame 00000000 .eh_frame
00000000 l d .comment 00000000 .comment
00000000 g F .text 00000050 main
00000000 *UND* 00000000 Hello1::sayHello()
00000000 *UND* 00000000 __stack_chk_fail
00000000 *UND* 00000000 std::ios_base::Init::Init()
00000000 *UND* 00000000 .hidden __dso_handle
00000000 *UND* 00000000 std::ios_base::Init::~Init()
00000000 *UND* 00000000 __cxa_atexit

There's a symbol main, it's a function, and it "needs" some other symbols that are not defined in this translation unit (the entries marked *UND*).
To illustrate this, let's modify main.cpp a little:

#include"Hello.h"
#include <iostream>

// noinline, so that the compiler "keeps" this a function + function calls
void __attribute__ ((noinline)) foo() 
{
  std::cout << "ho ho ho" << std::endl;
}

int main()
{
  Hello1 hello;
  hello.a=5;
  foo();
  hello.sayHello();
  return 0;
}

Now the output of objdump... is

SYMBOL TABLE:
00000000 l    df *ABS*  00000000 main.cpp
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l     O .bss   00000001 std::__ioinit
00000000 l    d  .rodata    00000000 .rodata
00000084 l     F .text  00000042 __static_initialization_and_destruction_0(int, int)
000000c6 l     F .text  0000001a _GLOBAL__sub_I__Z3foov
00000000 l    d  .init_array    00000000 .init_array
00000000 l    d  .note.GNU-stack    00000000 .note.GNU-stack
00000000 l    d  .eh_frame  00000000 .eh_frame
00000000 l    d  .comment   00000000 .comment
00000000 g     F .text  0000002f foo()
00000000         *UND*  00000000 std::cout
00000000         *UND*  00000000 std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
00000000         *UND*  00000000 std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
00000000         *UND*  00000000 std::ostream::operator<<(std::ostream& (*)(std::ostream&))
0000002f g     F .text  00000055 main
00000000         *UND*  00000000 Hello1::sayHello()
00000000         *UND*  00000000 __stack_chk_fail
00000000         *UND*  00000000 std::ios_base::Init::Init()
00000000         *UND*  00000000 .hidden __dso_handle
00000000         *UND*  00000000 std::ios_base::Init::~Init()
00000000         *UND*  00000000 __cxa_atexit

As you can see there's no *UND* foo(); the compiler could resolve that symbol and the call on its own.
OK, now what does the linker do? It gets a list of input files and makes a list of all the symbols defined in those files. Then it looks at the dependencies and tries to resolve them. main "needs" a symbol Hello1::sayHello() (the -C option demangles the name so that it looks like this, see https://en.wikipedia.org/wiki/Name_mangling).
If there is such a symbol in the linker's symbol list (and it fits), the dependency can be resolved. If there is no such symbol, you get the "undefined reference to" / "unresolved symbol" error message.
I.e. you have to provide an object (file) that defines the needed symbol, or else the linker will fail. What name this file has doesn't matter.
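
To make the mangling remark above concrete, here is a minimal sketch that turns a mangled symbol name back into the readable form that objdump -C prints. It assumes GCC or Clang on a platform using the Itanium C++ ABI, where <cxxabi.h> provides abi::__cxa_demangle. The helper name demangle is made up for this illustration, and _ZN6Hello18sayHelloEv is the name I would expect such a compiler to emit for Hello1::sayHello().

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the readable form of a mangled name,
// or the input unchanged if the demangler rejects it.
std::string demangle(const std::string& mangled)
{
    int status = 0;
    char* readable =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);  // the buffer was allocated with malloc()
    return result;
}
```

With this helper, demangle("_ZN6Hello18sayHelloEv") gives Hello1::sayHello() and demangle("_Z3foov") gives foo(). Note that the file name appears nowhere in the mangled symbol, only the class and function names do.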

Hello.o provides a symbol Hello1::sayHello(), which would satisfy the reference in main.o:

...
00000000 g     F .text  0000001f Hello1::sayHello()
00000000         *UND*  00000000 std::cout
00000000         *UND*  00000000 std::ostream::operator<<(int)
00000000         *UND*  00000000 std::ios_base::Init::Init()
00000000         *UND*  00000000 .hidden __dso_handle
00000000         *UND*  00000000 std::ios_base::Init::~Init()
00000000         *UND*  00000000 __cxa_atexit
...

and so does Hello1.o

...
00000000 g     F .text  0000001e Hello1::sayHello()
00000000         *UND*  00000000 std::cout
00000000         *UND*  00000000 std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
00000000         *UND*  00000000 std::ios_base::Init::Init()
00000000         *UND*  00000000 .hidden __dso_handle
00000000         *UND*  00000000 std::ios_base::Init::~Init()
00000000         *UND*  00000000 __cxa_atexit
...

So if you call (or let c++/gcc make that call) ld [...] main.o Hello.o, the definition of the symbol Hello1::sayHello() is taken from Hello.o; if you call ld [...] main.o Hello1.o, Hello1.o's Hello1::sayHello() is used.
Now call c++ main.cpp Hello.cpp Hello1.cpp and you'll get a "multiple definition of `Hello1::sayHello()'" error, because there are two symbols with the same name (and no mechanism to resolve that conflict).
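
The three outcomes described in this answer (resolved, undefined reference, multiple definition) can be sketched with a toy model. This is not how ld is implemented; it ignores libraries, weak symbols, sections and everything else a real linker handles. ObjectFile and toyLink are hypothetical names invented for the illustration, and the error strings are only loosely modelled on ld's messages.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of an object file:
// its name, the symbols it defines, and the symbols it still needs.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Toy "link" step: maps each symbol to the object file that defines it,
// or throws if a symbol is defined twice or not defined at all.
std::map<std::string, std::string> toyLink(const std::vector<ObjectFile>& inputs)
{
    std::map<std::string, std::string> definedIn;

    // Pass 1: collect all definitions from all input files.
    for (const ObjectFile& obj : inputs) {
        for (const std::string& sym : obj.defined) {
            auto inserted = definedIn.emplace(sym, obj.name);
            if (!inserted.second) {
                throw std::runtime_error("multiple definition of `" + sym + "'");
            }
        }
    }

    // Pass 2: every needed symbol must be defined by some input file.
    for (const ObjectFile& obj : inputs) {
        for (const std::string& sym : obj.undefined) {
            if (definedIn.find(sym) == definedIn.end()) {
                throw std::runtime_error("undefined reference to `" + sym + "'");
            }
        }
    }
    return definedIn;
}
```

Feeding it a main.o that needs Hello1::sayHello() together with either Hello.o or Hello1.o resolves the symbol from whichever file was passed. Passing main.o alone throws the "undefined reference" error, and passing all three throws the "multiple definition" error. The file names are only used for reporting and never for lookup.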

VolkerK
  • Cool mate, that's the case of explicitly defining it, which is explained very well. With this as a base, I would like you to explain a widely used standard scenario, using user2877673's answer to http://stackoverflow.com/questions/9075931/function-declaration-inside-or-outside-the-class . In that case, the main function calls the member function Clazz::x and includes only a Clazz.h where only the prototype is specified. If a file named Clazz.cpp has the definition (body) and we compile it, the loader resolves the symbol 'Clazz.x' automatically, without any reference to the .cpp. How? – Deepak Nair Apr 11 '16 at 10:35
  • Please explain first what you mean exactly by "The loader". You've used that term repeatedly and I don't know what you're referring to in this context. – VolkerK Apr 11 '16 at 23:52
  • 'ld' itself. I used the term loader to mean, in general, the logical part of the compiler which loads the symbols after compilation. If I'm not wrong, c++/gcc splits the work into preprocessing, compiling and linking (.cpp, .s, .o, .out), so I mean whoever does the linking/loading, whose behavior I'm currently not able to understand. Every other case is very clear, as you've mentioned above. But this default behavior of automatically including the .cpp with the same name as the .h's object file, if it exists, when it is called upon elsewhere is pretty confusing. – Deepak Nair Apr 12 '16 at 11:13
  • "automatically including the .cpp" I don't think this happens. The compiler knows from the declaration in the .h file alone what it needs to know. Just take the two files from http://pastebin.com/6RkxKu6k run `c++ --debug -c main.cpp` and then take a look at the result via `objdump -t -C -S main.o` – VolkerK Apr 12 '16 at 15:01
2

You need to tell the linker which object (.o) file to use: Hello.o or Hello1.o. So your command line would look like this:

c++ main.o Hello.o

or

c++ main.o Hello1.o

If you try to use both, you will get an error like this:

$ c++ main.o Hello1.o Hello.o
Hello.o: In function `Hello1::sayHello()':
Hello.cpp:(.text+0x0): multiple definition of `Hello1::sayHello()'
Hello1.o:Hello1.cpp:(.text+0x0): first defined here
collect2: ld returned 1 exit status

In answer to your last question: no, the name of the header file (.h) and of the .cpp file does not need to match the name of the class defined inside.

So this is legal:

foo.h

class Bar
{
public:
    void someFunc();
};
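
As a rough sketch of the point above, the declaration and the definition of Bar are shown here in one block, with comments marking where each part could live. The file name anything.cpp is invented for the illustration, and someFunc() returns a std::string here (instead of void as in the answer) only so that the result is easy to check.

```cpp
#include <string>

// --- could live in a header such as foo.h (file name taken from the answer) ---
class Bar
{
public:
    std::string someFunc();  // declaration only; std::string return is an assumption
};

// --- could live in any source file, e.g. a hypothetical anything.cpp ---
// The linker looks for the symbol Bar::someFunc(), not for a file named Bar.cpp.
std::string Bar::someFunc()
{
    return "Bar::someFunc() called";
}
```

Split into separate files, this builds as long as the object file containing the definition of Bar::someFunc() is passed to the link step, whatever that file is called.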
Max Value
  • Thanks for your insight. What I'd like to know is whether Hello.o is built from hello.h or hello.cpp. Also, I have included only "hello.h" and have not included hello.cpp in the source. So while building, how does c++/gcc know where to look for the function definition? – Deepak Nair Apr 12 '16 at 11:18