How do the operating system and the compiler communicate to start compiling a program?

Question

When I give the command to compile a .cpp file, like

g++ abc.cpp

am I making a system call that first starts the GCC compiler and then feeds it the file (abc.cpp) to compile?

If not, how does the compiler's void _start() function, which in turn calls the main() function, get called?

Is the OS's system() function called, and does it then call void _start()?

Possible duplicate of [How does the compilation/linking process work?](https://stackoverflow.com/questions/6264249/how-does-the-compilation-linking-process-work) — Zeeshan Akhter, Dec 02 '18 at 13:44
`g++` is typically a driver executable that executes other executables (e.g. the preprocessor, compiler, linker, etc.). With some early versions of the GNU Compiler Collection, it was a shell script; with more recent versions it is a compiled executable. The precise way it executes those other programs depends on the host system, but it would not normally be a call of `system()`. — Peter, Dec 02 '18 at 13:48
The only things the OS does in that process are loading `g++`, executing it, and serving whatever system calls it makes, either directly or through `g++`'s use of some library that in turn issues a system call (most likely dynamic memory allocation and opening/reading one or more input files). — Swordfish, Dec 02 '18 at 13:50

Hatted Rooster · score 1 · Answer 1 · 2018-12-02T13:47:38.893

g++ is just a name for somewhere/bin/g++, which is an executable. That executable is passed abc.cpp as an argument and then goes off and does what a compiler does to compile the file. Under the hood, it might run other executables to compile and link the file.

The only thing the OS does is load g++ into memory and start it; g++'s C runtime startup code (the crt) then calls its main.
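A rough model of "call its main through the crt": the kernel transfers control to the program's entry point, normally a routine named `_start` that comes from the C runtime object files (such as `crt1.o` with glibc) linked into ordinary C and C++ programs, including `g++` itself. That code calls `main` and passes its return value to `exit`. The real `_start` is written in assembly; the sketch below only models its last step in C, and `crt_model_start`, `demo_main`, and `exit_status_of` are hypothetical names.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Same signature as main(int, char **). */
typedef int (*main_fn)(int argc, char **argv);

/* Hypothetical model of the last step of C runtime startup: call the
   program's main and pass its return value to exit(). The real _start
   is assembly that also finds argc, argv and envp on the initial stack
   and initializes the C library before calling main. */
_Noreturn void crt_model_start(main_fn user_main, int argc, char **argv)
{
    exit(user_main(argc, argv));
}

/* Hypothetical example program: its exit status is its argument count. */
int demo_main(int argc, char **argv)
{
    (void)argv;
    return argc - 1;   /* "g++ abc.cpp" -> 1 */
}

/* Runs the model in a child process so the parent can observe the exit
   status, the same way a shell sees g++'s exit status (-1 on error). */
int exit_status_of(main_fn user_main, int argc, char **argv)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        crt_model_start(user_main, argc, argv);   /* never returns */

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With `argv` set to `{"g++", "abc.cpp", NULL}`, the parent observes exit status 1, which is what `demo_main` returned.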

edited Dec 02 '18 at 13:47

answered Dec 02 '18 at 13:45


Winner winner chicken dinner! – Swordfish Dec 02 '18 at 13:46
@Swordfish A warm welcome, fish. – Hatted Rooster Dec 02 '18 at 13:46
*warm* – yes, grilled :D – Swordfish Dec 02 '18 at 13:46

score 0 · Answer 2 · answered Dec 02 '18 at 14:20

When you are typing commands, you are using a program called a command-line shell. This is a program that reads from input, analyzes the text it receives, and executes the commands in the text.

For g++ abc.cpp, the shell looks up g++ and finds it is the name of an executable file (either directly or because it is a link to an actual file). It then executes that file. This is a fairly complicated process that includes creating a child process that loads the executable file into memory and then executes it. (Note: Some executable files are shell scripts rather than binary executables containing machine instructions. Shell scripts are executed by loading the shell program and telling it to execute the script.)
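Whether a file is a binary or a script shows up in its first bytes: on Linux, binary executables are normally ELF files starting with the bytes `0x7f 'E' 'L' 'F'`, while scripts start with `#!` followed by the path of the interpreter. On Linux, `execve` itself performs this check (a `#!` file is run by the interpreter named on that line); if a file has neither form, exec fails and shells such as `sh` then run it as a shell script themselves. A minimal user-space sketch of the check, assuming POSIX `open`/`read`; `classify_executable` and its enum are hypothetical names.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

enum exe_kind { EXE_ELF, EXE_SCRIPT, EXE_UNKNOWN, EXE_ERROR };

/* Hypothetical helper: look at a file's first bytes to tell an ELF
   binary (0x7f 'E' 'L' 'F') from an interpreter script ("#!"). */
enum exe_kind classify_executable(const char *path)
{
    unsigned char magic[4];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return EXE_ERROR;
    ssize_t n = read(fd, magic, sizeof magic);
    close(fd);

    /* The string is split so \x7f is not merged with the hex digit E. */
    if (n == 4 && memcmp(magic, "\x7f" "ELF", 4) == 0)
        return EXE_ELF;
    if (n >= 2 && magic[0] == '#' && magic[1] == '!')
        return EXE_SCRIPT;
    return n < 0 ? EXE_ERROR : EXE_UNKNOWN;
}
```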

The g++ program then analyzes the arguments it was given. In the case of g++ abc.cpp, it will open “abc.cpp” and compile it. g++ is a program like any program you can write: It reads files, performs computations, and writes files.

g++ may be a single executable that does many things, but it likely performs much of its job by creating additional subprocesses to execute other programs. There may be a separate program to do the actual compiling of the code and another to link the code into an executable. (There can also be separate programs for preprocessing and optimization, as well as for compiling the code into an intermediate language, then for generating assembly language, then for assembling the assembly language, but these may also be integrated into one program.)
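The "separate programs" point can be sketched as a driver loop: run each stage as a child process, in order, and stop at the first stage that fails. This is only an illustration; `run_stages` is a hypothetical name, and the real `g++` driver runs internal programs such as `cc1plus`, `as`, and `collect2` (which calls `ld`) rather than the example commands listed after the code. Running `g++ -v abc.cpp` prints the commands it actually executes.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical driver loop: run each stage (a NULL-terminated argv
   array) in its own child process, one after another. Returns -1 if
   every stage succeeded, otherwise the index of the first failure. */
long run_stages(char *const *const stages[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return (long)i;
        if (pid == 0) {
            execvp(stages[i][0], stages[i]);   /* search PATH for the tool */
            _exit(127);                        /* exec failed */
        }
        int status;
        if (waitpid(pid, &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return (long)i;                    /* stop the pipeline here */
    }
    return -1;
}
```

For instance, the stages could be `g++ -E abc.cpp -o abc.ii`, `g++ -S abc.ii -o abc.s`, `as abc.s -o abc.o`, and `g++ abc.o -o abc`, each written as an argv array such as `{"g++", "-E", "abc.cpp", "-o", "abc.ii", NULL}`.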

There are many interactions with the system in this process. Opening files, and reading from and writing to them, require system calls. Your question, though, seems to be mostly about executing programs.

Roughly, on Unix and similar systems, the steps involved in one program causing another to be executed are:

1. The program calls fork. This is a system call that creates a duplicate of the process. When it is done, there are two copies of the same process: one is called the parent and the other the child. The system tells each process whether it is the parent or the child through the return value of the fork call.
2. The program examines the fork return value to see whether it is the parent or the child. If it is the child, it calls a routine in the exec family to execute another program.
3. The exec call opens the file containing the program to be executed and reads it into memory. This involves interpreting the contents of the executable file, because the executable file is not just raw data. It contains a variety of structures that describe different things to be put into memory when preparing to run the program.
4. Much of the work of the exec call can be done in ordinary ways: opening a file, reading its contents, analyzing its contents, and arranging things in memory. Additionally, executable files may use shared libraries, and loading those requires opening and loading more files. However, the exec call is assisted to some degree by system calls that change memory mappings for the process and perform other tasks.
5. Ultimately, when the program to be executed is sufficiently loaded into memory, the software that is loading it transfers control to its start address, and the process is then running the new program.
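The steps above map onto a short POSIX C sketch: `fork` creates the child, its return value tells each copy which one it is, the child replaces itself with `execvp`, and the parent waits with `waitpid`. This is roughly what a shell does to run `g++ abc.cpp`, minus job control, redirection, and error messages; `run_command` is a hypothetical name.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the NULL-terminated argument
   list argv, as a shell would, and return its exit status (-1 on error). */
int run_command(char *const argv[])
{
    pid_t pid = fork();            /* step 1: duplicate this process */
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        /* Step 2, child (fork returned 0): replace this program with
           argv[0], searching PATH for it. */
        execvp(argv[0], argv);
        _exit(127);                /* only reached if exec failed */
    }

    /* Parent (fork returned the child's PID): wait for the child. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `char *const args[] = {"g++", "abc.cpp", NULL};` followed by `run_command(args)` would run the compiler and return the same exit status a shell reports in `$?`.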

(I have probably given short shrift to the exec and loading processes, and possibly other issues touched on above.)
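The "start address" is recorded in the executable file itself. On Linux, executables are ELF files, and the header field `e_entry` holds it; for a typical C or C++ program it points at `_start`, not at `main` (for a position-independent executable it is relative to the load address, and a dynamically linked program first runs the dynamic loader, which then jumps there). A minimal sketch that reads the field, assuming a 64-bit ELF file and the Linux `<elf.h>` header; `elf_entry_point` is a hypothetical name. `readelf -h` prints the same value as "Entry point address".

```c
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return the entry address stored in a 64-bit ELF
   file's header, or 0 if the file is unreadable or not ELF64. */
Elf64_Addr elf_entry_point(const char *path)
{
    Elf64_Ehdr hdr;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    ssize_t n = read(fd, &hdr, sizeof hdr);
    close(fd);

    if (n != (ssize_t)sizeof hdr ||
        memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0 ||   /* "\177ELF" */
        hdr.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;
    return hdr.e_entry;   /* where control goes once loading is done */
}
```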

score 0 · Answer 3 · answered Dec 02 '18 at 16:02

This works differently on various operating systems, but you have specifically asked about Unix.

In Unix variants, users interact with a shell. Unlike on many other systems, the shell is just a program with no special status. You can write your own simple shell, run it, and let that be your interface to the system.

When you type something at a standard shell, such as

$ XYZ

the shell has to determine what XYZ is.

1. It may be an alias that has to be expanded into some other command.
2. It may be a command that the shell performs internally (a built-in). Shell settings and commands like "cd" are typically handled by the shell itself.
3. If XYZ is not one of these, the shell typically tries to find a file named XYZ, usually by searching the directories listed in the PATH variable.
4. Assuming that the shell can find XYZ, it will usually determine whether XYZ is a script file containing a list of commands to process. Processing scripts is usually the most complex part of a shell program. A script typically contains a mixture of control instructions (e.g. if) and commands. For each command in the script, the shell has to go back to step 1 and process it.
5. If XYZ is an executable file, the shell has to run it. In Unix, the shell creates a new process using the fork and exec families of system services.
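The lookup order in the list above can be sketched in C: check for a built-in first, use a name containing `/` as a path directly, and otherwise search each directory in `PATH` for an executable file. Real shells also expand aliases and functions and cache lookups; the built-in table here is deliberately tiny, and `resolve_command` and `enum cmd_kind` are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum cmd_kind { CMD_NOT_FOUND, CMD_BUILTIN, CMD_EXTERNAL };

/* Hypothetical resolver: built-ins first, then an explicit path, then a
   PATH search. For CMD_EXTERNAL the full path is written to out.
   Aliases are not modelled; empty PATH entries are skipped. */
enum cmd_kind resolve_command(const char *name, char *out, size_t outlen)
{
    static const char *const builtins[] = {"cd", "exit", "export"};
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
        if (strcmp(name, builtins[i]) == 0)
            return CMD_BUILTIN;

    /* A name containing '/' is used as-is; PATH is not searched. */
    if (strchr(name, '/') != NULL) {
        snprintf(out, outlen, "%s", name);
        return access(out, X_OK) == 0 ? CMD_EXTERNAL : CMD_NOT_FOUND;
    }

    const char *path = getenv("PATH");
    if (path == NULL)
        return CMD_NOT_FOUND;

    /* Try each ':'-separated directory in order; the first hit wins. */
    while (*path != '\0') {
        size_t len = strcspn(path, ":");
        snprintf(out, outlen, "%.*s/%s", (int)len, path, name);
        if (len > 0 && access(out, X_OK) == 0)
            return CMD_EXTERNAL;
        path += len;
        if (*path == ':')
            path++;
    }
    return CMD_NOT_FOUND;
}
```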

Your XYZ is g++, and in your example it has one parameter, abc.cpp. Unix shells have historically done no real parsing of command-line options. Apart from expansions such as wildcards and variables, they just break the command string up into the program name and its parameters; the program is responsible for making sense of the parameters.
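The splitting described here can be sketched as a whitespace tokenizer that produces the NULL-terminated `argv` array the exec functions take. Real shells also handle quoting, escapes, and variable and wildcard expansion before the program ever sees its arguments; `split_command` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical tokenizer: split line in place on spaces, tabs and
   newlines, store pointers to at most max_args words in argv, and end
   the array with NULL. argv needs room for max_args + 1 entries.
   Returns the number of words (the future argc). */
int split_command(char *line, char *argv[], int max_args)
{
    int argc = 0;
    char *p = line;
    for (;;) {
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;                      /* skip blanks between words */
        if (*p == '\0' || argc == max_args)
            break;
        argv[argc++] = p;             /* a word starts here */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if (*p != '\0')
            *p++ = '\0';              /* end the word in place */
    }
    argv[argc] = NULL;                /* exec functions need a NULL terminator */
    return argc;
}
```

Splitting `"g++ abc.cpp"` this way yields `argv[0]` = `"g++"`, `argv[1]` = `"abc.cpp"`, and `argv[2]` = `NULL`, ready to hand to `execvp`.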

The exec family of system services used to run a program allows specifying the parameters to pass to the program:

If you did

   g++ -xi132m asdfadf 345211 afara

the shell would typically invoke the g++ executable and pass it four nonsensical strings. g++ would find these in the argc and argv parameters of main, would likely report an error, and would exit. In your case, g++ would find one parameter, the string "abc.cpp".
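Inside `g++`, those strings simply arrive in `argv`, and deciding what they mean is ordinary program logic. A toy sketch of that classification, assuming only two rules (a leading `-` marks an option, a `.cpp` suffix marks a C++ source file); real `g++` recognizes many more options and suffixes, and `classify_arg` is a hypothetical name.

```c
#include <string.h>

enum arg_kind { ARG_OPTION, ARG_CPP_SOURCE, ARG_UNKNOWN };

/* Hypothetical, drastically simplified version of how a compiler
   driver might sort one command-line argument. */
enum arg_kind classify_arg(const char *arg)
{
    size_t len = strlen(arg);
    if (arg[0] == '-')
        return ARG_OPTION;                 /* e.g. "-xi132m" */
    if (len > 4 && strcmp(arg + len - 4, ".cpp") == 0)
        return ARG_CPP_SOURCE;             /* e.g. "abc.cpp" */
    return ARG_UNKNOWN;   /* g++ passes names with unrecognized suffixes to the linker */
}
```

In a real `main`, a loop would apply this to `argv[1]` through `argv[argc - 1]`.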
