3

I would like to know how C++ actually turns the command line arguments into a char array. What "secret" code does this? Where can I view the code that does this (even if it is in assembly, I know some assembly)? I am using Linux, if that helps.

Thank You

vis.15

5 Answers

7

In most (all?) Unix-based operating systems, they are already an array. That is how the operating system executes a process there: when a process starts, there is already an array of arguments ready for it.
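To make "already an array" concrete, here is a minimal sketch of what main() receives: argc strings, followed by a null pointer. The helper name collect_args is made up for illustration and is not part of any library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies the argument array that main() receives
// into a std::vector. argv holds argc C strings followed by a null pointer.
std::vector<std::string> collect_args(int argc, char* argv[]) {
    std::vector<std::string> args;
    for (int i = 0; i < argc && argv[i] != nullptr; ++i) {
        args.emplace_back(argv[i]);  // each element is already a separate string
    }
    return args;
}
```

Calling collect_args(argc, argv) as the first line of main() gives you the arguments exactly as the program received them; no splitting happens inside your program.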

The code that turns a command line into an array lives in the shell (like bash) or in whatever other program starts another program. bash's source code is available; for other programs, it varies.
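As a rough idea of what that splitting step looks like, here is a deliberately simplified sketch. It is not bash's actual algorithm: it only handles whitespace and plain single or double quotes, and the function name split_command_line is invented for this example. A real shell also deals with escapes, variables, globbing and much more.

```cpp
#include <string>
#include <vector>

// Simplified, hypothetical tokenizer: splits on spaces and tabs,
// and treats text inside '...' or "..." as part of one word.
std::vector<std::string> split_command_line(const std::string& line) {
    std::vector<std::string> words;
    std::string current;
    bool in_word = false;  // true once the current word has started
    char quote = '\0';     // the open quote character, or '\0' if none

    for (char c : line) {
        if (quote != '\0') {
            if (c == quote) {
                quote = '\0';      // closing quote: stay in the same word
            } else {
                current += c;      // keep everything inside the quotes
            }
        } else if (c == '"' || c == '\'') {
            quote = c;             // opening quote starts (or continues) a word
            in_word = true;
        } else if (c == ' ' || c == '\t') {
            if (in_word) {         // whitespace ends the current word
                words.push_back(current);
                current.clear();
                in_word = false;
            }
        } else {
            current += c;
            in_word = true;
        }
    }
    if (in_word) {
        words.push_back(current);  // flush the last word
    }
    return words;
}
```

The resulting vector is what a launcher would then hand to the operating system as the argument array.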

In Windows, they are one string (which you can get unmodified using the GetCommandLine() API call), which is parsed by the C runtime library to turn it into an array, because the language specification requires them to come as an array.

For programs compiled with Visual C++, the code that does this is included in Visual Studio distributions. You may have to turn on a checkbox that says something like “Include C runtime library source” in the installer in order to have it installed.

hamstergene
2

This is usually handled by the operating system when it creates the process for the program. This code could very well be written in C (for example, if the OS is written in C), or it could be in assembly. To find the code for this, you will probably have to look at the operating system code.

Hope this helps!

templatetypedef
2

It is the OS's job to manage command line arguments and to put them on the stack during process creation.

For POSIX systems the execution path is:

  1. in your program you call one of the exec functions (execle/execve/...), passing the path to the new process executable and the command line arguments.
  2. this data goes to the kernel
  3. the kernel updates its internal structures to take the new process identity into account and allocates the address space for the new process (the kernel also purges the old address space if it is no longer needed). The kernel zero-initializes the process memory and copies the arguments from the old memory to the top of the stack in the new address space.
  4. the kernel puts the new process on the scheduling queue and returns from the exec() system call, transferring execution to userspace and eventually to the entry point of the process (this is usually a routine from the crt0.o object file, which is linked by default into every executable; this routine calls main()).
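The userspace side of step 1 can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses execv() (which keeps the current environment) rather than execve() directly, it forks first so the caller survives, and the helper name run_program is made up for this example. The point is that the caller builds the null-terminated array itself before the kernel ever sees it.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `path` with the given arguments and
// returns its exit status, or -1 on failure.
int run_program(const std::string& path, const std::vector<std::string>& args) {
    // Build the null-terminated char* array that the exec functions expect.
    std::vector<char*> argv;
    for (const std::string& a : args) {
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;                         // fork failed
    }
    if (pid == 0) {
        execv(path.c_str(), argv.data());  // only returns if exec failed
        _exit(127);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_program("/bin/sh", {"sh", "-c", "exit 3"}) passes three separate strings; the child's argv[2] is the whole text "exit 3", because nothing on this path splits strings on spaces.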

For Linux, you can see this code here: http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/fs/exec.c#L383 :

383 /*
384  * 'copy_strings()' copies argument/environment strings from the old
385  * processes's memory to the new process's stack.  The call to get_user_pages()
386  * ensures the destination page is created and not swapped out.
387  */

In do_execve(), the kernel counterpart of the userspace execve() syscall, copy_strings() is called at line 1345, and the copy_strings() routine actually does the job you're asking about.
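If you want to look at the result of that copying from inside a running process, Linux exposes the argument strings through /proc/self/cmdline as a sequence of NUL-terminated strings. A minimal sketch, assuming a Linux system with /proc mounted (the function name read_own_arguments is invented for this example):

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads this process's argument strings from
// /proc/self/cmdline, where each argument ends with a '\0' byte.
std::vector<std::string> read_own_arguments() {
    std::ifstream in("/proc/self/cmdline", std::ios::binary);
    std::string raw((std::istreambuf_iterator<char>(in)),
                    std::istreambuf_iterator<char>());

    std::vector<std::string> args;
    std::string current;
    for (char c : raw) {
        if (c == '\0') {
            args.push_back(current);  // a NUL byte ends one argument
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) {
        args.push_back(current);      // tolerate a missing final NUL
    }
    return args;
}
```

Normally the strings returned here match what main() sees in argv, unless the program has overwritten its own argument area.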

Dmytro Sirenko
  • 1
    "copy_strings() routine actually does the job you're asking about" - The process he was talking about was the command string being turned into an array. That does not happen in the `copy_strings` function - the command is already an array before that function ever touches it. – sepp2k Sep 01 '12 at 22:27
  • I understood the question as "how is the char array argv[] in the new process memory created", because calling execve() from C code already takes an array. So, yes, from your point of view, it's the shell that takes the command line apart, but that code is quite primitive. – Dmytro Sirenko Sep 02 '12 at 09:01
1

It's part of the C runtime library. If you would like to know what that is, look here: What is the C runtime library?

I just created a basic C++ console app using Microsoft Visual Studio and set a breakpoint on the first line of the program. When debugged, the program stops on that line and you can look up the call stack to see the function that calls 'main'. The calling function is part of the C runtime, and this does seem to contain some code that manipulates the command line... I haven't given it a close look, but that's probably where you should start.

Community
  • 1
  • 1
Scott Langham
  • 1
    Are you sure this is part of the runtime? My understanding is that the OS did this, since at the time that the program starts up the memory for these arrays already has to be allocated. – templatetypedef Sep 01 '12 at 20:38
  • 1
    On Windows, CreateProcess takes an LPTSTR for the command line; by the time your C++ main function sees it, it's in the following parameters: int main(int argc, char *argv[]). It was my best estimation that between one and the other, the C runtime does the conversion by splitting the process's command line arguments up. – Scott Langham Sep 01 '12 at 20:43
  • It's not a part of the C runtime library. All that C (and C++) specify is how the `main` function is to be called. How the command line arguments are formed and how `main` is called when the program starts is not specified in either standard. – David Hammen Sep 01 '12 at 21:09
  • @DavidHammen The C and C++ standards do not specify how the runtime libraries are to be implemented. So it's perfectly plausible that some C and C++ runtime libraries parse command lines and convert them into `argc/argv`. And MSVC is one such example. – David Heffernan Sep 01 '12 at 21:12
1

Sometimes it is not the actual main() that is executed first. For example, in Visual Studio it is the function mainCRTStartup() that serves as the entry point; it calls the Windows API to retrieve and parse the command line (you can see this if you use a debugger).