0

I'm curious as to what C does exactly to parse command line arguments. For example, assume I have a program named myProgram that takes in two arguments like this

./myProgram arg1 arg2

If I were to call

./myProgram arg1$'\0otherstuff' arg2

arg1 and arg2 would still print if we were to print argv[1] and argv[2], ignoring $'\0otherstuff', but where does it go? Is it stored in memory behind arg1? Could it potentially overwrite any buffer? How is arg2 read if there's a null character before it?

bli00
  • Depends on the OS, but this is a good start: https://en.wikipedia.org/wiki/Crt0 –  May 12 '18 at 23:20
  • First, when you type `./myProgram ...`, you're doing it in a shell. That shell will also interpret whatever you type before it gets passed to the child process. More than likely, your child process gets exactly the arguments as they were when the shell passed them. – Andrew Henle May 12 '18 at 23:32
  • Highly relevant if not exactly a duplicate: https://stackoverflow.com/questions/6570531/assign-string-containing-null-character-0-to-a-variable-in-bash/24511770#24511770 – rici May 13 '18 at 00:31
  • C doesn't parse command-line arguments. It looks like you're on Linux, Mac, or some other Unix or Unix-like system; on such systems, the [`exec` family of functions](http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html) is used to start your program, and all of them require you to have already split the command-line arguments. They also all expect `char*` arguments to null-terminated strings; `otherstuff` doesn't make it into your program at all when it's called. I don't know how much this differs on Windows or other non-POSIX platforms. – Daniel H May 13 '18 at 00:36
  • @DanielH: On Windows it's the C runtime that parses a command line into the argv arguments. Normally, wildcards are not handled by the C runtime and it is up to the program to handle any wildcards. However, MSVC does supply an object file, `setargv.obj`, that can be linked into a program to provide globbing of command line arguments. – Michael Burr May 13 '18 at 00:54
  • @MichaelBurr Is that the runtime of the calling program or of the called program that does the parsing? Based on how you describe `setargv.obj`, I'm guessing it's the called program? – Daniel H May 13 '18 at 01:43
  • @DanielH: Typically on Windows, when working from the command line the shell program is `cmd.exe`. When you enter a command like `foobar arg1 arg2`, `cmd.exe` does little parsing of the command (it does some in order to deal with things like redirection and variable substitution). The Win32 `CreateProcess()` API is used to start the new process. It doesn't parse the command line except to locate the program name (if it is not provided explicitly in a separate argument). The C runtime of the `foobar` process parses the command line, breaking it up into `argv[]` strings before calling `main()`. – Michael Burr May 13 '18 at 05:39

3 Answers

1

Converting ./myProgram arg1 arg2 into the C-style int argc, char *argv[] is done by the operating system or by the shell (it depends on the platform). C does not parse the arguments; you parse the arguments in C. C is a programming language, not an entity that acts on your command line. The form int argc, char *argv[] is how the C programming language passes arguments to the main function, but other programming languages may use a different form. For C, see main_function.
On Linux, one may use the execve system call to specify the arguments passed to a program. Parsing the form ./myProgram arg1 arg2 into execve arguments is done by the shell (e.g. bash), which constructs the argv array and passes it to the execve call.
Your shell is probably dropping the $'\0otherstuff' part because under POSIX a filename cannot contain the NUL character (assuming your shell is POSIX compatible), and more generally because each argument is handed to execve as a NUL-terminated string.

KamilCuk
0

When an executable is launched, the OS kernel takes the arguments (as plain text) and copies them into the new program's memory. Before the main function is called, a small piece of startup code runs, which passes the given arguments to the actual main function in C.

Eric Postpischil
Domso
  • Typically, arguments are processed by the command-line shell, not by the operating system kernel. The shell will then create a new process for the program to be run by forking and, in the resulting child, calling a routine in the `execl` family or something similar. – Eric Postpischil May 13 '18 at 00:26
0

Experimenting with bash (version 3.2.57(1)-release (x86_64-apple-darwin17)) suggests that the “otherstuff” in your example is not passed to the program. When a program is called with the command line you show, the memory pointed to by argv[1] contains “arg1”, then a null character, then “arg2”. Thus, the null and “otherstuff” in your command line have not been passed to the program.

(Hypothetically: If the shell were to pass it to the program, I would expect it would pass it in the memory continuing from that pointed to by argv[1], and there would be no danger of it overwriting any buffer. If the shell were designed to tolerate an embedded null character in an argument, I expect (based on how we design things) that it would treat the argument as a complete string and provide the necessary space to hold it.)

The fact that the argument prior to “arg2” contains a null character is irrelevant to the handling of “arg2”. After initial processing of the command line, the shell does not treat the line as one string. It has divided it into words or other units and handles them with its own data structures. So the presence of null characters in prior arguments has no effect on later arguments.

Additionally, it may not be possible for the shell to pass an argument containing an embedded null character. The routines typically used to execute a program, such as execl, accept the arguments as null-terminated strings. So the embedded null terminates the string, and the execl routine never passes anything beyond the null character.

Eric Postpischil