
As I understand it, you can't know for certain the length of a string in C unless it is declared as a `char[]` in local scope. If it is passed as an argument, it's a `char *` pointer, and you can't know the length without using `strlen()`. Many other answers on Stack Overflow describe this, including this answer to "How to get the string size in bytes?".

But sometimes strings aren't null terminated, and if they aren't, you can end up looking into some other memory while trying to find the end of a string. In my own code, I should always pass the length of the string around so that I know for sure how long it is, but what about arguments to main()?
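If you do keep a buffer's size next to a character array, as the paragraph above suggests for your own code, one way to measure it without ever reading past the end is to search only the first `size` bytes for the terminator with the standard `memchr`. This is a minimal sketch, not part of the original question: `bounded_strlen` is a hypothetical helper name (POSIX offers a similar `strnlen`).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the length of the string stored in buf,
 * or size if no '\0' appears within the first size bytes.
 * memchr examines at most size bytes, so nothing beyond buf[size - 1]
 * is ever read, even when the buffer is not null-terminated. */
size_t bounded_strlen(const char *buf, size_t size)
{
    const char *nul = memchr(buf, '\0', size);
    return nul ? (size_t)(nul - buf) : size;
}
```

For a `char raw[4] = {'a', 'b', 'c', 'd'};` that has no terminator, `bounded_strlen(raw, sizeof raw)` returns 4, whereas `strlen(raw)` would read past the array.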

What if bash has a bug and passes a string that is truncated or isn't null terminated? Or what if my program is called by something other than a shell, like another program that isn't as mature as the most common shells? Could my program segfault? Could I expose the memory of whatever happens to be adjacent to argv?

xordspar0
  • Step 1: Create a situation where you successfully pass in a NUL byte as a command-line argument and find out what happens. In practice this is almost never done for reasons you've outlined. It is, however, quite common on things like STDIN because it's often a better line-delimiter than other characters (e.g. `find -print0` for `xargs -0`) – tadman Aug 16 '18 at 04:24
  • I'm going to speculate that passing in either a NUL byte early, or passing in data without a trailing NUL character is going to be impossible by design because that would create severe buffer-overflow exploitation opportunities for a wide range of programs. – tadman Aug 16 '18 at 04:28
  • Strings are always null-terminated; [that is part of the definition of a string in C](https://port70.net/~nsz/c/c11/n1570.html#7.1.1p1). – ad absurdum Aug 16 '18 at 04:28
  • @DavidBowling Strings are *supposed* to be terminated but that's not necessarily the case in practice. Mistakes happen. – tadman Aug 16 '18 at 04:29
  • @tadman -- OP said: "But sometimes strings aren't null terminated...." The point is, a character sequence that is _not_ null-terminated isn't a string in C. – ad absurdum Aug 16 '18 at 04:31
  • Right, mistakes happen, like when you use `strncpy`, which doesn't leave room for a terminal NUL byte if the string is bigger than `n`. – xordspar0 Aug 16 '18 at 04:33
  • @DavidBowling It may not be a *valid* string, but it may be the result of someone's code where they *thought* they were making a valid string. This happens way more often than it should because people don't use the right string copying functions. This question pertains to "what if the string *isn't* valid", not if it meets the technical definition of a C string. – tadman Aug 16 '18 at 04:34
  • The system requires the strings to be null-terminated in the call to `execve()` or equivalent. If they aren't, the call will fail. The strings passed to `main()` will be OK barring a catastrophic and incredibly impossible bug in the o/s. – Jonathan Leffler Aug 16 '18 at 04:37
  • @tadman -- OP seems to have some misconceptions about strings in C: "you can't know for certain the length of a string in C unless it is declared as a `char[]` in local scope," "sometimes strings aren't null terminated." I think that it is good to attempt to clarify in these situations. Yes, there are valid reasons to handle such malformed input, but "in my own code, I should always pass the length of the string around..." is not right; you should make sure that you pass around valid values in your own code, including valid strings when applicable. – ad absurdum Aug 16 '18 at 04:41
  • @DavidBowling I'm not sure I follow what you're saying here. I know C leads to pedantic discussions, but this one does not appear to be productive. – tadman Aug 16 '18 at 04:42
  • @tadman -- I'm not trying to be argumentative or counterproductive, but I think that a better understanding of what strings are in C would lead OP to see that "always pass the length of the string around" is not a good solution. – ad absurdum Aug 16 '18 at 04:46
  • @DavidBowling Try and stay focused on the core of the question: Can non-NUL terminated strings (e.g. invalid strings) be passed in via the shell or some other mechanism to your `main()` function, and if so, what are you supposed to do about it? – tadman Aug 16 '18 at 04:46
  • @tadman -- these are comments, not answers, and I was commenting on what I view as a significant misunderstanding of OP, clarification of which I hoped would be helpful to OP. Bye. – ad absurdum Aug 16 '18 at 04:48
  • While it is unlikely that the OS would pass such arguments to a new executable, a possible source of bugs is assuming that `argc` is at least 1. There need not be an `argv[0]` (the program name) at all, though in that case `argv[0]` would be `NULL` and would still not leak memory. – Antti Haapala -- Слава Україні Aug 16 '18 at 04:58
  • Thanks for the discussion about C strings and `execve()`. I'm a C novice, so it was helpful. @DavidBowling I think I understand how C strings work, but I'm definitely new. How could I clear up the misconceptions in my question? Regarding the "passing the length around", I got that idea from my research about `strncpy` in [this email](https://www.sourceware.org/ml/libc-alpha/2000-08/msg00053.html). – xordspar0 Aug 16 '18 at 06:39
  • @xordspar0 -- "you can't know for certain the length of a string in C unless it is declared as a `char[]` in local scope" seems to be rooted in misconceptions. You can _always_ know the length of a string in C, because strings are always null-terminated. Further, whether an array of `char` holding a string is declared locally or not is immaterial. If the string is passed to a function, you can't use `sizeof` to determine the _size of the array_, which is not the same thing as the _length of the string_. The array will _always_ be larger than the string (by at least one byte, but often more). – ad absurdum Aug 16 '18 at 22:07
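The distinction in the last comment, between the size of an array and the length of the string it holds, can be seen in a few lines. This is a minimal sketch; the function names are illustrative and not from the thread. Inside the function that declares the array, `sizeof` reports the array size, while `strlen` reports the string length and works the same once the array has decayed to a `char *` parameter.

```c
#include <stddef.h>
#include <string.h>

/* Length of the string: counts characters before the first '\0'.
 * Gives the same answer whether s points into a local array,
 * a global, or one of the strings in argv. */
size_t string_length(const char *s)
{
    return strlen(s);
}

/* Size of the array: a 16-byte array holding a 5-character string. */
size_t local_array_size(void)
{
    char buf[16] = "hello";
    return sizeof buf;          /* 16: the whole array, not the string */
}

/* Length of the string stored in that same kind of array. */
size_t local_string_length(void)
{
    char buf[16] = "hello";
    return string_length(buf);  /* buf decays to char *; result is 5 */
}
```

Applying `sizeof` to the parameter inside `string_length` would give `sizeof(char *)`, not 16, which is why the length has to come from the terminator.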

3 Answers


Simple answer: no.

You can assume that the arguments to `main(int argc, char *argv[])` are always valid.
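Relying on that guarantee, each `argv[i]` with `0 <= i < argc` can be passed straight to `strlen`. A minimal sketch, assuming a hypothetical helper `total_arg_length` that `main` would call as `total_arg_length(argc, argv)`. It also copes with `argc` being 0, a case raised in the comments above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sums the lengths of all command-line arguments,
 * including argv[0]. Meant to be called from main as
 * total_arg_length(argc, argv). Every argv[i] for 0 <= i < argc is a
 * null-terminated string, so strlen is safe; when argc is 0 the loop
 * body never runs and the result is 0. */
size_t total_arg_length(int argc, char *argv[])
{
    size_t total = 0;
    for (int i = 0; i < argc; i++)
        total += strlen(argv[i]);
    return total;
}
```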

Joël Hecht

The software that starts a C program is responsible for creating proper contents for argv:

  • Per C 2018 5.1.2.2.1 2, “If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, which are given implementation-defined values by the host environment prior to program startup.”

  • Per 7.1.1 1, “A string is a contiguous sequence of characters terminated by and including the first null character.”

Thus, it is not legal according to the C standard for `argv` to point to sequences of bytes that are not null-terminated. Is it possible? Yes, if there is a bug in the software, it is possible. A bug in bash cannot cause this, as bash works through the operating system, and bash would not be able to pass arguments to your program that the operating system does not process. Nor could other user-mode programs cause this, as they have to work through the operating system in the same way. It would require a bug in the code that loads and executes programs and/or in the code inside a C program that starts the program before calling `main`.
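On the launching side, any program (a shell or otherwise) has to hand the operating system an array of pointers to null-terminated strings, itself ended by a null pointer. A POSIX sketch using `fork`, `execv`, and `waitpid`; it assumes a POSIX system where `/bin/echo` exists, and `run_echo` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs /bin/echo with one argument and returns its
 * exit status, or -1 if it could not be started or did not exit normally. */
int run_echo(const char *message)
{
    /* execv takes pointers to null-terminated strings, with the array
     * itself terminated by a null pointer. This becomes the child's argv. */
    char *const args[] = {"echo", (char *)message, NULL};

    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {
        execv("/bin/echo", args);   /* returns only on failure */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Whatever the parent supplies here is what the child's startup code later exposes as `argc` and `argv`.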

Eric Postpischil

No. `argv` is an array of strings, and the shell splits the command line into separate arguments at whitespace, so any input you give will arrive as valid strings. You should still validate the input before you use it.

Ido H Levi
  • And how would you _check the input_ without accessing memory that is not part of the strings in `argv` in the case they are not null-terminated? – PiCTo Aug 16 '18 at 08:22
  • What do you mean? Everything on the command line is passed as parameters in `argv`. If the param is `NULL`, then it means that you've tried to access a parameter you haven't entered (`i > argc`). – Ido H Levi Aug 16 '18 at 08:53
  • If you access past the end of an array, you don't get NULL in C. You access memory that wasn't a part of your array, which at best causes you to get nonsense data and at worst causes a segmentation fault. Take a look at this example: https://repl.it/repls/DemandingWhimsicalMice – xordspar0 Aug 16 '18 at 14:09
  • @xordspar0 -- `argv[]` is an array of pointers to strings, [and it is guaranteed that these pointers are followed by a null pointer in the array](https://port70.net/~nsz/c/c11/n1570.html#5.1.2.2.1p2). So you can always check for a null pointer to know when you have reached the end of `argv[]`. – ad absurdum Sep 13 '18 at 22:18
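Following that last comment, here is a sketch that walks `argv` to its terminating null pointer instead of relying on `argc`. `count_args` is a hypothetical name. Because the standard guarantees `argv[argc]` is a null pointer, the loop stops there and reads nothing past the end of the array.

```c
#include <stddef.h>

/* Hypothetical helper: counts arguments by walking argv until the
 * terminating null pointer. Since argv[argc] == NULL is guaranteed for
 * the argv passed to main, this returns the same value as argc. */
int count_args(char *argv[])
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```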