1

I have two files, User.h and test.c:

User.h

include <stdio.h>                                                              
#include <string.h>                                                             

struct User {                                                                 
    char name[21];                                                              
};                                                                              

struct User newUser(char* name) {                                           
    struct User newUser;                                                    
    memset(newUser.name, '\0', 21); // ensure string ends with '\0'           
    memcpy(newUser.name, name, 20); // copy first 20 chars of string          
    return newUser;                                                           
} 

test.c

#include "User.h"                                                             

int main() {                                                                    
    struct User testUser = newUser(34);                                           
    printf("name is: %s\n", testUser.name);                                           
    return 0;                                                                   
} 

I am intentionally passing the wrong type of argument to a function, but trying to avoid a segfault:

If I were to pass a string of any length to newUser(), this function would take up to 20 characters of it and store it in .name without a segfault because it is guaranteed to end with a null byte.

Here I am passing an int instead. Because it is expecting a string, I get a compiler warning, but it compiles anyway. When run, I get a segmentation fault.

I suppose the segmentation fault occurs when the function reads past the name[21] array, but if it's guaranteed to have a null byte, why does it continue to read past it? It's expecting a string, shouldn't it treat any argument like a string and terminate at '\0'?

It seems my logic is flawed, can someone educate me about what's really going on here?

Sourav Ghosh
  • 133,132
  • 16
  • 183
  • 261
ridthyself
  • 821
  • 1
  • 8
  • 19
  • 1
    C has no run time type checks. And this means you can not avoid the segmentation fault. But you can catch it. See [here](http://stackoverflow.com/a/32799720/402322) how to do it. Or use a [language](http://www.call-cc.org/), which has run time type checks. – ceving Sep 02 '16 at 16:05
  • I've been wanting to get into a lisp, Chicken Scheme looks amazing! thank you for this reference. – ridthyself Sep 02 '16 at 16:12

3 Answers3

4

I am intentionally passing the wrong type of argument to a function, but trying to avoid a segfault.

That is same as saying I'm going into the sea but trying to avoid getting wet.

When you do something illegal, all you can end up with is invoking undefined behavior which may lead to segfault.

The best way to avoid it is to write correct code.


The problem is, the function expects a char* and you're passing an int. That's not allowed, anyway. It's wrong, and you must not ignore compiler warnings.

To elaborate, the function expects a pointer-to-char (char*) type and further, the code involves reading from the address location pointed by the pointer. As you're passing an int to the function (ignoring compiler warnings), the code tries to access the memory pointed by the supplied integer value, which is very likely an invalid memory location. So, this attempt to access invalid memory location invokes the UB.

Sourav Ghosh
  • 133,132
  • 16
  • 183
  • 261
  • Ok, so this triggers Undefined Behavior? So all bets are off, it's implementation specific and anything can happen? – ridthyself Sep 02 '16 at 15:52
  • @ridthyself Well, technically it's undefined, so I would not use the term "implementation specific" because officially that carries another meaning. and yes, once you hit UB, anything can happen, just anything. – Sourav Ghosh Sep 02 '16 at 15:54
  • thank you! This is helpful. Could you move your comment to your answer, maybe elaborate on how this causes "UB" and I'll select it. – ridthyself Sep 02 '16 at 15:56
  • @ridthyself any better now? :) – Sourav Ghosh Sep 02 '16 at 16:01
4

The standard C answer is that this code has undefined behavior so all bets are off. (Pass -Werror to treat all compiler warnings as errors and pass -pedantic to get all diagnostics required by the standard.) As pointed out by Keith Thompson, the C standard actually requires a diagnostic message for this (broken) code, and a compiler may refuse to compile it.

In practice the code probably reinterprets the number 34 as a memory address and then memcpy tries to read from (char *)34. This normally causes a segmentation fault because that address falls within the first page of memory, which isn't mapped to detect when someone dereferences a null pointer.

melpomene
  • 84,125
  • 8
  • 85
  • 148
  • Do you mean 34 might get interpreted as a memory address? – ridthyself Sep 02 '16 at 15:58
  • 1
    And add `-pedantic-errors` and `-Wall` to `-Werror`. Type checking is done at compile time in C. If you do not enable type checks at compile time you have no type checks. – ceving Sep 02 '16 at 16:01
  • 1
    @ridthyself Yes, that's probably what happens. 34 probably gets interpreted as a virtual memory address and the first page of virtual memory is left unmapped to catch accidental dereferences of null pointers. – David Schwartz Sep 02 '16 at 16:07
  • 1
    It's not merely undefined behavior. It's a constraint violation. A compiler could (and IMHO should) fail to compile `test.c`. If it successfully compiles it anyway (after issuing the mandatory diagnostic message), *then* the behavior of the resulting code is undefined. – Keith Thompson Sep 02 '16 at 17:37
  • @ridthyself Yes, fixed now. – melpomene Sep 02 '16 at 18:29
2
struct User newUser(char* name)
...
struct User testUser = newUser(34);

All the relevant declarations are visible at the point of the incorrect call. Your newUser function requires an argument of type char*, and you're passing it an int. This is a constraint violation, which means that any conforming compiler is required to issue a diagnostic message.

Unfortunately (but legally), some compilers will issue a non-fatal warning for this particular error, at least in their default mode.

The solution is to invoke your compiler in a mode that treats this as a fatal error. For example, if you're using gcc or clang, you can add one or more command-line options, such as -Werror or -pedantic-errors.

If you choose to ignore any errors (which you really shouldn't do), there is no good way for your program to avoid the problem. The only solution is to avoid writing the invalid call in the first place (and use your compiler's diagnostic to help you do that).

Having said that, there are some unrelated problems in your code.

struct User newUser(char* name)

Since the function doesn't modify the data that name points to, the parameter should be defined as const char *name.

memset(newUser.name, '\0', 21);

21 is a magic number. There's nothing to tell the reader why 21 is the correct number of bytes, and if you later decide to change the length of name, you'll have to manually update all references to it. Define a constant and use it.

memcpy(newUser.name, name, 20);

What if the string is less than 20 characters long? You want to copy no more than 21 bytes and no more than the length of the argument into newUser.name. (Doing this is more complicated than it should be strncpy is the obvious solution, but it's also almost certainly the wrong solution. See [my rant on the topic]https://the-flat-trantor-society.blogspot.com/2012/03/no-strncpy-is-not-safer-strcpy.html)).

You have a function definition in a header file. Don't do that. Functions should be declared in .h files and defined in .c files. You can get away with defining a function in a .h file if that header is included only once in your entire program. For larger programs, that's not going to be the case. (There's a lot to say about how to structure multi-file C programs, but that's beyond the scope of this answer.)

Keith Thompson
  • 254,901
  • 44
  • 429
  • 631
  • Funny story, I actually ran into your article while trying to solve this problem, which is what led me to forgo strncpy and strcpy altogether and use memcpy instead, which, it turns out, is also the wrong answer. Go figure :) – ridthyself Sep 02 '16 at 16:21
  • What difference does adding const to a function signature make? Is that just for the programmers awareness? – ridthyself Sep 02 '16 at 16:28
  • @ridthyself: Without `const` on the parameter, you can't pass a pointer to `const` as an argument. For example, this: `void func(char *arg); /* ... */ const char *str = "hello"; func(str);` is invalid. – Keith Thompson Sep 02 '16 at 17:36