Why does this work? I tell the program that my array holds at most 2 chars, right?

#include <iostream>
#include <string>

using namespace std;

int main() {

    char name[2];

    cout << "Please, enter your full name: " << endl;
    cin.getline(name, 100);
    cout << "Hello, " << name << "!\n";

    return 0;
}

When I enter Albert Einstein, it works perfectly, but that is 15 characters. How can they all fit in my variable, which should hold at most 2 chars?

But with getline I tell it to store in name up to 100 chars written on this line.

And this doesn't work. I tell the program that my array holds at most 1 char, right?

#include <iostream>
#include <string>

using namespace std;

int main() {

    char name[1];

    cout << "Please, enter your full name: " << endl;
    cin.getline(name, 100);
    cout << "Hello, " << name << "!\n";

    return 0;
}

When I enter Albert Einstein, it doesn't work, which seems logical since I created my variable name with room for at most 1 char.

But with getline I tell it to store in name up to 100 chars written on this line.

What I really don't understand is why it works when I declare 2 chars, but not when I declare 1.

Can someone explain this to me?

Thank you

Kaj Half
  • This is undefined behaviour. You need to keep track of the bounds when using functions that take a size argument. You'd probably be better off using `std::string name;` and `std::getline(std::cin, name);` instead. – Ted Lyngmo Feb 05 '19 at 11:07
  • Yeah, I know how to resolve this; there are a bunch of ways to do it. – Kaj Half Feb 05 '19 at 11:10
  • But I was just wondering why it doesn't work with char[1] but does with char[2]. Thanks for your answer! – Kaj Half Feb 05 '19 at 11:11
  • Like @TedLyngmo said, you are accessing out of bound so it is undefined behaviour. Anything can happen including randomly working – Xatyrian Feb 05 '19 at 11:12
  • @KajHalf As Ted wrote, it is undefined behaviour, which means there is no guarantee whatsoever that it will work in both cases. I just tested your code on my machine and it worked in both cases, but it might fail in both cases on another computer. – Leo Feb 05 '19 at 11:13
  • So there is no real reason that explains this? Should I always declare char name[100]; and, when I use it with getline like here, just pass the same number so I don't have any problems? – Kaj Half Feb 05 '19 at 11:13
  • @Xatyrian Thanks everyone for your answer, understood ! – Kaj Half Feb 05 '19 at 11:15
  • The real reason is that you are writing somewhere that may be mapped to just about anything. It may be memory allocated for some of your other variables. Try allocating another `char []` right after your `name` and it may get written too. It may be writing to program memory. It may be mapped to some hardware that pushes the nuke button - for real :-) In short, what happens when you do that is undefined. – Ted Lyngmo Feb 05 '19 at 11:25
  • It doesn't fit in any case; you're overwriting memory that isn't yours. The issue is that usually there's nothing important there, so when the memory is overwritten, it doesn't always crash. Note that this is more likely to succeed on Windows because Windows puts memory buffers around unreliable applications to make them seem reliable. – Owl Feb 05 '19 at 11:39
  • "*usually there's nothing important there*" - hmm, like the function return address? – rustyx Feb 05 '19 at 12:16
  • @TedLyngmo you'd have to declare the other char[] before name for it to be written into. AFAIK arrays declared like that land on the stack. – FalcoGer Feb 05 '19 at 13:35
  • @FalcoGer Yes, but how the stack grows depends on the ABI, and the order in which variables are placed on the stack when declared like this depends on the compiler AFAIK (although usually in the order they were declared) - so one would need to declare variables both before and after to check for the effect I was after. – Ted Lyngmo Feb 05 '19 at 14:29
  • @TedLyngmo The compiler may omit unused variables or inline their values. The stack however is defined to start at a high address and grow downwards in memory; new variables are added beneath that address in the order they are added. Really, by adding new variables you are just moving the stack pointer, and the compiler remembers the address offset from the stack pointer to put into the assembly code. – FalcoGer Feb 05 '19 at 14:32
  • @FalcoGer C++ does not require the stack to start at any specific address and stack growth can be in any direction. Here are [some examples](https://stackoverflow.com/a/664779/7582247). – Ted Lyngmo Feb 05 '19 at 14:51
  • @TedLyngmo I didn't know that. Thank you. But down seems to be the standard, even for the more exotic specimens on the list. And x86 and co. (amd, x64) make up most of what people use anyway. So generally speaking it's still down for the most part, unless OP uses something funky. I also corrected my starting address statement. I seemed to remember something like that, but I was probably confusing it with something else. – FalcoGer Feb 05 '19 at 15:47
  • @FalcoGer You are welcome. "_down seems to be the standard_" ... I would not use the word _standard_ in this context since it implies something one should be able to rely upon, but yes, stacks most commonly grow downwards. :-) – Ted Lyngmo Feb 05 '19 at 16:12

1 Answer

This isn't safe because your array is only 2 bytes in size (2 * sizeof(char)), yet you tell getline that it may store up to 100 characters. C and C++ do not check array bounds, so you end up writing past the end of the array.

On most platforms the stack grows from high memory addresses toward lower ones (note that the illustration on the wiki is upside down with regard to memory addresses), while data is written into an array from low to high. You will therefore most likely (depending on how your compiler works, its settings, etc.) write into other variables that you declared before this point, the function parameters, the return address, or even past the end of the stack, outside the memory your program may validly access.

This is undefined behavior. What happens depends heavily on the state of the stack and less on the input that is written into the array. A likely outcome is a program crash, either because the return address was overwritten and now points to some random address that may or may not be inside your program, or because of a segmentation fault when you try to write into protected memory that your program doesn't own.

In the worst case, you have a buffer overflow vulnerability: an attacker may craft an input that overwrites the function return address on the stack so that execution jumps to an address of the attacker's choosing, usually the stack itself (the array that was just filled with the attacker's data), and runs what is there. The attacker would fill more of the array with processor instructions that do whatever they want. While modern operating systems guard against this primitive kind of overflow exploit, it shouldn't happen to begin with, and there are easy workarounds for the more basic countermeasures, such as address randomization.

An array is a number of memory cells that are located one after the other in memory. Each one can hold a character (or a value of whatever data type you used).

For example, char name[30]; tells your compiler to reserve 30 cells for characters. In most expressions, name is treated as a pointer to char that holds the address of the first cell in the array (name[0]). Note that at this point those cells may contain arbitrary leftover data. When you then enter a string into those cells via cin.getline, it may look like this:

['M', 'y', ' ', 'N', 'a', 'm', 'e', '\0', 'f', '&', '\0', '\0', 'y', '\0', ... 'i'] (30 characters total)

C++ and C recognize the end of a string by the first null character ('\0'). Your array must therefore have at least 1 more cell than the maximum number of characters you expect. In this case, the maximum length of the name is 29 characters, because the null terminator takes 1 extra cell and together they fill all 30 cells. Going out of bounds by 1 index, either in a loop or by forgetting the null terminator, is a common mistake. It is hard to debug because it rarely causes crashes; instead it produces subtle, often unreproducible bugs by overwriting other variables or parts of them.

Fixed array sizes are bad practice for input of variable length. You should look into memory allocation and pointer handling for strings. Or you could use the convenient std::string objects.

Another bad practice is to use magic numbers, like 30 in my example. It's better to define a constant with a descriptive name and use that instead. For example

const int MAX_NAME_LEN = 30;

If you want to experiment, you can declare an array and fill it with a recognizable pattern of data such as [\xAB, \xAB, \xAB, \xAB, ...]. When your operating system reports the invalid jump address, you will see the pattern in the address shown in the error message (0xABABABAB), provided you only overwrote the return address on the stack and did not write into protected memory.

FalcoGer