
I always wonder why C manages the memory the way it does.

Take a look at the following code:

#include <stdio.h>

int main(){
    int x = 10000000000;
    printf("%d", x);
}

Of course, overflow occurs and it returns the following number:

1410065408

Or:

#include <stdio.h>

int main(){
    int x = -10;
    printf("%u", x);
}

Here x is signed and I am using the unsigned conversion specifier "%u".

Returns:

4294967286

Or take a look at this one:

#include <stdio.h>

int main(){
    char string_ = 'abc';
    printf("%d", string_);
}

This returns:

99

That being said, I mainly have two questions:

  1. Why does the program return these specific numbers for specific inputs? I don't think it is a simple malfunction, because it produces the same result for the same input. So there must be a deterministic way for it to calculate these numbers. What is going on under the hood when I pass these obviously invalid numbers?
  2. Most of these problems occur because C is not a memory-safe language. Wikipedia says:

In general, memory safety can be safely assured using tracing garbage collection and the insertion of runtime checks on every memory access

Then, historical reasons aside, why are some languages not memory-safe? Is there any advantage to not being memory-safe?

goku
  • 3
    Your examples all have to do with overflow of specific *numbers*, not memory. – Steve Summit May 06 '22 at 14:51
  • Overflow occurs when there is not enough memory, how they are not related? @SteveSummit – goku May 06 '22 at 14:52
  • Integer overflow is formally *undefined*. On many systems, you get a result which is modulo the size of the variable. 10000000000 is a 34-bit number, so it won't fit in a 32-bit int, so the result you get is often 10000000000 % 4294967296, which is 1410065408. (But it's not guaranteed you'll get this result.) – Steve Summit May 06 '22 at 14:53
  • The advantage is that the compiler doesn't have to check whether your code is memory-safe. – user253751 May 06 '22 at 14:53
  • When you say `int x = 2147483600; x += 50;`, you do get an overflow of the value in `x`. But we don't say that the problem is that you're running out of memory. (You might say you ran out of *bits*.) Running out of memory would be `char *p = malloc(70000);` on a machine with only 64kB of memory. – Steve Summit May 06 '22 at 14:55
  • What about the other numbers? @SteveSummit – goku May 06 '22 at 14:57
  • 1
    `'abc'` is a *multi-character character constant*, and that's a different story. Those are imperfectly defined, too, and best avoided. But you might be interested in trying `int c = 'abc';`, and printing it back out, both using `%d` and `%x`. – Steve Summit May 06 '22 at 14:58
  • 1
    To explain the result with `x = -10` and `%u`, you can read about [two's complement arithmetic](https://en.wikipedia.org/wiki/Two%27s_complement). The short answer is that 2^32 is 4294967296, and 4294967296 - 10 = 4294967286. – Steve Summit May 06 '22 at 14:59
  • 1
    The other way to explain the result with 10000000000 is that 10000000000 in hexadecimal is `0x2540be400`, while 1410065408 is `0x540be400`. See the pattern? (But, again, this result is not guaranteed. On some machines, with integer overflow you'll get nonsensical results, or you'll get a similar kind of exception as if you divided by 0. The way to get guaranteed, "wraparound" behavior on overflow in C is to use `unsigned`.) – Steve Summit May 06 '22 at 15:02
  • Interesting results indeed. Can you produce non-ASCII output in this way too? I have come across non-ASCII output many times but I cannot remember how I did that. Mostly it was caused by scanf() (which is also not safe). I'm not sure if I can produce that without scanf @SteveSummit – goku May 06 '22 at 15:03
  • @goku You probably output in *UTF-8* although it depends which OS you are using. I don't think the Windows console uses UTF-8. – user253751 May 06 '22 at 15:04
  • I am using Windows and I guess it is UTF-8. I also tried this on online compilers to check if I'll get different results. @user253751 – goku May 06 '22 at 15:08
  • "*Overflow occurs when there is not enough memory, how they are not related?*" You may be conflating *buffer overflow* and *arithmetic overflow*. Your examples have arithmetic overflow but not buffer overflow. Buffer overflow is related to accessing memory outside the bounds of an array object (or outside the bounds of an object being treated as an array object) or outside the bounds of a dynamically allocated object. – Ian Abbott May 06 '22 at 16:24
  • What I was thinking was this: when you type int x, it reserves 16 bits of memory, and when the result is, for example, 32 bit, there is not enough memory reserved for x hence overflow occurs. Is that wrong? @IanAbbott – goku May 06 '22 at 16:30
  • 1
    It is not wrong but it is arithmetic overflow and has nothing to do with memory safety. – Ian Abbott May 06 '22 at 16:41
  • Think of it like this. You live on a nice little street in the suburbs. You, and your 10 neighbors to your right, and your 10 neighbors to your left, all have more or less identical houses with more or less identical two-car garages. Buffer overflow is when you get an agreement from your neighbors to borrow all their houses for a big family reunion you're having, except that 100 of your relatives show up and try to park their cars in 20 garages and primp for the party in 20 bathrooms. – Steve Summit May 06 '22 at 19:30
  • Arithmetic overflow, on the other hand, is when your distant cousin shows up in his 18-wheeler, and tries to park it in one of those garages, and knocks out the back wall. – Steve Summit May 06 '22 at 19:30
  • Nice one, lol @SteveSummit – goku May 06 '22 at 19:35
  • 1
    Note that the `x = -10` example could also print, for example, 32778 if you have a 16-bit CPU with sign-magnitude arithmetic, i.e. sign bit + absolute value (ones' complement would give 65525 instead). But CPUs that do not use two's complement are so far out there and obsolete that the latest C++ standard now requires two's complement arithmetic. It's what everyone uses. – Goswin von Brederlow May 06 '22 at 21:14

1 Answer


Of course, overflow occurs and it returns the following number:

There is no overflow in int x = 10000000000;. Overflow in the C standard is when the result of an operation is not representable in the type. However, in int x = 10000000000;, 10,000,000,000 is converted to type int, and this conversion is defined either to produce an implementation-defined result (which is necessarily representable in int) or to raise an implementation-defined signal (C 2018 6.3.1.3 3). So there is no result that is not representable in int.

You did not say which C implementation you are using (particularly the compiler), so we cannot be sure what the implementation defines for this conversion. For a 32-bit int, it is common that an implementation wraps the number modulo 2^32. The remainder of 10,000,000,000 when divided by 2^32 is 1,410,065,408, and that matches the result you observed.

4294967286

In this case, you passed an int where printf expected an unsigned int. The C standard does not define the behavior, but a common result is that the bits of an int are reinterpreted as an unsigned int. When two’s complement is used for a 32-bit int value of −10, the bits are FFFFFFF6 in hexadecimal. When the bits of an unsigned int have that value, they represent 4,294,967,286, and that matches the result you observed.

char string_ = 'abc';

'abc' is a character constant with more than one character. Its value is implementation defined (C 2018 6.4.4.4 10). Again, since you did not tell us which implementation you are using, we cannot be sure what the definition is.

One behavior of such constants is that 'abc' will have the value ('a'*256 + 'b')*256 + 'c'. When ASCII is used, this is (97*256 + 98)*256 + 99 = 6,382,179. Then char string_ = 'abc'; converts this value to char. If char is unsigned and is eight bits, the C standard defines this to wrap modulo 256 (C 2018 6.3.1.3 2). If it is signed, it is implementation-defined, and a common behavior is to wrap modulo 256. With either of those two methods, the result is 99, as the remainder of 6,382,179 when divided by 256 is 99, and this matches the result you observed.

Most of these problems occur because C is not a memory-safe language.

None of the above has anything to do with memory safety. None of the constants or the conversions access memory, so they are not affected by memory safety.

Eric Postpischil
  • How is there no overflow in int x? I am using C17 and it says: *warning: overflow in conversion from 'long long int' to 'int' changes value from '10000000000' to '1410065408'* – goku May 06 '22 at 15:15
  • @goku: The compiler message is wrong or is using terminology differently from the C standard. I cited the paragraph of the C standard that applies: The conversion is implementation-defined. – Eric Postpischil May 06 '22 at 15:20
  • gcc version: 8.1.0 – goku May 06 '22 at 15:24
  • Can you post the related link? – goku May 06 '22 at 15:24
  • 1
    @goku: I use an official version of the C standard that is not freely available. Links to free drafts and information about other versions of the C and C++ standards are maintained [here](https://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents). – Eric Postpischil May 06 '22 at 16:02
  • It's not an integer overflow, it's a *conversion* overflow. The compiler is warning that the implementation defined conversion from '10000000000' to '1410065408' didn't go well. – Goswin von Brederlow May 06 '22 at 21:09
  • @GoswinvonBrederlow: It is not a conversion overflow because the input is in the defined range of the operation and the output is in the defined range of the operation. The operation is simply **defined** to produce a value within the range of `int`, so there is no overflow. Overflow occurs when some operation, say multiplication, is defined to produce a result that is outside the range of its output type. E.g., when a 32-bit int 2,000,000,000 is multiplied by 10, the defined result is the mathematical product, 20,000,000,000. But that cannot be represented in `int`, so there is overflow. – Eric Postpischil May 06 '22 at 21:16
  • @EricPostpischil overflow is what the compiler calls it in the conversion when the mathematical value in the source is converted to an integer value that is mathematically smaller. The overflow in the warning is coined by the compiler and not the overflow coined by the C standard. So the compiler warning is perfectly fine. It's just not referencing anything in the C standard. – Goswin von Brederlow May 06 '22 at 21:24
  • @GoswinvonBrederlow: As I noted, the compiler may be using terminology different from the C standard. I regard it as sloppy. The C standard does not use the term “overflow” this way. It is not “perfectly fine”; the compiler may not have a legal or contractual obligation to align its terminology with the C standard, but good, clear communication is important, and compiler developers should seek to align their terminology with the standard’s. – Eric Postpischil May 06 '22 at 21:26
  • 1
    I think it aligns perfectly. If you consider the conversion from the string `10000000000` to `int` to be a loop (pseudocode) `int i = 0; while (*s) { i = i * 10 + (*s++ - '0'); }` then overflow in the c++ standard sense is exactly what happens. – Goswin von Brederlow May 06 '22 at 21:44