I recently found a piece of code online which looked a little like this:

#include <stdio.h>
#include <string.h>

int main() {
    float m[10];
    memset(m, 0, 20); 
}

I also saw a variant like this, which I believe to be correct:

memset(m, 0, sizeof m); 

When trying to print out all the values of the first example using this snippet:

for (int i = 0; i < 20; i++) {
    printf("%f, \n", m[i]);
}

It produces an output like this:

0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
-0.000000, 
-4587372491414098149376.000000, 
-0.000013, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000, 
0.000000

The values change on recompilation.

Now I have a few questions:

  • Why can memset write to a float-array more than what was allocated and why can't you do that with a char-array?
  • Why is it so inconsistent?
  • Why does changing the second argument of memset to 1, for example, not change the output?
gurkensaas
  • There are only 10 floats in the array. Your snippet tries to print 20. – user3386109 Nov 23 '21 at 21:00
  • @user3386109 Is there a documented reason why trying to do this doesn't produce an error? – gurkensaas Nov 23 '21 at 21:01
  • @gurkensaas The documented reason is that accessing an array outside of its bounds has undefined behavior. – Eugene Sh. Nov 23 '21 at 21:02
  • Does this answer your question? [Array index out of bound behavior](https://stackoverflow.com/questions/671703/array-index-out-of-bound-behavior) – kaylum Nov 23 '21 at 21:25
  • @kaylum Only one of them. – gurkensaas Nov 23 '21 at 21:34
  • "*Why can memset write to a float-array more than what was allocated*". memset will attempt to write as much as you tell it to. The result may be that it overflows a buffer but it will do whatever you tell it to regardless of the result. Anyway, there is nothing in your code that shows it did that. It wrote exactly `sizeof m` bytes which is the size of the array. So not sure how you are reaching that conclusion. It's your print code that is wrong as already pointed out to you multiple times. – kaylum Nov 23 '21 at 21:37
  • @kaylum Sorry for the confusion, If you look at the history of the edits you can see that it was originally `memset(m, 0, 20);`. I also learned that floats usually take 4 bytes and so the memset wasn't technically out of bounds. – gurkensaas Nov 23 '21 at 21:40
  • I see. Then you should make the question consistent by removing that first question (or adding back the incorrect code). – kaylum Nov 23 '21 at 21:44
  • @gurkensaas *Is there a documented reason why this doesn't produce an error?* In C, exceeding the bounds of an array is kind of like jaywalking on a busy street. Almost anything may happen, but nothing is guaranteed. Often, you'll get away with it. Occasionally, a policeman may write you a ticket (although this is unusual, just as "array bounds exceeded" is an unusual error message in C). Occasionally, something terrible will happen, like you'll get run over by a semi. – Steve Summit Nov 23 '21 at 22:14
  • The only sensible move (to paraphrase the computer in [*WarGames*](https://en.wikipedia.org/wiki/WarGames)) is not to play at all -- that is, *don't* exceed the bounds of your arrays! – Steve Summit Nov 23 '21 at 22:15

2 Answers

Why can memset write to a float-array more than what was allocated and why can't you do that with a char-array?

The call memset(m, 0, 20), as the question originally showed, does not write more than was allocated. Commonly, float is four bytes in C implementations, so float m[10] allocates 40 bytes, and memset(m, 0, 20) writes only the first 20 of them (elements m[0] through m[4]).

In the new code, memset(m, 0, sizeof m); writes just as many bytes to m as it has, no fewer and no more.

If memset were asked to write more, it would simply try to do so: C implementations generally do not perform safety checks on such operations, and the C standard does not require them to.

Why is it so inconsistent?

There is nothing inconsistent. memset wrote zeros to the first 20 bytes of m, and that is the encoding for floating-point zero, in the format commonly used for float (IEEE-754 binary32, also called “single precision”).

The bytes after that were not written, so printing them uses uninitialized data. The C standard says the values of uninitialized objects are not determined. A common result is the program uses whatever happened to be in the memory already. That may be zeros, or it may be something else.

However, with the loop for (int i = 0; i < 20; i++), you go beyond the 10 elements that are in m. Then the behavior of accessing m[i] is not defined by the C standard. As above, a common result is the program accesses the calculated memory and uses whatever happens to be there. However, a variety of other behaviors are also possible, including crashing due to an attempt to access unmapped memory or the compiler replacing the undefined code with alternate code during optimization.

Why does changing the second value of memset not change the output?

It will, depending on what you change it to. Some values for the byte may result in float values that are so small they are still printed as “0.000000”. For example, if bytes are set to 1, making the 32 bits 0x01010101 in each float, they represent a float value of 8,454,401•2^−148 = 2.36942782761723955384693006253917004604239556833255136345174597127722672385008451101384707726538181304931640625•10^−38.

If you use 64 for the second argument to memset, the bits will be set to 0x40404040, which encodes the value 3.0039215087890625, so “3.003922” will be printed.

Eric Postpischil
  • Sorry Eric, a `float` has only a bit more than 7 significant digits in common implementations today, so printing that number you have printed is nonsense.... and no use at all (undefined behaviour) – Luis Colorado Nov 25 '21 at 07:41
  • @LuisColorado: Actually, I accidentally left out a few digits. The format commonly used for `float`, IEEE-754 binary32, does not have decimal digits at all. It is binary based. The represented value is **exactly** as I have shown it per C 2018 5.2.4.2.2 3 and IEEE-754 2008 3.4. Even in C implementations with low-quality binary-to-decimal conversion routines (i.e., bad `printf` implementations), that exact value is what controls the arithmetic performed when the `float` is used in calculations, so it is what should be used in analyzing and understanding floating-point computations. – Eric Postpischil Nov 25 '21 at 11:14
  • @LuisColorado: This is a common misunderstanding of floating-point arithmetic. In floating-point arithmetic, the data represent specific numbers (including infinities) exactly, except for NaNs. They are not approximations. Rather, floating-point operations are defined to approximate real arithmetic. When an operation is performed, a rounding is performed. Understanding that it is the operations that perform approximations, not the numbers, is crucial to understanding, designing, analyzing, writing proofs about, and debugging floating-point arithmetic. – Eric Postpischil Nov 25 '21 at 11:19
  • Yeah... I understand that, but there's no sense in showing more digits than are significant (although glibc does so on request, and I think this is what you have done) because, as you say, there's an equal chance for any of these extra digits to have any value. – Luis Colorado Nov 26 '21 at 12:28
  • @LuisColorado: No, there is not an equal chance for the digits to have any value. They are deterministic; the probability they have the value specified by the IEEE-754 standard is 1, and the probability they have some other value is 0. The digits are meaningful. – Eric Postpischil Nov 26 '21 at 12:38
  • No... I don't mean that... what I mean is that, being unknown, you should not use them... because the probability of any digit (after the first unknown one) being the correct one (I'm somehow applying Bayes here) is the same for them all (0, except for the correct one, in case you knew it) – Luis Colorado Nov 26 '21 at 12:51
  • What is true is that, once you are in the unknown part of the number, the approach followed by glibc amounts to assuming that all the following binary digits are 0, which is misleading and leads students to believe that the number they are using somehow has more precision than the one promised by the IEEE-754 document (which is 15-16 full digits for the `double` 64-bit floating-point type) – Luis Colorado Nov 26 '21 at 12:55
  • @LuisColorado: They are not unknown, and there is no probability involved. There is no “unknown part of the number.” It would be correct to treat the number as if the significand continued with 0 digits after its actual end, as that results in the same value as the one specified by the IEEE-754 standard. IEEE-754 does not treat its binary64 type as having “15-16 full digits”. The values are defined as real numbers with the formula given in IEEE-754 2008 3.4, and these specify the values **exactly**; they have “infinite precision.” – Eric Postpischil Nov 26 '21 at 12:58
  • Oh, sorry, I didn't realize that you used the exact value of the minimum normalized value of 2^(-125). My apologies for that. I realized it when I saw the last three digits (which alternate between ...125 and ...625). By the way, the numbers are indeed rationals, as all of them can be written as the ratio of two integers. – Luis Colorado Nov 26 '21 at 13:04

memset is not doing anything special to the float array. You have an array of 10 floats, but your loop covers 20 elements. Since C doesn't check array bounds, the loop runs past the end of the array and interprets as float things that are not. I will not repeat here what has already been said in other answers, but you have a problem with sizing C objects. The sizeof operator is your friend here. Use it when passing sizes to functions like memcpy or malloc instead of using constants, so that if you resize the object, the code is still correct without having to change a constant value.

Luis Colorado