I am trying to bin raw data to make a histogram. The raw data is stored in an array named data[k] (see the code below). The bins have a fixed width of 0.01, and the upper boundary of each interval is stored in an array called z[i]. The data is binned by counting the number of data points in each interval between z[i] and z[i+1]. For this problem, I have made 30 intervals covering 0 to 0.3, each of width 0.01.

//creating bins in z

zmin = 0.01;
for(i=0;i<30;i++){
    z[i] = 0.0;
}

for(i=0;i<30;i++){
    z[i] = zmin+i*zmin;   // upper boundary of bin i
}

//binning the data

for(i=1;i<30;i++){
    for (k=0;k<100000;k++){
        if(data[k]>z[i-1] && data[k]<=z[i]){
            bincount[i] += 1;
        } //if
    } //k loop
} //i loop

// everything at or below the first boundary goes into bin 0
for (k=0;k<100000;k++){
    if(data[k]<=z[0]){
        bincount[0] += 1;
    } //if
} //k loop

The elements of z[i] are:

z[0] = 0.01, z[1] = 0.02, and so on...

The binning mostly produces accurate results. However, for some reason the bincount for the interval between z[5] (=0.06) and z[6] (=0.07) comes out as 0, even though the actual count in my original data is non-zero. Similarly, the bincount for the interval between z[6] (=0.07) and z[7] (=0.08) is wrong: it equals the total count of the two intervals mentioned above. However, when I write 0.07 in the if-statement instead of z[6] (which I tried separately), I get the correct result.

I have also verified that the array z[i] stores the values correctly, and they look fine. Hence, I am confused as to why this problem arises only for the intervals with boundary z[6], while the other bins give correct results. Am I doing something wrong here?
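One way to make that check stricter is to print the computed boundary with enough digits to show its exact stored value and compare it directly against the literal. The following is a minimal sketch, assuming z is an array of double (the declarations are not shown above); make_boundaries and report_boundary are hypothetical helper names, not part of the original program.

```c
#include <stdio.h>

#define NBINS 30

/* Fills z[] the same way as the code above: z[i] = zmin + i*zmin.
   Assumption: z holds doubles; the original declaration is not shown. */
static void make_boundaries(double z[NBINS], double zmin)
{
    for (int i = 0; i < NBINS; i++) {
        z[i] = zmin + i * zmin;
    }
}

/* Prints a computed boundary next to a literal using 17 significant
   digits, which is enough to tell two different doubles apart, and
   reports whether the two values compare equal. */
static void report_boundary(const double z[NBINS], int i, double literal)
{
    printf("z[%d] = %.17g, literal = %.17g, equal: %s\n",
           i, z[i], literal, (z[i] == literal) ? "yes" : "no");
}
```

Calling make_boundaries(z, 0.01) and then report_boundary(z, 6, 0.07) shows whether the stored boundary and the literal 0.07 are the same double. Printing with the default precision of %f rounds to six decimals, so it can hide a difference in the last bits.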

  • Given the approximate nature of floating point, `0.01+6*0.01` is not necessarily the same as `0.07`. – Nate Eldredge May 02 '19 at 12:49
  • Welcome to Stack Overflow! We need to see a [mcve] for this problem. Also: http://idownvotedbecau.se/nomcve/ – Sourav Ghosh May 02 '19 at 12:50
  • What are `data` and `z` *really*? How are they declared? And floating-point arithmetic *will* lead to rounding errors (see [Is floating point math broken?](https://stackoverflow.com/questions/588004/is-floating-point-math-broken)). – Some programmer dude May 02 '19 at 12:50
  • Need to see the data. Seems like a floating point representation issue. – nicomp May 02 '19 at 12:50
  • That's a really inefficient way to bin your data into fixed-size bins. Far better would be to compute bin numbers directly from the data, with something like `bin_num = (int) (data[k] * 100)`. You will want to watch for and handle data that fall outside any bin, but you then need only one loop, not a nest, not two separate ones, and it will run about 30 times faster (for 30 bins). – John Bollinger May 02 '19 at 13:43
  • Thank you everyone! Yes it was a floating point issue. The code is running well now. – Sayantani_infty May 03 '19 at 07:10
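
The direct bin computation suggested in the comments could look roughly like the sketch below. It is only an illustration under stated assumptions: data holds doubles, the bins are half-open intervals [i*0.01, (i+1)*0.01) rather than the (z[i-1], z[i]] convention used in the question, and values outside [0, 0.3) are dropped instead of being counted in bin 0. The names bin_index and fill_histogram are hypothetical.

```c
#include <stddef.h>

#define NBINS 30
#define BIN_WIDTH 0.01

/* Returns the bin index for x, or -1 if x lies outside
   [0, NBINS * BIN_WIDTH). The negated comparison also rejects NaN. */
static int bin_index(double x)
{
    if (!(x >= 0.0)) {
        return -1;
    }
    double scaled = x / BIN_WIDTH;   /* position measured in bin widths */
    if (scaled >= NBINS) {
        return -1;
    }
    return (int)scaled;              /* truncation picks the bin */
}

/* Counts each data point in a single pass over the data.
   Returns how many points fell outside every bin. */
static size_t fill_histogram(const double *data, size_t n,
                             long bincount[NBINS])
{
    size_t dropped = 0;
    for (size_t k = 0; k < n; k++) {
        int b = bin_index(data[k]);
        if (b < 0) {
            dropped++;
        } else {
            bincount[b] += 1;
        }
    }
    return dropped;
}
```

This replaces the nested loops with one pass, but it does not make the rounding problem disappear: a value that sits exactly on a bin boundary, such as 0.07, can still land on either side after the division. If the data is known to lie on a 0.01 grid, shifting the bin edges by half a bin width (or rounding to integer hundredths before counting) avoids comparing against values that cannot be represented exactly.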
