Finding the Mode in a Vector of Floats

Question

I am trying to find the mode average in a vector containing 324 float values.

The code I have is as follows:

float max = vec.back();
float prev = max;
float mode = 0.0;
int maxcount = 0;
int currcount = 0;

for (const auto n : vec) {
    if (n == prev) {
        ++currcount;
        if (currcount > maxcount) {
            maxcount = currcount;
            mode = n;
        }
    } else {
        currcount = 1;
    }
    prev = n;
}

std::cout << mode << std::endl;

This prints out the mode to be 0.75, which is wrong.

Here are all the float values, they come from a txt file so please excuse the format:

0.61 0.61 0.61 0.62 0.62 0.62 0.62 0.62 0.62 0.62 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.63 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.64 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.65 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.66 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.67 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.68 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.69 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.71 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.72 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.73 0.74 0.74 0.74 0.74 0.74 0.74 0.74 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.76 0.77 0.77 0.77 0.77 0.77 0.77 0.77 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.78 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79 0.79

Excel presents the mode as 0.65. Why does my code not produce the same result? What do I need to change?

Many thanks.

edit: I have found through debugging that the values within vec are more like 0.68000000000000005 and 0.69999999999999996, though some still show only two decimal places (0.64, 0.74 etc). Could this be the issue? Am I able to round the values for this particular calc?

Are you sure the values noted as 0.75 are really 0.75, or are they 0.749999999934 and 0.7500000012 or 0.7500000034? Same for 0.65. It's normally not a good idea to use == on a float. — Andreas H., Jan 23 '18 at 17:54
I would either add a breakpoint or a print statement in the "if (currcount > maxcount)" block to track when you've updated mode and what you've updated it to. And how positive are you that you're reading the text file correctly? — mrjaso, Jan 23 '18 at 17:56
You're calculating mode wrong. You are assuming the numbers will always be sequential, but mode doesn't take order into account (although it may work for this particular data set). You should use a map (or unordered_map) to track value and count. — Mobile Ben, Jan 23 '18 at 17:56
Usually I trust my own code more than Excel, but that's not so much my self-confidence as my inexplicable aversion to Excel ;) — 463035818_is_not_an_ai, Jan 23 '18 at 17:56
I get 0.65 [with your code](https://ideone.com/TyV7r8) and that data after adding a reader. Are you sure that you're reading the correct data? — molbdnilo, Jan 23 '18 at 18:12
Thanks for your comments. I have found through debugging the values within vec are more like; 0.68000000000000005, 0.69999999999999996, though some are still only two decimal points (0.64). Could this be the issue? Am I able to round up the values for this particular calc? — Neo, Jan 24 '18 at 15:41
Re your edit: Those are the floats closest to the numbers in your input. It doesn't matter since 0.68 will always be represented as `0.68000000000000005`, 0.7 as `0.69999999999999996`, and so on. They may not be exactly the numbers you input, but there will not be any variation in the approximations. — molbdnilo, Jan 26 '18 at 09:13
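
For reference, the map-based counting suggested in the comments above could be sketched roughly as follows. This is only an illustration: the function name `mode_by_count` and the tie-breaking rule (smallest value wins) are assumptions, not something from the original post. It relies on exact float equality, which is fine as long as equal inputs are parsed to identical floats, as noted in the last comment.

```cpp
#include <map>
#include <vector>

// Hypothetical helper: returns the most frequent value in `values`,
// independent of the order of the elements.
// Assumptions: ties go to the smallest value, an empty input returns 0,
// and the input contains no NaN (NaN is not a valid std::map key).
float mode_by_count(const std::vector<float>& values) {
    std::map<float, int> counts;
    for (float v : values) {
        ++counts[v];  // identical float values share one key
    }

    float mode = 0.0f;
    int best = 0;
    for (const auto& entry : counts) {  // iterates in ascending key order
        if (entry.second > best) {
            best = entry.second;
            mode = entry.first;
        }
    }
    return mode;
}
```

Unlike the loop in the question, this does not need the data to be sorted, at the cost of one map entry per distinct value.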

score 0 · Answer 1 · answered Jan 23 '18 at 18:19

The problem might be the use of floats for comparison. Because of how they are stored, floating point numbers differ, in general, from the value they are initialized to by a small amount.

Instead of using n == prev, consider a comparison within some small epsilon that is greater than the machine precision (for any machine you expect to run this code on) but less than the smallest true difference between any two of your numbers (which looks like 0.01). So you could do

if (((n - prev) < EPSILON) && ((prev - n) < EPSILON)) { ...

for float EPSILON = 0.000001, or a value that makes sense for you. See also this question on comparing floats. Of note is that the ideal epsilon would change if your data set changed to much larger or much smaller numbers.
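
The same test can be wrapped in a small helper, which is a bit easier to read than the two-sided comparison. This is a minimal sketch: the name `nearly_equal` and the default tolerance of 1e-6 are assumptions chosen for data spaced 0.01 apart, and would need adjusting for other data.

```cpp
#include <cmath>

// Hypothetical helper: true when a and b differ by less than eps.
// Equivalent to (a - b) < eps && (b - a) < eps.
// The default eps is an assumption that suits values spaced 0.01 apart.
bool nearly_equal(float a, float b, float eps = 1e-6f) {
    return std::fabs(a - b) < eps;
}
```

In the loop from the question, `if (n == prev)` would then become `if (nearly_equal(n, prev))`.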

Even if there is another problem in your code, you might consider moving away from comparing floats in general.

Thanks for your comment. As mentioned above, I have found through debugging the values within vec are more like; 0.68000000000000005, 0.69999999999999996, though some are still only two decimal points (0.64). Could this be the issue? Am I able to round up the values for this particular calc? — Neo, Jan 24 '18 at 15:44
I don't think there's a way to functionally round a float. You can change how it's printed, but that doesn't change how it's stored (or, therefore, how the comparison happens). The important thing is whether each value is consistent (e.g., whether `0.68000000000000005 == 0.68000000000000005` returns `true`). But, as others have pointed out, it could be a problem with how your data are read in. — easybeso, Jan 24 '18 at 16:29

score 0 · Answer 2 · answered Jan 25 '18 at 02:41

By debugging I found that my values were not just two-decimal-place values; the mode was therefore actually 0.7500000000004, though it was still being printed as 0.75.

By adding a rounding function call and removing the const, I was able to find the mode to two decimal places.

for (auto n : vec)
{
    // Round each value to two decimal places before comparing.
    n = roundf(n * 100) / 100;

    if (n == prev)
    {
        ++currcount;
        if (currcount > maxcount)
        {
            maxcount = currcount;
            mode = n;
        }
    }
    else
    {
        currcount = 1;
    }
    prev = n;
}
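
A variation on the rounding idea is to convert each value to a whole number of hundredths and compare integers, so no float equality is involved at all. The sketch below is an assumption-laden illustration rather than the code actually used: `mode_in_hundredths` is a made-up name, the factor of 100 assumes two-decimal data, and the keys are sorted first so that the run-length scan also works on unsorted input.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the mode expressed in hundredths
// (65 means 0.65). Assumes the data is meaningful to two decimal places.
// Ties go to the smallest value; an empty input returns 0.
long mode_in_hundredths(const std::vector<float>& values) {
    std::vector<long> keys;
    keys.reserve(values.size());
    for (float v : values) {
        keys.push_back(std::lround(v * 100.0f));  // 0.6500001f -> 65
    }
    std::sort(keys.begin(), keys.end());  // equal keys become adjacent

    long mode = 0;
    int best = 0;
    int run = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        // Extend the current run, or start a new one when the key changes.
        run = (i > 0 && keys[i] == keys[i - 1]) ? run + 1 : 1;
        if (run > best) {
            best = run;
            mode = keys[i];
        }
    }
    return mode;
}
```

Dividing the result by 100.0 gives the mode back as a decimal for printing.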
