Python using 'in' to search keys not working correctly for me

Question

I have to simulate a hyperexponential distribution. I wrote this function, which simulates it and stores the result in a dictionary used as a histogram. It then saves the histogram to a CSV file so I can view it in a spreadsheet (like Excel), and it also returns the histogram.

import numpy as np

def simulate_hyper_exponential_distribution(p=0.617066, lambda1=0.051, lambda2=0.052, iterations=10 ** 5):
    n = 0  # type: int
    histogram = {}  # type: dict[float, int]

    for n in xrange(0, iterations):
        lambda_used = lambda1
        if np.random.uniform() >= p:
            lambda_used = lambda2

        random = float(round(np.random.exponential(1. / lambda_used), 1))
        if random not in histogram:
            histogram[random] = 1
        else:
            histogram[random] += 1

    if sum(histogram.values()) != iterations:
        print "Error!"
        return

    file = open('C:\\Users\\SteveB\\Desktop\\test_histogram.csv', 'w')

    max_histogram_key = max(histogram.keys()) + 0.1  # type: float

    # I think the error is in this for
    for current in np.arange(0, max_histogram_key, 0.1):
        if float(current) in histogram: # I think this is the line that fails
            file.write(str(current) + ',' + str(histogram[current]) + '\n')
        else:
            file.write(str(current) + ',0\n')

    file.close()
    print 'Finished!'
    return histogram

I run it with this line:

histogram = simulate_hyper_exponential_distribution(0.617066, 0.051, 0.052)

My problem is that the resulting CSV file shows a count of 0 for certain values, and I know those values don't have a count of 0. More interestingly, across different executions the same values are the ones that are wrong in the file (namely 0.3, 0.6, 0.7, 1.2, 1.4, 1.7, 1.9, 2.3, 2.4). If I type histogram[0.3] (or any of the previously listed values), I get a nonzero value.

For now, I multiply the key value by 10 and store it as an int in the dictionary, then divide it by 10 when writing it to the file, and this approach works. I don't know what the problem is when using float values. Thanks for your help.
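
The integer-key workaround described above can be sketched as follows. This is a minimal illustration, not the original code: `build_histogram` and `histogram_rows` are hypothetical names, and the standard library's `random.expovariate` stands in for `np.random.exponential` so the snippet runs without NumPy. Note that `int(round(sample * 10))` may bin a value that sits exactly on a boundary differently from `round(sample, 1)`.

```python
import random


def build_histogram(samples):
    """Count samples in 0.1-wide bins, keyed by the integer number of tenths."""
    histogram = {}
    for sample in samples:
        key = int(round(sample * 10))  # e.g. 0.31 -> 3, 1.2 -> 12
        histogram[key] = histogram.get(key, 0) + 1
    return histogram


def histogram_rows(histogram):
    """Yield (bin_value, count) for every bin from 0 up to the largest key.

    Assumes the histogram is non-empty; empty bins are reported with count 0.
    """
    for key in range(max(histogram) + 1):
        # Divide only when producing output; lookups stay on exact integers.
        yield key / 10.0, histogram.get(key, 0)


# expovariate takes the rate (lambda), whereas np.random.exponential
# takes the scale (1 / lambda); 0.051 is lambda1 from the question.
rng = random.Random(0)
samples = [rng.expovariate(0.051) for _ in range(1000)]
rows = list(histogram_rows(build_histogram(samples)))
```

Each row can then be written to the CSV file as `str(bin_value) + ',' + str(count)`, and no bin is reported as empty by mistake because integer keys compare exactly.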

I don't think you can reliably compare float values for equality, since they are approximations by design. Perhaps `in` is the wrong choice, as it implicitly uses exact equality. You may have to write a function that explicitly loops and allows a small tolerance when comparing floating-point values. [This](http://stackoverflow.com/questions/5595425/what-is-the-best-way-to-compare-floats-for-almost-equality-in-python) post may be of interest. — Paul Rooney, Mar 30 '17 at 03:49
Why don't you convert your keys to strings, so that instead of comparing floats you check for a string in a list of strings? — Mayeul sgc, Mar 30 '17 at 03:59
@PaulRooney Thank you. I thought of it, I printed the keys to see if some were 0.29999, 1.7111 or something like that. It didn't happen, so I thought it wasn't the case. I'll use another type for the keys, thanks! — Steve B., Mar 30 '17 at 06:32
@Mayeulsgc I think I'll stick with int, I think it's faster than string. Thanks! — Steve B., Mar 30 '17 at 06:38
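
The point about exact equality raised in the comments can be checked directly. The sketch below assumes that `np.arange(0, stop, 0.1)` produces each value as roughly `i * 0.1`, and uses plain Python multiplication as a stand-in for it; the one-entry `histogram` is made up for illustration. It may also explain why printing the keys showed nothing odd: the keys are exact results of `round`, it is the loop variable that is slightly off, and Python 2's `str()` shortens floats to 12 significant digits, so `repr()` is needed to see the difference.

```python
# Stand-in for np.arange(0, 2.5, 0.1): each bin computed as i * 0.1 (assumption).
bins_by_multiplication = [i * 0.1 for i in range(25)]
# Alternative: one correctly rounded division per bin.
bins_by_division = [i / 10.0 for i in range(25)]

histogram = {0.3: 7}  # hypothetical histogram with a single float key

lookup = 3 * 0.1
print(repr(lookup))                   # 0.30000000000000004
print(lookup in histogram)            # False: not the same float as 0.3
print(round(lookup, 1) in histogram)  # True: rounding recovers the key

# Bins whose multiplied value differs from the rounded value used as a key.
mismatches = [round(b, 1) for b in bins_by_multiplication if b != round(b, 1)]
print(mismatches)  # [0.3, 0.6, 0.7, 1.2, 1.4, 1.7, 1.9, 2.3, 2.4]
```

The mismatching bins are the same values the question reports as wrongly zero. Rounding the loop variable before the lookup (`round(current, 1) in histogram`) or generating bins as `i / 10.0` would be two ways to make float keys match, though integer keys avoid the issue altogether.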
