In Python, how can I check if 2 numbers in a list are within a certain percentage of each other?

Question

I have a large list of numbers, and I want to see if any of them are approximately equal. If 2 numbers are "approximately equal" (for my purposes), both of them fall within 10% of each other (see the following 2 examples.) Then I want to sort them into separate lists of approximately equal numbers.

Example #1 Compare 5.0 and 5.5: 5.5 +/- 10% = 4.95 to 6.05 (and 5.0 is in this range) 5.0 +/- 10% = 4.50 to 5.50 (and 5.5 is in this range) Therefore, 5.0 and 5.5 are approximately equal.

Example #2 Compare 5.0 and 5.6: 5.6 +/- 10% = 5.04 to 6.16 (and 5.0 is in this range) 5.0 +/- 10% = 4.50 to 5.50 (and 5.6 is in NOT this range) Therefore, 5.0 and 5.6 are NOT approximately equal.

Summary of what I need to do: Input = {4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3} Desired output = {4.0, 4.1, 4.2, 4.0, 4.3} and {9.0, 9.4, 8.9}

You may want to examine a k-means clustering algorithm. Be aware that convergence will be an issue for your problem. — dan, May 07 '13 at 22:49
What if all of the numbers are within 10% of another number such as [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, ... 99.9], how would decide the which numbers go into which group? — dansalmo, May 07 '13 at 23:14
@dansalmo Numbers that can go into more than one group will appear in all the groups that it can. So, if the input is {1.0, 1.1, 1.2}, the output will be 2 lists: {1.0, 1.1} and {1.1, 1.2}. — leon, May 07 '13 at 23:25
@zwerdlds Python. Thanks for pointing that out. I edited the post title. — leon, May 07 '13 at 23:26
Something I think is worth noting about the answers. They base the comparison 10% range on the smaller of the two numbers to compare. That works fine; you can ignore the range of the larger number since if the larger number falls within the smaller number's range, the smaller number always falls within the larger number's range. (It can be mathematically proven that no numbers exist where the smaller falls out of the larger's range and the larger falls in the smaller's range.) — jpmc26, May 08 '13 at 03:32
I'm mostly just curious, but in your last example, why isn't there a result set for {4.3, 4.2} not including 4.0? — , May 08 '13 at 14:58
@jpmc26 I feel stupid for not realizing that before I typed up the example. Thanks! — leon, May 09 '13 at 16:46
@zwerdlds 4.0 * 0.9 = 3.6, and 4.0 * 1.1 = 4.4, and 3.6 < 4.3 < 4.4, so 4.3 is within 10% of 4.0... — leon, May 09 '13 at 18:12
@leon Don't feel stupid. I didn't realize that myself, either, until I started working with the math. I had actually assumed you were aware of it since you didn't list it as a case you needed to cover. I just didn't think it was very obvious for anyone who stumbled across this question. — jpmc26, May 09 '13 at 18:33

supermitch · Accepted Answer · 2013-05-07T23:48:48.750

input_list = [4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3]

results = {input_list[0]: [input_list[0]]}    # Start with first value
for value in input_list[1:]:         # loop through our entire list after first value
    hi = value * 1.1
    low = value * 0.9
    print("Value: {0}\tHi: {1}\tLow:{2}".format(value, hi, low))
    for existing in results:     # search through our result set
        found_similar = False
        if low < existing < hi:  # if we find a match
            results[existing].append(value)    # we add our value to the list for that set
            found_similar = True
            break
    if not found_similar:        # if we looped through our entire results without a match
        results[value] = [value] # Create a new entry in our results dictionary

for entry in results:
    print(results[entry])

Will give:

results = { 9.0: [9.0, 9.4, 8.9],
            4.0: [4.0, 4.1, 4.2, 4.0, 4.3] }

This code starts with the first value in your list, and finds all subsequent values that are within 10% of that one. So in your example, it starts with 4, and finds all similar values. Any value that isn't within 10 % get added to a new "set".

So once it reaches 9.0, it sees that it's not a match, so it adds a new result set to the results dictionary, with a key of 9.0. Now when it considers 9.4, it doesn't find a match in the 4.0 list, but it does find a match in the 9.0 list. So it adds this value to the second result set.

I think this'll do the trick. Thanks! I haven't tested it yet, but the logic seems good to me. — leon, May 09 '13 at 16:55
I don't think it addresses your subsequent comment about pairs appearing in multiple sets... but I thought that issue needed further definition, so I stuck with a very straightforward solution. Hopefully this is a good starting point. — supermitch, May 09 '13 at 19:29

dansalmo · Answer 2 · 2013-05-09T17:01:19.640

0

Here is a generator / set based method.

def set_gen(nums):
    for seed in sorted(nums):
        yield tuple([n for n in nums if seed <= n and n/seed <= 1.1])

def remove_subsets(sets):
    for s in sets.copy():
        [sets.remove(s2) for s2 in sets.difference([s]) if set(s2).issubset(s)]

>>> nums = [4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3]
>>> x = set(num for num in set_gen(nums))
>>> remove_subsets(x)
>>> list(x)
[(9.0, 9.4, 8.9), (4.0, 4.1, 4.2, 4.0, 4.3)]

>>> nums =  [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
>>> x = set(num for num in set_gen(nums))
>>> remove_subsets(x)
>>> list(x)
[(1.9, 1.8), (1.5, 1.4), (1.4, 1.3), (1.2, 1.1), (1.7, 1.6), (1.5, 1.6), (1.3, 1.2), (1.9, 2.0), (1.0, 1.1), (1.8, 1.7)]

edited May 09 '13 at 17:01

answered May 08 '13 at 00:16

dansalmo

11,506
5
58
53

You have doubles in you result of the second example such as `(1.2,1.1)` and `(1.3,1.2)` and `(1.0,1.1)`. I think this is incorrect. – dawg May 08 '13 at 17:34
This is required for the solution to be correct. See leon's clarification to my comment under his original question. He says: Numbers that can go into more than one group will appear in all the groups that it can. So, if the input is {1.0, 1.1, 1.2}, the output will be 2 lists: {1.0, 1.1} and {1.1, 1.2} – dansalmo May 08 '13 at 19:06

dawg · Answer 3 · 2013-05-08T17:29:14.487

You could do this:

Input = {4.0, 4.1, 4.2, 4.0, 9.0, 9.4, 8.9, 4.3} 

wl=sorted(Input,reverse=True)
apr=.1
out={}
while wl:
    wn=wl.pop()
    out[wn]=[wn]
    while wl and wl[-1]<=wn*(1+apr):
        out[wn].append(wl.pop())

print [(k,out[k]) for k in sorted(out.keys())]

Prints:

[(4.0, [4.0, 4.1, 4.2, 4.3]), (8.9, [8.9, 9.0, 9.4])]

Trying the example in the comments:

>>> Input = {1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0}

Prints:

[(1.0, [1.0, 1.1]), (1.2, [1.2, 1.3]), (1.4, [1.4, 1.5]), (1.6, [1.6, 1.7]), (1.8, [1.8, 1.9]), (2.0, [2.0])]

In Python, how can I check if 2 numbers in a list are within a certain percentage of each other?

3 Answers3