Remove duplicates and similar values from a list in Python?

Question

This question is a follow-up to How do you remove duplicates from a list in Python whilst preserving order?.

I need to remove duplicates and/or similar values from a list.

I start from that question's answer and apply:

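def f7(seq):
    seen = set()
    seen_add = seen.add  # bind the method once to avoid repeated attribute lookups
    # set.add returns None, so "not seen_add(x)" is always True and just records x
    return [x for x in seq if x not in seen and not seen_add(x)]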

But when I apply it to my data, I get the result below, which is clearly wrong: the values in bold are equal, and one of them should be removed.

 [(Decimal('1.20149'), Decimal('1.25900')),
 *(Decimal('1.13583'), Decimal('1.07862'))*,
**(Decimal('1.07016'), Decimal('1.17773'))**,
 *(Decimal('1.13582'), Decimal('1.07863'))*,
  (Decimal('1.07375'), Decimal('0.92410')),
  (Decimal('1.01167'), Decimal('1.00900')),
**(Decimal('1.07015'), Decimal('1.17773'))**,
  (Decimal('0.95318'), Decimal('1.10171')),
  (Decimal('1.01507'), Decimal('0.79170')),
  (Decimal('0.95638'), Decimal('0.86445')),
  (Decimal('0.90109'), Decimal('0.94387')),
  (Decimal('0.84900'), Decimal('1.03060'))]

How would you remove those values which are identical?

Your list seems to contain tuples of decimals, and I don't see identical tuples there. — interjay, Apr 22 '13 at 17:09
You should practice problem formulation. Once you understand how to decide whether two values are identical, you will find the solution yourself. — newtover, Apr 22 '13 at 17:10
As @interjay says, none of the tuples are identical. If you also want to remove "similar values" as stated in the question, that would depend on your definition of "similar". — Aya, Apr 22 '13 at 17:24

score 3 · Accepted Answer · answered Apr 22 '13 at 17:09

From the output, it looks like the seq you're passing contains 2-tuples. While some of the values inside the tuples may be the same, the tuples themselves (which are the elements of your sequence) are not equal, and therefore they are not removed.
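To see this concretely, here is a small check using one of the near-identical pairs from the question. It is only a sketch: the names a and b are introduced here for illustration, and f7 is repeated so the snippet runs on its own.

```python
from decimal import Decimal

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if x not in seen and not seen_add(x)]

# Two tuples from the question that look alike but differ in the last digit
a = (Decimal('1.13583'), Decimal('1.07862'))
b = (Decimal('1.13582'), Decimal('1.07863'))

print(a == b)                    # False: the tuples are not equal
print(f7([a, b, a]) == [a, b])   # True: only the exact repeat of a is dropped
```

Tuples compare element by element, so a single differing digit in either component is enough for f7 to treat the two tuples as distinct.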

If your intention is to get a flat list of the unique numbers, you can flatten it first:

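import itertools

seq = [(1, 2), (2, 3), (1, 4)]
# Flatten the tuples into one stream of numbers, then de-duplicate in order
f7(itertools.chain.from_iterable(seq))
# => [1, 2, 3, 4]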

shx2


Thank you very much, but I think this only partially answers the question. The key point here is that my numbers are produced by decimal().quantize() and are very similar without being identical. What I want to do is remove a number if, say, it agrees with another one to 8 digits. Is this possible? – Oniropolo Apr 22 '13 at 17:20
In that case you need to change the part converting your floats to decimals. With the decimals you created, there's no way of telling whether they coincide by 8 digits, because precision is lost. – shx2 Apr 22 '13 at 17:25
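
One possible reading of "similar", offered only as a sketch: treat two tuples as duplicates when every pair of corresponding components differs by no more than some tolerance, and keep the first tuple of each such group. The function name dedupe_similar, the parameter tol, and the default tolerance of 0.0001 are assumptions made for this illustration and are not from the thread. As noted above, a check at 8 digits only makes sense on values that still carry that much precision, so it would have to be applied before the values are quantized to 5 places.

```python
from decimal import Decimal

def dedupe_similar(seq, tol=Decimal('0.0001')):
    """Keep the first of each group of tuples whose components all lie within tol.

    Hypothetical helper: the name and the default tolerance are illustrative.
    """
    kept = []
    for item in seq:
        # item is a near-duplicate if some already-kept tuple matches it
        # component by component within the tolerance
        is_near_dup = any(
            len(item) == len(k)
            and all(abs(x - y) <= tol for x, y in zip(item, k))
            for k in kept
        )
        if not is_near_dup:
            kept.append(item)
    return kept

# A few of the tuples from the question
data = [
    (Decimal('1.13583'), Decimal('1.07862')),
    (Decimal('1.07016'), Decimal('1.17773')),
    (Decimal('1.13582'), Decimal('1.07863')),
    (Decimal('1.07015'), Decimal('1.17773')),
    (Decimal('0.95318'), Decimal('1.10171')),
]

result = dedupe_similar(data)
print(len(result))  # 3: the third and fourth tuples are dropped as near-duplicates
```

This compares each item against everything kept so far, so it is quadratic in the length of the list. That is fine for a dozen tuples; for large inputs, a different approach (for example, bucketing by rounded values) would be worth considering, with the caveat that two close values can still fall on either side of a bucket boundary.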
