Removing string duplicates in a list in Python

Question

I have a problem in my code where I have the following:

import random
Probabilities={'AA':0.2,"TT":0.2, "GG":0.1, "CC":0.1, "AT":0.4}
lst=[]
klist=[]
for i in Probabilities:
    lst.append(Probabilities[i])
lst.sort()
for i in lst:
    for j in Probabilities:
        if Probabilities[j]==i:
            klist.append(j)
jist=list(set(klist))
#klist.append(i)
cist=[]
cist.append(lst[0])
for i in range(1,len(lst)):
    k=lst[i]+cist[i-1]
    cist.append(k)
p=random.uniform(0, 1)
print (p)
print(lst)
print(cist)
print(klist)
print (jist)

When I run this I get something like

0.9939409413693211

[0.1, 0.1, 0.2, 0.2, 0.4]

[0.1, 0.2, 0.4, 0.6000000000000001, 1.0]

['CC', 'GG', 'CC', 'GG', 'TT', 'AA', 'TT', 'AA', 'AT']

['TT', 'AT', 'CC', 'AA', 'GG']

What I need to fix is the last list printed: it should not only have the duplicates removed, but also keep the order of the previous list.

So basically instead of

['TT', 'AT', 'CC', 'AA', 'GG']

I want

['CC', 'GG','TT', 'AA','AT']

when I do

jist=list(set(klist))

Thanks, A

PS. I am new to Stack Overflow, so sorry for anything I may not have made clear, any improper etiquette, etc.

score 1 · Answer 1 · answered Jul 10 '16 at 15:27

Sort jist using a key defined by Probabilities. In this case, the key you want is something like this:

def strange_key(term):
    # Look up the probability of this term; the sort orders terms by this value.
    return Probabilities[term]

Then you can sort using the key as follows:

jist.sort(key=strange_key)
jist
>>> ['CC', 'GG', 'TT', 'AA', 'AT']

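The key must be a function that returns a sortable value for a given term. Since you already have a dictionary that maps each term to its probability, you're set. This lets you do other manipulation in the interim (which you may not need) and sort only at the very end. Because the sort is stable, terms with equal probability (such as 'CC' and 'GG') keep whatever relative order they had before sorting.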
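As a minimal sketch of the same idea, assuming the dictionary from the question, the lookup can be passed directly as the key via `Probabilities.get`. The name `by_probability` is only illustrative. The check at the end prints the probabilities rather than the keys, because the order of tied keys coming out of a set is not guaranteed.

```python
Probabilities = {'AA': 0.2, 'TT': 0.2, 'GG': 0.1, 'CC': 0.1, 'AT': 0.4}

# Duplicates removed, but in arbitrary order (as with list(set(klist))).
jist = list(set(Probabilities))

# sorted() is stable: keys with equal probability keep their relative
# order from jist, and that order is not guaranteed for a set.
by_probability = sorted(jist, key=Probabilities.get)

# The probabilities are ascending even if tied keys swap places.
print([Probabilities[k] for k in by_probability])
```

If the exact order among ties matters, deduplicating klist in order (as in the other answers) avoids relying on the set's ordering.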

score 0 · Answer 2 · 123 · 2016-07-10T15:28:18.340

You can do this while preserving order using Python's set() to keep track of elements in the list that have already been seen.

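def removeDups(items):
    # Return a new list with duplicates removed, keeping first-seen order.
    # The parameter is called items so it does not shadow the built-in list.
    seen = set()  # elements encountered so far
    newList = []
    for item in items:
        if item not in seen:
            # First occurrence: remember it and keep it
            seen.add(item)
            newList.append(item)
    return newList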
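For illustration, here is a self-contained sketch that applies this seen-set approach to the klist from the question. The function name `remove_dups_keep_order` is hypothetical and chosen only for this example.

```python
def remove_dups_keep_order(items):
    """Return a new list without duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


klist = ['CC', 'GG', 'CC', 'GG', 'TT', 'AA', 'TT', 'AA', 'AT']
jist = remove_dups_keep_order(klist)
print(jist)  # ['CC', 'GG', 'TT', 'AA', 'AT']
```

The set gives constant-time membership checks on average, so the whole pass is roughly linear in the length of the list.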

edited Jul 10 '16 at 15:28

answered Jul 10 '16 at 15:20


score 0 · Answer 3 · answered Jul 10 '16 at 15:31

Try using a dictionary to see which elements you've already appended to the list, and only appending items that haven't already been appended. I think this will do what you're looking for.

import random

Probabilities={'AA':0.2,"TT":0.2, "GG":0.1, "CC":0.1, "AT":0.4}

lst=[]
klist=[]

for i in Probabilities:
    lst.append(Probabilities[i])

lst.sort()

for i in lst:
    for j in Probabilities:
        if Probabilities[j]==i:
            klist.append(j)

#jist=list(set(klist))
jist=[]
uniq = {}
for i in klist:
    # Check if we've seen i yet
    if i not in uniq:
        # Mark i as having been seen so we don't add it later
        uniq[i] = True
        jist.append(i)

cist=[]
cist.append(lst[0])
for i in range(1,len(lst)):
    k=lst[i]+cist[i-1]
    cist.append(k)
p=random.uniform(0, 1)

print (p)
print(lst)
print(cist)
print(klist)
print (jist)

score 0 · Answer 4 · answered Jul 10 '16 at 15:31

You can use the keys of a collections.OrderedDict:

In [90]: from collections import OrderedDict

In [91]: klist
Out[91]: ['CC', 'GG', 'CC', 'GG', 'TT', 'AA', 'TT', 'AA', 'AT']

In [92]: jist = list(OrderedDict.fromkeys(klist))

In [93]: jist
Out[93]: ['CC', 'GG', 'TT', 'AA', 'AT']
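As a variation on this answer (not part of it), a plain dict can stand in for OrderedDict on Python 3.7 and later, where dictionaries are guaranteed to preserve insertion order. A minimal sketch, assuming that Python version:

```python
klist = ['CC', 'GG', 'CC', 'GG', 'TT', 'AA', 'TT', 'AA', 'AT']

# dict keys are unique, and from Python 3.7 they iterate in insertion order,
# so the first occurrence of each string fixes its position.
jist = list(dict.fromkeys(klist))
print(jist)  # ['CC', 'GG', 'TT', 'AA', 'AT']
```

On older interpreters, collections.OrderedDict as shown above remains the safe choice.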
