I have two strings:

StringA: ['K', 'T', 'T', 'T', 'K', 'K', 'G', 'T', 'T', 'T', 'T', 'K', 'K']

StringB: ['T', 'K', 'G', 'G', 'K', 'T', 'T', 'K', 'G', 'G', 'K', 'K', 'T']

I want to count how many unique combinations of letters there are. The strings are ordered, so I only want to pair StringA position 1 with StringB position 1, StringA position 2 with StringB position 2, and so on. The pairs in the strings above are therefore (KT), (TK), (TG), (TG), (KK), (KT), (GT), (TK), (TG), (TG), (TK), (KK), (KT).

There are 5 unique combinations: (KT), (TK), (TG), (GT), (KK)

I have used the following code to produce the strings from two .csv files.

import sys
import csv

pairlist = open(sys.argv[1], 'r')
snp_file = open(sys.argv[2], 'r')

pair = csv.reader(pairlist, delimiter=',')
snps = csv.reader(snp_file, delimiter=',')

output = open(sys.argv[1] + "_FGT_Result", 'w')

snp1 = []
snp2 = []

# Only the first pair of loci is read here
firstpair = next(pair)

locusa = firstpair[0]
locusb = firstpair[1]

# Collect the third column of every row that mentions either locus
for row in snps:
    if locusa in row:
        snp1.append(row[2])
    if locusb in row:
        snp2.append(row[2])

print(snp1)
print(snp2)

pairlist.close()
snp_file.close()
output.close()

But I cannot figure out how to do the comparison. I have tried converting the strings to sets, since I read in another thread that this is required, but I am not sure why, and I cannot get it to work.

miradulo
Hjalte
  • Look at the [zip](https://docs.python.org/3/library/functions.html#zip) function. That's half of what you need to do. – grayshirt Apr 09 '15 at 14:03

3 Answers

Just use zip and set to combine the two lists of strings and get unique combinations. I used a list comprehension to return combined strings:

>>> unique = [''.join(x) for x in set(zip(StringA, StringB))]
>>> unique
['TG', 'GT', 'KT', 'TK', 'KK']

Alternatively, if you simply want them in a set you can remove the list comprehension:

>>> unique = set(zip(StringA, StringB))
>>> unique
{('T', 'K'), ('T', 'G'), ('K', 'K'), ('K', 'T'), ('G', 'T')}
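
Since the question asks how many unique combinations there are, the count is simply the length of that set. A minimal sketch in Python 3, using the lists from the question; the helper name `count_unique_pairs` is only illustrative and not from the original post:

```python
# Lists copied from the question
string_a = ['K', 'T', 'T', 'T', 'K', 'K', 'G', 'T', 'T', 'T', 'T', 'K', 'K']
string_b = ['T', 'K', 'G', 'G', 'K', 'T', 'T', 'K', 'G', 'G', 'K', 'K', 'T']


def count_unique_pairs(first, second):
    """Return the number of distinct position-wise pairs (hypothetical helper)."""
    # zip pairs position i of `first` with position i of `second`
    # (it stops at the end of the shorter list); set drops repeated pairs.
    return len(set(zip(first, second)))


print(count_unique_pairs(string_a, string_b))  # prints 5
```

Note that ('K', 'T') and ('T', 'K') are counted as different pairs, which matches the five combinations listed in the question.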
miradulo

You can use the zip function and set to create the expected list (here a and b are your two lists, as defined in the session further down):

>>> z=set(zip(a,b))
>>> z
set([('T', 'G'), ('K', 'T'), ('T', 'K'), ('G', 'T'), ('K', 'K')])

Then use the chain and combinations functions from the itertools module to create all the combinations of those unique pairs:

>>> a=['K', 'T', 'T', 'T', 'K', 'K', 'G', 'T', 'T', 'T', 'T', 'K', 'K']
>>> b=['T', 'K', 'G', 'G', 'K', 'T', 'T', 'K', 'G', 'G', 'K', 'K', 'T']
>>> from itertools import combinations,chain
>>> z=[''.join(k) for k in set(zip(a,b))]
>>> z
['TG', 'KT', 'TK', 'GT', 'KK']
>>> list(chain.from_iterable(combinations(z, r) for r in range(len(z)+1)))
[(), ('TG',), ('KT',), ('TK',), ('GT',), ('KK',), ('TG', 'KT'), ('TG', 'TK'), ('TG', 'GT'), ('TG', 'KK'), ('KT', 'TK'), ('KT', 'GT'), ('KT', 'KK'), ('TK', 'GT'), ('TK', 'KK'), ('GT', 'KK'), ('TG', 'KT', 'TK'), ('TG', 'KT', 'GT'), ('TG', 'KT', 'KK'), ('TG', 'TK', 'GT'), ('TG', 'TK', 'KK'), ('TG', 'GT', 'KK'), ('KT', 'TK', 'GT'), ('KT', 'TK', 'KK'), ('KT', 'GT', 'KK'), ('TK', 'GT', 'KK'), ('TG', 'KT', 'TK', 'GT'), ('TG', 'KT', 'TK', 'KK'), ('TG', 'KT', 'GT', 'KK'), ('TG', 'TK', 'GT', 'KK'), ('KT', 'TK', 'GT', 'KK'), ('TG', 'KT', 'TK', 'GT', 'KK')]
Mazdak
snp1 = ['K', 'T', 'T', 'T', 'K', 'K', 'G', 'T', 'T', 'T', 'T', 'K', 'K']
snp2 = ['T', 'K', 'G', 'G', 'K', 'T', 'T', 'K', 'G', 'G', 'K', 'K', 'T']
combinations = []
# Join the letters found at the same position in both lists
for a, b in zip(snp1, snp2):
    combinations.append(a + b)

# A set keeps only one copy of each joined pair
print(list(set(combinations)))

output:

['KK', 'TG', 'GT', 'TK', 'KT']

Or a simple one-liner would do:

list(set([a+b for a,b in zip(snp1, snp2)]))

output:

['KK', 'TG', 'GT', 'TK', 'KT']
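
Because a set is unordered, the lists above can come out in a different order from one run to the next. If the order in which the pairs first appear matters, one possible variation (a sketch assuming Python 3.7+, where dicts preserve insertion order) is to deduplicate with `dict.fromkeys` instead of `set`. The function name `unique_pairs_in_order` is hypothetical:

```python
snp1 = ['K', 'T', 'T', 'T', 'K', 'K', 'G', 'T', 'T', 'T', 'T', 'K', 'K']
snp2 = ['T', 'K', 'G', 'G', 'K', 'T', 'T', 'K', 'G', 'G', 'K', 'K', 'T']


def unique_pairs_in_order(first, second):
    """Return the joined pairs without repeats, in first-seen order (hypothetical helper)."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so repeated pairs are dropped while the first occurrence stays in place.
    return list(dict.fromkeys(a + b for a, b in zip(first, second)))


print(unique_pairs_in_order(snp1, snp2))  # ['KT', 'TK', 'TG', 'KK', 'GT']
```

The number of unique combinations is then `len(unique_pairs_in_order(snp1, snp2))`, the same count the set-based versions give.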
Ashoka Lella