I have two strings:
StringA: ['K', 'T', 'T', 'T', 'K', 'K', 'G', 'T', 'T', 'T', 'T', 'K', 'K']
StringB: ['T', 'K', 'G', 'G', 'K', 'T', 'T', 'K', 'G', 'G', 'K', 'K', 'T']
I want to count how many unique combinations of letters there are. The strings are ordered, so I only want to match StringA position 1 with StringB position 1, StringA position 2 with StringB position 2, etc. So the pairs in the strings above are (KT), (TK), (TG), (TG), (KK), (KT), (GT), (TK), (TG), (TG), (TK), (KK), (KT).
And there are 5 unique combinations: (KT), (TK), (TG), (GT), (KK)
I have used the following code to produce the strings from two .csv files.
import sys
import csv
pairlist = open(sys.argv[1], 'r')
snp_file = open(sys.argv[2], 'r')
pair = csv.reader(pairlist, delimiter=',')
snps = csv.reader(snp_file, delimiter=',')
output = open(sys.argv[1]+"_FGT_Result", 'w')
snp1 = []
snp2 = []
firstpair = pair.next()
locusa = firstpair[0]
locusb = firstpair[1]
f = snps
#search = snp.readlines()
for i, row in enumerate(f):
    if locusa in row:
        hita = row
        #print hita
        snp1.append(hita[2])
    if locusb in row:
        hitb = row
        #print hitb
        snp2.append(hitb[2])
print snp1
print snp2
pairlist.close()
snp_file.close()
output.close()
But I cannot figure out how to do the comparison. I have tried converting the strings to sets, since I read in another thread that this is required, but I am not sure why, and I cannot get it to work.
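For reference, here is a minimal sketch of the set-based idea. It assumes both lists have the same length and that order within a pair matters, so (KT) and (TK) count as different combinations, as in the example above. The names `string_a`, `string_b`, `pairs` and `unique_pairs` are illustrative and do not come from the script.

```python
# Illustrative names; the data is copied from the StringA / StringB example.
string_a = ['K', 'T', 'T', 'T', 'K', 'K', 'G', 'T', 'T', 'T', 'T', 'K', 'K']
string_b = ['T', 'K', 'G', 'G', 'K', 'T', 'T', 'K', 'G', 'G', 'K', 'K', 'T']

# zip() pairs the elements position by position:
# (string_a[0], string_b[0]), (string_a[1], string_b[1]), ...
pairs = list(zip(string_a, string_b))

# A set keeps only one copy of each distinct tuple, so its size is the
# number of unique ordered combinations.
unique_pairs = set(pairs)

print(len(unique_pairs))  # 5
```

A set is useful here only because it discards duplicates. The pairs are tuples, which are hashable and can therefore be stored in a set; a list of two letters could not. In the script, `snp1` and `snp2` would presumably take the place of the two hard-coded lists.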