
I am trying to generate a list of all possible DNA sequences of length four over the four characters A, T, C, G. There are 4^4 (256) different combinations in total. I include repeats, so AAAA is allowed. I have looked at itertools.combinations_with_replacement(iterable, r); however, the list output changes depending on the order of the input string, i.e.:

itertools.combinations_with_replacement('ATCG', 4)  # gives different results from...
itertools.combinations_with_replacement('ATGC', 4)
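A quick check of what combinations_with_replacement actually returns may help here (a small sketch; the names atcg and atgc are illustrative). It emits each multiset of 4 letters exactly once, written in the order the letters occur in the input, so there are C(4 + 4 - 1, 4) = 35 results rather than 256. Only the spelling of each result, not the set of multisets, depends on the input order:

```python
import itertools
from math import comb  # math.comb needs Python 3.8+

# Each result is a multiset of 4 letters, ordered by position in the input string
atcg = ["".join(t) for t in itertools.combinations_with_replacement('ATCG', 4)]
atgc = ["".join(t) for t in itertools.combinations_with_replacement('ATGC', 4)]

print(len(atcg), comb(4 + 4 - 1, 4))  # 35 35
# 'ATCG' spells two C's and two G's as 'CCGG'; 'ATGC' spells it 'GGCC'
print('CCGG' in atcg, 'CCGG' in atgc)  # True False
# Ignoring letter order, both inputs give the same 35 multisets
print(sorted(map(sorted, atcg)) == sorted(map(sorted, atgc)))  # True
```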

Because of this, I attempted to combine itertools.combinations_with_replacement(iterable, r) with itertools.permutations(), passing each output of itertools.permutations() to itertools.combinations_with_replacement(), as defined below:

import itertools

def allCombinations(s, strings):
    perms = list(itertools.permutations(s, 4))
    allCombos = []
    for perm in perms:
        # size-4 multisets, each written in this permutation's letter order
        combo = list(itertools.combinations_with_replacement(perm, 4))
        allCombos.append(combo)
    for combos in allCombos:
        for tup in combos:
            strings.append("".join(str(x) for x in tup))

However, running allCombinations('ATCG', li) with li = [] and then taking list(set(li)) still produces only 136 unique sequences, rather than 256.
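The 136 has a concrete explanation. Every tuple from combinations_with_replacement is non-decreasing with respect to the order of its input, so across all 24 orderings the only strings that can appear are those in which each letter forms a single contiguous run (AATT can appear, ATAT never can). Counting those strings by number of runs gives 4 + 36 + 72 + 24 = 136. The sketch below checks this; all_combinations_compact is a hypothetical compact equivalent of the function above, and letters_contiguous is an illustrative helper:

```python
import itertools

def all_combinations_compact(s):
    # Same strings as allCombinations above, collected straight into a set
    return {"".join(tup)
            for perm in itertools.permutations(s, 4)
            for tup in itertools.combinations_with_replacement(perm, 4)}

def letters_contiguous(word):
    # True if each letter's occurrences form one unbroken run
    runs = [letter for letter, _ in itertools.groupby(word)]
    return len(runs) == len(set(runs))

found = all_combinations_compact('ATCG')
every = {"".join(p) for p in itertools.product('ATCG', repeat=4)}

print(len(found))  # 136
print(found == {w for w in every if letters_contiguous(w)})  # True
print('AATT' in found, 'ATAT' in found)  # True False
```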

There must be an easy way to do this, maybe generating a power set and then filtering for length 4?
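On the power-set idea: a power set contains only subsets, which never repeat an element, so filtering it for length 4 leaves just the single subset containing all four letters. A minimal sketch, using the powerset recipe from the itertools documentation:

```python
import itertools

def powerset(iterable):
    # powerset recipe from the itertools docs: subsets of every size 0..n
    s = list(iterable)
    return itertools.chain.from_iterable(
        itertools.combinations(s, r) for r in range(len(s) + 1))

subsets = list(powerset('ATCG'))
length_four = ["".join(t) for t in subsets if len(t) == 4]

print(len(subsets))  # 16
print(length_four)   # ['ATCG']
```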

izaak_pyzaak

2 Answers


You can achieve this with itertools.product, which gives the Cartesian product of the passed iterables:

import itertools

a = 'ACTG'

print(len(list(itertools.product(a, a, a, a))))
# or, even better, as @ayhan commented: print(len(list(itertools.product(a, repeat=4))))
# Output: 256

But product returns tuples, so if you are looking for strings, join each one:

for output in itertools.product(a, repeat=4):
    print(''.join(output))

# Output:
# AAAA
# AAAC
# ...
# GGGG
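The same approach wraps easily into a reusable function. In this sketch, all_kmers is a hypothetical name, and the ACGT alphabet and default k=4 are assumptions for illustration; it returns len(alphabet) ** k strings, ordered lexicographically by the alphabet as given:

```python
import itertools

def all_kmers(alphabet="ACGT", k=4):
    # Every length-k string over alphabet; product varies the last position fastest
    return ["".join(p) for p in itertools.product(alphabet, repeat=k)]

kmers = all_kmers()
print(len(kmers))           # 256
print(kmers[:3])            # ['AAAA', 'AAAC', 'AAAG']
print(kmers[-1])            # TTTT
print(len(all_kmers(k=6)))  # 4096
```

Because repeat=k replaces the hand-written list of iterables, changing the sequence length is a single argument rather than an edit to the call.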
DeepSpace

You could also just use four nested loops:

l = []

s = 'ATCG'

# One loop per position; each level extends the prefix built so far
for a in s:
    n1 = a
    for b in s:
        n2 = n1 + b
        for c in s:
            n3 = n2 + c
            for d in s:
                l.append(n3 + d)

print(len(l))  # 256
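Regarding the scaling concern raised in the comments on this answer, the nested loops generalize to any length through recursion. A minimal sketch (nested_strings is a hypothetical helper) that builds the same list, in the same order, as the four loops; that order also matches itertools.product:

```python
import itertools

def nested_strings(s, k):
    # One recursion level per loop: extend every shorter prefix by each character
    if k == 0:
        return [""]
    return [prefix + ch for prefix in nested_strings(s, k - 1) for ch in s]

l = nested_strings('ATCG', 4)
print(len(l))  # 256
print(l == ["".join(p) for p in itertools.product('ATCG', repeat=4)])  # True
```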
Pax Vobiscum
  • This *really* doesn't scale very well -- and isn't a good answer to a question which is looking for an itertools approach. – John Coleman Jul 05 '16 at 11:50
  • You're right, it does not. But sometimes you have to use what is best in the current situation, not just go with the best scalable option. This one is fairly easy to grasp. – Pax Vobiscum Jul 05 '16 at 11:53
  • "This one is fairly easy to grasp" Not quite. In order to understand what this code does one needs to mentally keep in mind and track around 8 different variables. – DeepSpace Jul 05 '16 at 11:57
  • No, I'm not giving this as a "copy+paste" example; however, the concept of 4 nested for loops is not programmatic but mathematical, and it is basic combinatorics. And then again, thinking about not HOW but WHY to answer this question: is it to get a working one-liner? To understand itertools? Or maybe it is just a school exercise... – Pax Vobiscum Jul 05 '16 at 12:06