
I have a string of characters and a list of characters. I want to create a dictionary in which the keys are the characters of the string and the values are the list, minus the key character.

A string of characters:

sequence = 'ATGCG'

The list:

bases = ['C', 'T', 'A', 'G']

The resulting dictionary would be:

{'A': ['C', 'T', 'G'],
 'T': ['C', 'A', 'G'],
 'G': ['C', 'T', 'A'],
 'C': ['T', 'A', 'G'],
 'G': ['C', 'T', 'A'],
}

I tried the following code, but got a dictionary with only 4 items:

variations = {current_base: [base for base in bases if current_base != base]
              for current_base in sequence}

I'd love to get ideas regarding what I'm doing wrong. Thanks.

mozway
Ziv
  • Does this answer your question? [All combinations of a list of lists](https://stackoverflow.com/questions/798854/all-combinations-of-a-list-of-lists) – wovano Nov 09 '21 at 12:42
  • [works for me](https://pythontutor.com/visualize.html#code=bases%20%3D%20%5B'C',%20'T',%20'A',%20'G'%5D%0Asequence%20%3D%20'ATGCG'%0Avariations%20%3D%20%7Bcurrent_base%3A%20%5Bbase%20for%20base%20in%20bases%20if%20current_base%20!%3D%20base%5D%20for%20current_base%20in%20sequence%7D&cumulative=false&heapPrimitives=nevernest&mode=edit&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false) the dict has 4 elements but each list has 3 – Tadhg McDonald-Jensen Nov 09 '21 at 12:43
  • Dictionaries do not support duplicate keys. – Gedas Miksenas Nov 09 '21 at 12:44

2 Answers


What you want to do is impossible: a dictionary cannot have duplicate keys.

{'A': ['C', 'T', 'G'],
 'T': ['C', 'A', 'G'],
 'G': ['C', 'T', 'A'],
 'C': ['T', 'A', 'G'],
 'G': ['C', 'T', 'A'], ## this is impossible
}
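This is also why the comprehension in the question returns only 4 entries: its keys come from `sequence`, so the second `'G'` simply replaces the first. A quick check, as a sketch reusing the names from the question:

```python
bases = ['C', 'T', 'A', 'G']
sequence = 'ATGCG'

# same comprehension as in the question
variations = {current_base: [base for base in bases if current_base != base]
              for current_base in sequence}

print(len(sequence))    # 5 characters...
print(len(variations))  # ...but only 4 keys: the second 'G' overwrote the first
print(variations)
# {'A': ['C', 'T', 'G'], 'T': ['C', 'A', 'G'], 'G': ['C', 'T', 'A'], 'C': ['T', 'A', 'G']}

# a literal behaves the same way: a repeated key keeps its first position
# but ends up with the last value assigned to it
literal = {'G': 1, 'A': 2, 'G': 3}
print(literal)  # {'G': 3, 'A': 2}
```

So nothing is wrong with the comprehension itself; the target structure just cannot be a dict keyed by base.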

You can use a list of tuples instead. I'll also take the opportunity to show you a more efficient method using Python sets:

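sequence = 'ATGCG'
bases = set('ACGT')  # set of the 4 bases
# pair each base with the 3 others (set difference; order within each list is arbitrary)
result = [(b, list(bases.difference(b))) for b in sequence]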

NB. It is even more efficient to precompute the differences, since a DNA sequence can be very long while there are only 4 bases:

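sequence = 'ATGCG'
bases = set('ACGT')
# precompute the 3 alternatives for each of the 4 bases, once
diffs = {b: list(bases.difference(b)) for b in bases}
# each position of the sequence is then a single dictionary lookup
result = [(b, diffs[b]) for b in sequence]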

output (sets are unordered, so the order within each list may differ from run to run):

[('A', ['T', 'C', 'G']),
 ('T', ['A', 'C', 'G']),
 ('G', ['T', 'A', 'C']),
 ('C', ['T', 'A', 'G']),
 ('G', ['T', 'A', 'C'])]

Alternative output, using the position as the key:

{i: list(bases.difference(b)) for i, b in enumerate(sequence)}

output:

{0: ['T', 'C', 'G'],
 1: ['A', 'C', 'G'],
 2: ['T', 'A', 'C'],
 3: ['T', 'A', 'G'],
 4: ['T', 'A', 'C']}
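If you want each list to keep the order of the original `bases` list, so the result matches the one in the question exactly and is identical on every run, a minimal variation of the precomputed lookup above builds the differences with a list comprehension instead of a set difference (a sketch, using the list `bases` from the question):

```python
bases = ['C', 'T', 'A', 'G']
sequence = 'ATGCG'

# precompute, once per base, the other bases in their original order
diffs = {b: [other for other in bases if other != b] for b in bases}

# one lookup per position; a list of tuples can hold repeated bases
result = [(b, diffs[b]) for b in sequence]
print(result)
# [('A', ['C', 'T', 'G']), ('T', ['C', 'A', 'G']), ('G', ['C', 'T', 'A']), ('C', ['T', 'A', 'G']), ('G', ['C', 'T', 'A'])]
```

Note that repeated bases share the same list object from `diffs` (this is also true of the set-based version), so copy it, e.g. `list(diffs[b])`, if you plan to modify the lists in place.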
mozway

Try this:

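sequence = 'ATGCG'
letters = set(sequence)  # unique letters: a dict can only hold one key per letter
# for each letter, the remaining letters (order depends on set iteration)
d = {c: list(''.join(letters).replace(c, '')) for c in letters}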
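This approach takes the alternatives from the letters that occur in `sequence`, not from `bases`, so it only yields 3 alternatives per base when all four bases appear in the sequence. A quick comparison on a hypothetical input `'AAT'` (chosen here for illustration because it lacks `'C'` and `'G'`), next to the comprehension from the question, which draws on `bases`:

```python
bases = ['C', 'T', 'A', 'G']
sequence = 'AAT'  # hypothetical input: no 'C' or 'G'

# approach from this answer: alternatives come from the letters found in `sequence`
from_sequence = {c: list(''.join(set(sequence)).replace(c, '')) for c in set(sequence)}

# comprehension from the question: alternatives come from `bases`, in its order
from_bases = {c: [b for b in bases if b != c] for c in sequence}

print(from_sequence == {'A': ['T'], 'T': ['A']})                   # True
print(from_bases == {'A': ['C', 'T', 'G'], 'T': ['C', 'A', 'G']})  # True
```

Both dicts still have just one key per distinct base, as with any dictionary.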
Thicham