I have this dictionary and a list of sequences in an input file. Each dictionary key is an amino acid (one-letter code), and its value is the 20-character vector that represents that amino acid.
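Every value in the dictionary is a one-hot string: a single 1 among 20 characters. Because of that, the table does not have to be typed by hand. The following is a minimal sketch, assuming the residue order used in the dictionary in this post (L is listed before K there). The names ORDER and one_hot_table are illustrative and are not part of the original code:

```python
# Residue order copied from the dictionary in this post (L is listed before K there).
ORDER = "ACDEFGHILKMNPQRSTVWY"

def one_hot_table(order):
    """Map each residue to a string of '0's with a single '1' at the residue's position."""
    n = len(order)
    return {aa: "0" * i + "1" + "0" * (n - i - 1) for i, aa in enumerate(order)}

vecAa = one_hot_table(ORDER)
print(vecAa["A"])  # 10000000000000000000
print(vecAa["M"])  # 00000000001000000000
```

Generating the table this way guarantees that every vector has the same length and contains exactly one 1. Both properties are easy to get wrong when you type 20 strings by hand.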
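I am trying to produce output like this, where each sequence is followed by a colon and then the vectors of its amino acids concatenated into one string: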
MNTFSQVWVFSDTPSRLPELMNGAQALANQ:000000000010000000000000000000010000000000000000000000001000000010000000000000000000000000000001000000000000000001000000000000000000000001000000000000000000001000000000000000000100000010000000000000000000000000000001000000100000000000000000000000000000000010000000000000001000000000000000000000010000000000000000001000000000000010000000000000000000000010000000000100000000000000000000000010000000000000000000001000000000000000000001000000000000010000000000000010000000000000000000000000000000010000001000000000000000000000000000100000000000100000000000000000000000000000010000000000000000000001000000
NTFSQVWVFSDTPSRLPELMNGAQALANQI:000000000001000000000000000000000000100000001000000000000000000000000000000100000000000000000100000000000000000000000100000000000000000000100000000000000000010000001000000000000000000000000000000100000010000000000000000000000000000000001000000000000000100000000000000000000001000000000000000000100000000000001000000000000000000000001000000000010000000000000000000000001000000000000000000000100000000000000000000100000000000001000000000000001000000000000000000000000000000001000000100000000000000000000000000010000000000010000000000000000000000000000001000000000000000000000100000000000001000000000000
TFSQVWVFSDTPSRLPELMNGAQALANQIN:000000000000000010000000100000000000000000000000000000010000000000000000010000000000000000000000010000000000000000000010000000000000000001000000100000000000000000000000000000010000001000000000000000000000000000000000100000000000000010000000000000000000000100000000000000000010000000000000100000000000000000000000100000000001000000000000000000000000100000000000000000000010000000000000000000010000000000000100000000000000100000000000000000000000000000000100000010000000000000000000000000001000000000001000000000000000000000000000000100000000000000000000010000000000000100000000000000000000000100000000
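Each line in the desired output appears to be the sequence, a colon, and the 20-character vectors of its residues joined in order. Below is a minimal sketch of that formatting for a single sequence that is already clean, meaning it contains no newline. The function name encode is hypothetical, and the table is a shortened copy of the one in this post:

```python
# Shortened copy of the question's table: only the residues used in this demo.
vecAa = {
    "A": "10000000000000000000",
    "M": "00000000001000000000",
    "N": "00000000000100000000",
}

def encode(seq, table):
    """Concatenate the 20-character vector of every residue in seq, in order."""
    return "".join(table[aa] for aa in seq)

seq = "MNA"
result = seq + ":" + encode(seq, vecAa)
print(result)
print(len(encode(seq, vecAa)))  # 3 residues x 20 characters = 60
```

str.join builds the whole vector string in one pass. This is the usual way to concatenate many pieces in Python.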
This is the code I have so far. It loops over the file to read each sequence. It then tries to look up the vector of every amino acid in that sequence, so that I can combine them into one string together with the original sequence.
vecAa = {
    "A":"10000000000000000000",
    "C":"01000000000000000000",
    "D":"00100000000000000000",
    "E":"00010000000000000000",
    "F":"00001000000000000000",
    "G":"00000100000000000000",
    "H":"00000010000000000000",
    "I":"00000001000000000000",
    "L":"00000000100000000000",
    "K":"00000000010000000000",
    "M":"00000000001000000000",
    "N":"00000000000100000000",
    "P":"00000000000010000000",
    "Q":"00000000000001000000",
    "R":"00000000000000100000",
    "S":"00000000000000010000",
    "T":"00000000000000001000",
    "V":"00000000000000000100",
    "W":"00000000000000000010",
    "Y":"00000000000000000001",
}
with open("/home/example.txt", "r") as f:
    for line in f:
        x = line
        print(x)
        out = ([vecAa[value] for value in x ])
However, I get the following error:
Traceback (most recent call last):
  File "vector.py", line 28, in <module>
    out = ([vecAa[value] for value in x ])
  File "vector.py", line 28, in <listcomp>
    out = ([vecAa[value] for value in x ])
KeyError: '\n'
How can I resolve this?
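Iterating over a line read from a text file yields every character of that line, including the newline '\n' at its end. Since '\n' is not a key in vecAa, the lookup raises KeyError: '\n'. One likely fix is to strip the line before encoding it and to skip blank lines. The sketch below assumes one sequence per line. The helper name encode_lines is hypothetical, and io.StringIO stands in for the real file so that the example runs on its own:

```python
import io

# Shortened copy of the question's table: only the residues used in the demo data.
vecAa = {
    "A": "10000000000000000000",
    "M": "00000000001000000000",
    "N": "00000000000100000000",
    "Q": "00000000000001000000",
}

def encode_lines(lines, table):
    """Yield 'SEQ:VECTOR' for each non-blank line, ignoring the trailing newline."""
    for line in lines:
        seq = line.strip()   # drop the trailing '\n' (and any other surrounding whitespace)
        if not seq:          # skip blank lines, e.g. an empty last line in the file
            continue
        yield seq + ":" + "".join(table[aa] for aa in seq)

# io.StringIO yields lines ending in '\n', just like iterating over an open file.
demo = io.StringIO("MNAQ\nNAQM\n\n")
for out in encode_lines(demo, vecAa):
    print(out)
```

With the real file, pass the file object instead: inside with open("/home/example.txt") as f:, loop over encode_lines(f, vecAa). Residues that are missing from the table, such as X or lowercase letters, would still raise a KeyError. If your data may contain them, check aa in table first or normalise the sequence with seq.upper().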