I have these sequences:
0,<|endoftext|>ERRDLLRFKH:GAGCGCCGCGACCTGTTACGATTTAAACAC<|endoftext|>
1,<|endoftext|>RRDLLRFKHG:CGCCGCGACCTGTTACGATTTAAACACGGC<|endoftext|>
2,<|endoftext|>RDLLRFKHGD:CGCGACCTGTTACGATTTAAACACGGCGAC<|endoftext|>
3,<|endoftext|>DLLRFKHGDS:GACCTGTTACGATTTAAACACGGCGACAGT<|endoftext|>
And I'd like to get only the amino acid sequences, like this:
ERRDLLRFKH:
RRDLLRFKHG:
RDLLRFKHGD:
DLLRFKHGDS:
I have written this script so far:
with open("example_val.txt") as f:
    for line in f:
        if line.startswith(""):
            line = line[:-1]
            print(line.split(":", 1))
However, the output still contains the whole original sequences, not just the amino acid part. Please give me some advice.
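
For reference, here is a minimal sketch of one possible approach, not necessarily the best one. In the script above, `line.startswith("")` is true for every string, and `line.split(":", 1)` returns a list holding both halves of the line, so printing that list shows everything. The sketch assumes each line looks like the samples: an index, a comma, a marker, a run of uppercase amino acid letters, a colon, and then the DNA. The names `PEPTIDE_RE`, `extract_peptide`, `extract_peptides`, and `print_peptides` are made up for illustration.

```python
import re

# Assumption: the amino acid sequence is the first run of uppercase
# letters that is immediately followed by a colon.
PEPTIDE_RE = re.compile(r"[A-Z]+:")


def extract_peptide(line):
    """Return the amino acid run with its trailing colon, or None if absent."""
    match = PEPTIDE_RE.search(line)
    return match.group(0) if match else None


def extract_peptides(lines):
    """Collect the amino acid part of every line that has one."""
    results = []
    for line in lines:
        peptide = extract_peptide(line)
        if peptide is not None:
            results.append(peptide)
    return results


def print_peptides(path):
    """Print one amino acid sequence per line of the file at `path`."""
    with open(path) as f:
        for peptide in extract_peptides(f):
            print(peptide)
```

Calling `print_peptides("example_val.txt")` should then print one sequence per line. If the trailing colon is not wanted, `match.group(0)[:-1]` would drop it.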