I have two files with DNA sequences, and the records are in the same order in both (each has an ID, so finding the matching sequence is easy). How do I merge each pair of sequence lines into one consensus sequence? My example is below (not using DNA sequences, so it's easier to read).
Note: all IDs are identical and in the same order, and the sequences are all the same length. For example, if I have file A with:
>id1
THISISA-----
>id2
HELLO-------
>id3
TESTTESTTEST
And a second file B with:
>id1
-------TEST!
>id2
-----WORLD!!
>id3
TESTTESTTEST
My ideal output is simply (in a new file C):
>id1
THISISATEST!
>id2
HELLOWORLD!!
>id3
TESTTESTTEST
I am terrible with strings in Python, and so far I've only managed to open each file with readlines and save the content. Essentially, gaps are marked with "-", and if either file has a character at that position that can replace the hyphen, I want to use it.
Any tips on how to start are appreciated. I don't have code to provide other than:
import os
import sys

file1 = sys.argv[1]
file2 = sys.argv[2]
msa1_seqs = []
msa1_ids = []
with open(file1, "r") as f1:
    content1 = f1.readlines()
for i in range(len(content1)):
    if i % 2 == 1:  # odd-indexed lines hold the DNA sequence
        msa1_seqs.append(content1[i])
    else:  # even-indexed lines hold the ">id" header
        msa1_ids.append(content1[i])
I repeated the above code to open the second file (file2) and kept the text in the lists msa2_seqs and msa2_ids. Now I am stuck trying to access the right elements from both files at the same time, so that I can write another loop that changes "-" into a character wherever the other file has one.
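One possible starting point, as a minimal sketch rather than a definitive solution: zip() walks two lists (or two strings) in step, which covers both pairing up the records and comparing the sequences character by character. The helper names merge_pair, read_records and merge_files are hypothetical. The sketch assumes each record is exactly two lines (an id line, then a sequence line), and that when both files have a non-gap character at the same position, the one from the first file is kept.

```python
def merge_pair(seq_a, seq_b, gap="-"):
    """Merge two equal-length sequences position by position."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    # Keep the character from seq_a unless it is a gap; then take seq_b's.
    # Assumption: if both are non-gap characters, seq_a wins.
    return "".join(b if a == gap else a for a, b in zip(seq_a, seq_b))


def read_records(path):
    """Return (ids, seqs) from a file that alternates id and sequence lines."""
    with open(path) as handle:
        # Drop trailing newlines and skip blank lines.
        lines = [line.rstrip("\n") for line in handle if line.strip()]
    return lines[0::2], lines[1::2]


def merge_files(path_a, path_b, path_out):
    """Write the merged records of path_a and path_b to path_out."""
    ids_a, seqs_a = read_records(path_a)
    ids_b, seqs_b = read_records(path_b)
    with open(path_out, "w") as out:
        # zip() yields the i-th id and sequence from both files together.
        for id_a, id_b, seq_a, seq_b in zip(ids_a, ids_b, seqs_a, seqs_b):
            if id_a != id_b:
                raise ValueError(f"id mismatch: {id_a} vs {id_b}")
            out.write(f"{id_a}\n{merge_pair(seq_a, seq_b)}\n")
```

With the example above, merge_pair("THISISA-----", "-------TEST!") gives "THISISATEST!", and merge_files(sys.argv[1], sys.argv[2], "fileC.txt") would write the merged records to a new file (the output name here is only a placeholder).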