I'm trying to write a program that takes a FASTA file and converts it into a 2D list pairing IDs and sequences. To do this, I split the text into a list of lines and create a 2D list. The 2D list has as many inner lists as there are IDs in the file, and each inner list consists of two empty strings. The program iterates over the list of lines and, when it comes to an ID, concatenates it to the first entry of one of these inner lists. To keep track of which inner list I'm adding to, I initialize j to 0, use the inner list at index j in the 2D list, and then increase j by 1.

The trouble comes with concatenating. The program somehow keeps track of every previous ID it has encountered and adds them all to the next string. j increments correctly. This line doesn't work: `strings_list[j][0] = strings_list[j][0] + lines[i]`, but this one does: `strings_list[j][0] = lines[i]`. I can't figure out how `strings_list[j][0]` is saving previous values. Because each sequence in the file spans several lines, I will need concatenation to build up the sequences, so I want to figure out the problem with this line.
Here is the full code:
def dna_processing(filename):
    f = open(filename, 'r')
    txt = f.read()
    f.close()
    lines = txt.split("\n")
    seq_num = 0
    for i in range(len(lines)):
        if lines[i] == '':
            del lines[i]
        elif lines[i][0] == '>':
            seq_num = seq_num + 1
    strings_list = [["", ""]] * seq_num
    j = 0
    for i in range(len(lines)):
        if lines[i][0] == '>':
            strings_list[j][0] = strings_list[j][0] + lines[i]
            j = j + 1
        #else:
            #strings_list[j-1][1] = strings_list[j-1][1] + lines[i]

dna_processing("rosalind_grph.txt")
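The behavior can be reproduced without the file. The following is a minimal sketch, assuming the cause lies in how the 2D list is created with `*` rather than in the concatenation line itself. The names `ids`, `multiplied`, and `separate` are made up for this illustration and do not appear in the full code.

```python
# Hypothetical stand-ins for the header lines found in the file.
ids = [">Rosalind_5931", ">Rosalind_7410", ">Rosalind_0759"]

# Same construction as in dna_processing: list multiplication.
multiplied = [["", ""]] * len(ids)
for j, header in enumerate(ids):
    multiplied[j][0] = multiplied[j][0] + header

# Every slot refers to the same inner list object.
print(multiplied[0] is multiplied[1])  # True
print(multiplied[0][0])  # >Rosalind_5931>Rosalind_7410>Rosalind_0759

# For comparison: build each inner list separately.
separate = [["", ""] for _ in ids]
for j, header in enumerate(ids):
    separate[j][0] = separate[j][0] + header

print(separate[0] is separate[1])  # False
print(separate[0][0])  # >Rosalind_5931
```

In this sketch the `is` check suggests that `[["", ""]] * n` repeats a reference to one inner list instead of making `n` independent lists, which would explain why every ID seems to accumulate in the same string.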
And here is an example of the text I am inputting:
>Rosalind_5931
AGAATAGGAAGCGCCGTGTTGAAATATAAGAGCACCCCAGACGTGTACTTTGTGTTGGTC
TCTGGCGACCATTCTGTGCGGT
>Rosalind_7410
GAACCTAAGGTCCATCGTCATAACTGCGACCCTACAAACAGATGGTTTCATGTGAAATAA
GTTAGGAACCAGAAAATCATAGCAGACGTA
>Rosalind_0759
GTTTGCATTAGTTCCTCGGGGTCACTCTCCTAGCTATATTGCATAATAACCAGGTGGCTC
CCGTTATGGCCCAAGACACTTGTTGGTAG
>Rosalind_6944
TACGCCGCCATAACAGGGTCCGAGCCGCAAGGTTGGTCCACCGTACTCCAACCATGGCTA
TCAAACGGTTGCAGAGCCACCGAACTGGGCG
>Rosalind_2801
GCTTTCAGGCTAAACCGACATGGTCCCCAATACTTTTAAGATCGGAGTCAAGGTTAAGAG
TGTGGCGTGTTAGCGGCCCTCA
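
For reference, the pairing described at the top could be sketched as below. This is only one possible approach under stated assumptions, not the original code: `parse_fasta` is a hypothetical name, the function takes the text directly instead of a filename, and `sample` is a shortened, made-up stand-in for the input above. It appends a new inner list for each ID instead of pre-allocating the 2D list.

```python
def parse_fasta(text):
    """Return a list of [id, sequence] pairs from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than deleting while iterating
        if line.startswith(">"):
            records.append([line, ""])  # a fresh inner list for each ID
        elif records:
            records[-1][1] += line  # a sequence may span several lines
    return records


# Shortened, made-up sample in the same shape as the input file.
sample = """>Rosalind_5931
AGAATAGG
TCTGG
>Rosalind_7410
GAACC
"""
print(parse_fasta(sample))
# [['>Rosalind_5931', 'AGAATAGGTCTGG'], ['>Rosalind_7410', 'GAACC']]
```

Appending as IDs are encountered avoids counting the IDs in a first pass and avoids tracking the index j by hand.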