For this problem I have a separate txt file called DNA.txt which contains this:
>HSGLTH1 Human theta 1-globin gene
CCACTGCACTCACCGCACCCGGCCAATTTTTGTGTTTTTAGTAGAGACTAAATACCATATAGTGAACACCTAAGA
CGGGGGGCCTTGGATCCAGGGCGATTCAGAGGGCCCCGGTCGGAGCTGTCGGAGATTGAGCGCGCGCGGTCCCGG
GATCTCCGACGAGGCCCTGGACCCCCGGGCGGCGAAGCTGCGGCGCGGCGCCCCCTGGAGGCCGCGGGACCCCTG
GCCGGTCCGCGCAGGCGCAGCGGGGTCGCAGGGCGCGGCGGGTTCCAGCGCGGGGATGGCGCTGTCCGCGGAGGA
CCGGGCGCTGGTGCGCGCCCTGTGGAAGAAGCTGGGCAGCAACGTCGGCGTCTACACGACAGAGGCCCTGGAAAG
GTGCGGCAGGCTGGGCGCCCCCGCCCCCAGGGGCCCTCCCTCCCCAAGCCCCCCGGACGCGCCTCACCCACGTTC
CTCTCGCAGGACCTTCCTGGCTTTCCCCGCCACGAAGACCTACTTCTCCCACCTGGACCTGAGCCCCGGCTCCTC
ACAAGTCAGAGCCCACGGCCAGAAGGTGGCGGACGCGCTGAGCCTCGCCGTGGAGCGCCTGGACGACCTACCCCA
CGCGCTGTCCGCGCTGAGCCACCTGCACGCGTGCCAGCTGCGAGTGGACCCGGCCAGCTTCCAGGTGAGCGGCTG
CCGTGCTGGGCCCCTGTCCCCGGGAGGGCCCCGGCGGGGTGGGTGCGGGGGGCGTGCGGGGCGGGTGCAGGCGAG
TGAGCCTTGAGCGCTCGCCGCAGCTCCTGGGCCACTGCCTGCTGGTAACCCTCGCCCGGCACTACCCCGGAGACT
TCAGCCCCGCGCTGCAGGCGTCGCTGGACAAGTTCCTGAGCCACGTTATCTCGGCGCTGGTTTCCGAGTACCGCT
GAACTGTGGGTGGGTGGCCGCGGGATCCCCAGGCGACCTTCCCCGTGTTTGAGTAAAGCCTCTCCCAGGAGCAGC
CTTCTTGCCGTGCTCTCTCGAGGTCAGGACGCGAGAGGAAGGCGC
For this problem, I want to take this file name and return a FASTA data structure in a separate file which I believe is the first line of the dna.txt file and then the dna sequence is a string of letters returned ignoring any white space. I also want to close the file at the end once I'm done using it.
Expected Output:
[’>HSGLTH1 Human theta 1-globin gene’, ’←-
CCACTGCACTCACCGCACCCGGCCAATTTTTGTGTTTTTAGT
5 AGAGACTAAATACCATATAGTGAACACCTAAGACGGGGGGC
6 CTTGGATCCAGGGCGATTCAGAGGGCCCCGGTCGGAGCTGT
7 CGGAGATTGAGCGCGCGCGGTCCCGGGATCTCCGACGAGGC
8 CCTGGACCCCCGGGCGGCGAAGCTGCGGCGCGGCGCCCCCT
9 GGAGGCCGCGGGACCCCTGGCCGGTCCGCGCAGGCGCAGCG
10 GGGTCGCAGGGCGCGGCGGGTTCCAGCGCGGGGATGGCGCT
11 GTCCGCGGAGGACCGGGCGCTGGTGCGCGCCCTGTGGAAGA
12 AGCTGGGCAGCAACGTCGGCGTCTACACGACAGAGGCCCTG
13 GAAAGGTGCGGCAGGCTGGGCGCCCCCGCCCCCAGGGGCCC
14 TCCCTCCCCAAGCCCCCCGGACGCGCCTCACCCACGTTCCTC
15 TCGCAGGACCTTCCTGGCTTTCCCCGCCACGAAGACCTACTT
16 CTCCCACCTGGACCTGAGCCCCGGCTCCTCACAAGTCAGAGC
17 CCACGGCCAGAAGGTGGCGGACGCGCTGAGCCTCGCCGTGG
18 AGCGCCTGGACGACCTACCCCACGCGCTGTCCGCGCTGAGC
19 CACCTGCACGCGTGCCAGCTGCGAGTGGACCCGGCCAGCTT
20 CCAGGTGAGCGGCTGCCGTGCTGGGCCCCTGTCCCCGGGAG
21 GGCCCCGGCGGGGTGGGTGCGGGGGGCGTGCGGGGCGGGT
22 GCAGGCGAGTGAGCCTTGAGCGCTCGCCGCAGCTCCTGGGC
23 CACTGCCTGCTGGTAACCCTCGCCCGGCACTACCCCGGAGAC
24 TTCAGCCCCGCGCTGCAGGCGTCGCTGGACAAGTTCCTGAGC
25 CACGTTATCTCGGCGCTGGTTTCCGAGTACCGCTGAACTGTG
26 GGTGGGTGGCCGCGGGATCCCCAGGCGACCTTCCCCGTGTTTG
27 AGTAAAGCCTCTCCCAGGAGCAGCCTTCTTGCCGTGCTCTCTC
28 GAGGTCAGGACGCGAGAGGAAGGCGC’]
This is what I have so far:
def get_DNA(name):
with open(DNA.txt,'r') as dna:
DNA_d = {}
for line in dna:
if line ????????
return DNA_d
To clarify my problem, I am not sure how to have the "HSGLTH1 Human theta 1-globin gene" be a "header" and separate from the remainder of the string from the DNA.txt file. I want it to be returned within my get_DNA(name) function that is in a separate file.
Thank you for your time and kindness!