I'm working on FASTA files. FASTA files are used in biology to store sequences.
>sequence1 identifier (a string)
sequence on one or several lines (a string)
...
>last sequence identifier (a string)
sequence on one or several lines (a string)
I want to make an iterator that returns a struct while reading the file:
struct fasta_seq {
    identifier: String,
    sequence: String,
}
In Python, it's easy. This code returns a tuple, but the idea is the same:
def get_seq_one_by_one(file_):
    """Generator that yields (prompt, sequence) for each sequence."""
    sequence = ''
    prompt = ''
    for line in file_:
        if line.startswith('>'):
            # A new header: emit the record collected so far, if any
            if sequence:
                yield (prompt, sequence)
            sequence = ''
            prompt = line.strip()[1:]
        else:
            sequence += line.strip()
    # Emit the last record once the file is exhausted
    yield (prompt, sequence)
This is convenient and lets me write clearer code, because I can iterate through the file with a simple for loop.
for identifier, sequence in get_seq_one_by_one(open_file):
    ...  # do something with each record
I found similar topics:
If I understand correctly, they know the size of the buffer to read. In my case I don't, because the identifier and/or sequence length varies from one record to the next.
I have checked, and using Rust's yield does not seem to be a great idea, because it is described as unstable.
I do not want you to code in my place; I am trying to learn by rewriting in Rust a script I wrote in Python. I don't know what to use here to solve my problem.
If you can point out the overall idea of how to achieve this goal, it would be really nice. If there is no need for unsafe code, even better.
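
For reference, here is a minimal sketch of one possible shape for this on stable Rust, with no unsafe code and no yield. It wraps a line-based reader in a struct and implements the standard Iterator trait for it. The names FastaReader and pending_header are hypothetical, chosen only for illustration. The struct is called FastaSeq here to follow Rust naming conventions. The sketch assumes that lines before the first header can be ignored, and unlike the Python generator it also returns records whose sequence is empty. Read errors are passed through as io::Result rather than handled.

```rust
use std::io::{self, BufRead};

/// One FASTA record.
#[derive(Debug, PartialEq)]
struct FastaSeq {
    identifier: String,
    sequence: String,
}

/// Hypothetical wrapper: holds a line iterator over any buffered reader,
/// plus the header of the next record once it has been read.
struct FastaReader<R: BufRead> {
    lines: io::Lines<R>,
    pending_header: Option<String>,
}

impl<R: BufRead> FastaReader<R> {
    fn new(reader: R) -> Self {
        FastaReader {
            lines: reader.lines(),
            pending_header: None,
        }
    }
}

impl<R: BufRead> Iterator for FastaReader<R> {
    type Item = io::Result<FastaSeq>;

    fn next(&mut self) -> Option<Self::Item> {
        // Header of the record being built: left over from the previous
        // call, or None if no header has been seen yet.
        let mut identifier = self.pending_header.take();
        let mut sequence = String::new();

        for line in self.lines.by_ref() {
            let line = match line {
                Ok(l) => l,
                Err(e) => return Some(Err(e)),
            };
            let line = line.trim();
            if let Some(header) = line.strip_prefix('>') {
                if let Some(id) = identifier.take() {
                    // A new record starts: keep its header for the next
                    // call and return the record that just ended.
                    self.pending_header = Some(header.to_string());
                    return Some(Ok(FastaSeq { identifier: id, sequence }));
                }
                // First header of the input.
                identifier = Some(header.to_string());
            } else if identifier.is_some() {
                // Sequence data may span several lines; concatenate them.
                sequence.push_str(line);
            }
        }

        // End of input: return the last record, if there was one.
        identifier.map(|id| Ok(FastaSeq { identifier: id, sequence }))
    }
}
```

The state that Python's generator keeps implicitly between yields (the header already read for the next record) is stored explicitly in the struct. With a real file, the reader would be something like BufReader::new(File::open(path)?), and the records can then be consumed with an ordinary for loop, unwrapping or propagating each io::Result.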