I have been given a shotgun genome sequence, which can be found here:
https://www.ncbi.nlm.nih.gov/nuccore/NZ_LRPF01000001
This sequence is made of 205,000 letters (bases). Some of it is annotated as CDS (coding sequences); the rest is non-coding.
For example, the first coding region spans positions 343 to 780 and the second spans 937 to 1866. This means that the non-coding regions run from 1 to 342, then from 781 to 936, and so on.
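To make the interval logic concrete, this is roughly the computation I would otherwise have to repeat by hand for every region. It is only a minimal sketch, written in Python rather than R purely for illustration. The function names `noncoding_intervals` and `extract` and the sequence length of 2000 are made up for the example; only the two CDS ranges come from the record.

```python
def noncoding_intervals(coding, seq_len):
    """Return the 1-based, inclusive gaps not covered by any coding interval."""
    gaps = []
    next_start = 1  # first position not yet accounted for
    for start, end in sorted(coding):
        if start > next_start:
            gaps.append((next_start, start - 1))
        # max() keeps this correct even if two coding intervals overlap
        next_start = max(next_start, end + 1)
    if next_start <= seq_len:
        gaps.append((next_start, seq_len))
    return gaps


def extract(seq, intervals):
    """Concatenate the pieces of seq covered by 1-based inclusive intervals."""
    return "".join(seq[start - 1:end] for start, end in intervals)


# The two CDS ranges quoted above; 2000 is a hypothetical sequence length.
coding = [(343, 780), (937, 1866)]
gaps = noncoding_intervals(coding, 2000)
print(gaps)
```

This sketch ignores strand: a CDS annotated on the complement strand would also need to be reverse-complemented before being written out, which is not handled here.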
I have been asked to perform some analysis on this sequence, and I would like to have one FASTA file containing the coding sequence and another containing the non-coding sequence.
I know how to cut this sequence into two vectors manually in R, but there are 187 coding regions, each of which I would need to locate and cut by hand. Is there an R function or algorithm that will detect the coding and non-coding regions and group them separately?
Perhaps there is a way to do it on the NCBI website?
EDIT: Could someone at least explain why I am getting downvoted?