I am trying to make a data frame in R that includes FASTA headers and sequences. The code below does this, but now I would like to add columns to my df using information from the FASTA headers.
Here is the content of a header that I would like to split into columns. Ideally, each piece of information between square brackets ([]) would become a column. The main thing I need is the location as a column.
lcl|FR839628.1_cds_CCA36173.1_1 [locus_tag=PP7435_CHR1-0001] [db_xref=EnsemblGenomes-Gn:PP7435_Chr1-0001,EnsemblGenomes-Tr:CCA36173,UniProtKB/TrEMBL:F2QL95] [protein=Hypothetical_protein] [protein_id=CCA36173.1] [location=5023..6504] [gbkey=CDS]
Thanks for your help!
I tried this, and it worked for making a df, but now I want to make columns from df$seq_name:
library("Biostrings")
fastaFile <- readDNAStringSet("my.fasta")
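seq_name <- names(fastaFile)
sequence <- paste(fastaFile)
# stringsAsFactors = FALSE keeps seq_name as character (the default only since R 4.0)
df <- data.frame(seq_name, sequence, stringsAsFactors = FALSE)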
I tried the strsplit command below, but I am not sure how to save its output into columns of the df.
string <- as.character(df$seq_name)
strsplit(string, split = "[", fixed = TRUE)
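For reference, here is a minimal base R sketch of the kind of result I am after. It rests on several assumptions: every field looks like [key=value], values never contain "]", and the location is a simple start..end range. The helper parse_header and the start/end columns are names I made up for illustration, and the one-row df stands in for the real one built from my.fasta.

```r
# Stand-in for the real df; the header is the example from above.
headers <- c(
  "lcl|FR839628.1_cds_CCA36173.1_1 [locus_tag=PP7435_CHR1-0001] [db_xref=EnsemblGenomes-Gn:PP7435_Chr1-0001,EnsemblGenomes-Tr:CCA36173,UniProtKB/TrEMBL:F2QL95] [protein=Hypothetical_protein] [protein_id=CCA36173.1] [location=5023..6504] [gbkey=CDS]"
)
df <- data.frame(seq_name = headers, stringsAsFactors = FALSE)

# Hypothetical helper: return the [key=value] pairs of one header
# as a named character vector (names are the keys).
parse_header <- function(h) {
  m <- regmatches(h, gregexpr("\\[[^]=]+=[^]]*\\]", h))[[1]]
  inner <- substring(m, 2, nchar(m) - 1)      # drop the surrounding brackets
  eq <- regexpr("=", inner, fixed = TRUE)     # position of the first "="
  setNames(substring(inner, eq + 1), substring(inner, 1, eq - 1))
}

parsed <- lapply(as.character(df$seq_name), parse_header)

# One column per key seen in any header; NA where a header lacks that key.
keys <- unique(unlist(lapply(parsed, names)))
cols <- lapply(keys, function(k) {
  vapply(parsed,
         function(p) if (k %in% names(p)) p[[k]] else NA_character_,
         character(1))
})
names(cols) <- keys
header_df <- as.data.frame(cols, stringsAsFactors = FALSE)

df <- cbind(df, header_df)

# Assumption: location is a plain "start..end" range. For other forms these
# patterns may not match, and as.integer() then gives NA with a warning.
df$start <- as.integer(sub("^[^0-9]*([0-9]+)\\.\\..*$", "\\1", df$location))
df$end   <- as.integer(sub("^.*\\.\\.[^0-9]*([0-9]+)[^0-9]*$", "\\1", df$location))
```

With this, df$location would hold the raw location string, and df$start and df$end the numeric bounds. Headers that lack a given field would get NA in that column.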