I'm still fairly new to R and have never worked with FASTA files in R before, so here goes... I'm trying to subset a list of FASTA sequences from UniProt (a list of 79052 entries). Calling names(myFASTA.file) returns character strings like "sp|Q9Y3X0|CCDC9_HUMAN".
My biggest hurdle is this: I have a data frame of differentially expressed proteins with a column of UniProt accession numbers (df$uniprot_ID, e.g. "Q9Y3X0"). I would like to partially match those accessions against the names of myFASTA.file and keep only the list entries that match. It feels like there should be a way to do this with grepl or lapply, but I'm unsure of the syntax and of which approach would be most efficient.
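To make this concrete, here is a toy version of my data and a rough sketch of the kind of thing I'm imagining. The sequences and the "FAKE01"/"FAKE02" entries are made up for illustration, subset.FASTA is just a name I picked, and I'm assuming the accession is always the second "|"-delimited field of each name. I don't know whether this is the best or most efficient way to do it:

```r
# Toy stand-in for the real list of 79052 entries
# (sequences and the FAKE entries are placeholders, not real data)
myFASTA.file <- list(
  "sp|Q9Y3X0|CCDC9_HUMAN" = "MSAT",
  "sp|FAKE01|FAKE1_HUMAN" = "MKLV",
  "sp|FAKE02|FAKE2_HUMAN" = "MGGS"
)

# Toy data frame of differentially expressed proteins
df <- data.frame(uniprot_ID = c("Q9Y3X0", "FAKE02"),
                 stringsAsFactors = FALSE)

# Split each name on the literal "|" and take the second field
# (assumed to be the accession number)
accessions <- vapply(
  strsplit(names(myFASTA.file), "|", fixed = TRUE),
  function(parts) parts[2],
  character(1)
)

# Keep only the entries whose accession appears in df$uniprot_ID
subset.FASTA <- myFASTA.file[accessions %in% df$uniprot_ID]

names(subset.FASTA)
```

The idea is to extract the accession once and compare it with %in%, rather than running grepl once per ID, but I'm not sure that is the right trade-off for a list this size.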
Please let me know if there is any clarification needed. Any guidance would be incredibly helpful.
Thank you in advance!