Partial pattern matching to subset FASTA list

Question

I'm still fairly new to R and have never worked with FASTA files in R before, so here goes. I'm trying to subset a list of FASTA sequences from UniProt (a list of 79052 entries). Calling names(myFASTA.file) returns character strings like: "sp|Q9Y3X0|CCDC9_HUMAN"

My biggest hurdle is this: I have a data frame of differentially expressed proteins with a column holding UniProt IDs (df$uniprot_ID). I would like to partially pattern-match the names of myFASTA.file against those accession numbers (e.g. "Q9Y3X0") and subset the list to the matching entries. It feels like there should be a way to do this with grepl or lapply, but I'm unsure what syntax would be the most efficient.
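
To make the goal concrete, here is a minimal sketch of the kind of subsetting described above. It uses a toy named list in place of the real FASTA object, and it assumes every name follows the "sp|ACCESSION|ENTRY_NAME" layout, so the accession is always the second "|"-delimited field. The names myFASTA.file and df$uniprot_ID come from the description; the sequences, the unmatched ID, and the variables accessions and subsetFASTA are made up for illustration. This swaps partial matching with grepl for an exact match on the extracted accession, which is one possible approach rather than the only one.

```r
# Toy stand-in for the real FASTA list (sequences are made up for illustration).
myFASTA.file <- list(
  "sp|Q9Y3X0|CCDC9_HUMAN" = "MSATLD",
  "sp|P04637|P53_HUMAN"   = "MEEPQS",
  "sp|P38398|BRCA1_HUMAN" = "MDLSAL"
)

# Toy stand-in for the data frame of differentially expressed proteins.
# The last ID is deliberately one that has no match in the toy list.
df <- data.frame(uniprot_ID = c("Q9Y3X0", "P38398", "A0A000"),
                 stringsAsFactors = FALSE)

# Assumption: names look like "sp|ACCESSION|ENTRY_NAME".
# Split each name on the literal "|" and keep the second field.
accessions <- vapply(strsplit(names(myFASTA.file), "|", fixed = TRUE),
                     function(parts) parts[2], character(1))

# Keep only the entries whose accession appears in df$uniprot_ID.
subsetFASTA <- myFASTA.file[accessions %in% df$uniprot_ID]

names(subsetFASTA)
```

Because strsplit and %in% are vectorised, this makes a single pass over the names instead of running one grepl call per ID, which should matter with roughly 79,000 entries. Matching the whole accession also avoids one ID accidentally matching as a substring of another name.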

Please let me know if there is any clarification needed. Any guidance would be incredibly helpful.

Thank you in advance!

Without a [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) we can't really offer you any advice/solutions. Please see [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Also, for this type of domain-specific question I believe you will get more help at https://bioinformatics.stackexchange.com/ — jared_mamrot, Apr 20 '22 at 04:27
Please provide enough code so others can better understand or reproduce the problem. — Community, Apr 20 '22 at 08:12


0 Answers