I am trying to merge two data frames. The first contains the first 30 nucleotides (characters) of each sequence, with one row per nucleotide, so each 30 nt sequence is repeated 30 times. Here is a subset of that data frame:
The second data frame has each full ORF sequence once, along with a Prot. Molecules per cell value for that sequence. I want to match each 30 nt sequence (and all its repeats) in the first data frame with the Prot. Molecules per cell value from the second data frame. Here is a subset of the second data frame:
My general idea was to replace each sequence in the second data frame with only its first 30 nucleotides and then use the merge() function. However, I don't know how to slice the sequences, and I am also worried that merge() in R will remove the repeats of each 30-nucleotide sequence in the first data frame.
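For reference, here is a rough sketch of the approach described above, in base R. The data frame names (`df_nt`, `df_orf`), the column names, and the toy sequences are all hypothetical placeholders, not my real data, and I am only assuming that `substr()` is a reasonable way to take the first 30 characters of each sequence.

```r
# Toy stand-ins for the two data frames; every name and value is hypothetical.
# First data frame: each 30 nt sequence repeated once per nucleotide position.
df_nt <- data.frame(
  seq30    = rep(c(strrep("A", 30), strrep("C", 30)), each = 30),
  position = rep(1:30, times = 2),
  stringsAsFactors = FALSE
)

# Second data frame: each full ORF sequence once, with its protein count.
df_orf <- data.frame(
  orf_seq = c(paste0(strrep("A", 30), "GGGTTT"),
              paste0(strrep("C", 30), "TTTGGG")),
  prot_molecules_per_cell = c(1200, 350),
  stringsAsFactors = FALSE
)

# Keep only the first 30 characters of each full ORF sequence.
df_orf$seq30 <- substr(df_orf$orf_seq, 1, 30)

# Join on the 30 nt key. all.x = TRUE keeps every row of df_nt,
# including rows whose sequence has no match in df_orf.
merged <- merge(
  df_nt,
  df_orf[, c("seq30", "prot_molecules_per_cell")],
  by = "seq30",
  all.x = TRUE
)

nrow(merged)  # same number of rows as df_nt in this toy case
```

In this toy case the repeated rows survive the merge, since each row of `df_nt` matches exactly one row of `df_orf`. If two ORFs shared the same first 30 nucleotides, the matching rows would be duplicated instead, so that may need checking on the real data. Note also that `merge()` sorts the result by the key column by default.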
Would greatly appreciate any help!