I'm trying to loop over a data.frame in R that contains proteins and all of their subregions. Rows are identified by the `geneid` column: the first row for a given geneID is always the whole protein, and the rows that follow it with the same geneID are its subregions. I want to align each subregion against its whole protein to find the subregion's start and end positions, and then write those positions back into the data frame. The data looks like this:
https://i.stack.imgur.com/tGPok.jpg
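Since the first row of each geneID is always the whole protein, the alignment target for every row can be looked up without an explicit loop: `match(x, x)` returns the position of the first occurrence of each value. Below is a minimal sketch. It assumes the column names `geneid` and `sequence` from the question's code, a made-up toy data frame, and that subregions are exact substrings of their protein. It uses base R `regexpr(fixed = TRUE)` in place of Biostrings' `matchPattern()` and keeps only the first match.

```r
# Hypothetical toy data in the layout described above: the first row of each
# geneid is the whole protein, the following rows are its subregions.
keyplayers <- data.frame(
  geneid   = c("g1", "g1", "g1", "g2", "g2"),
  sequence = c("MKTAYIAKQR", "TAYIA", "KQR", "MSEQNNTEM", "QNN"),
  stringsAsFactors = FALSE
)

# match(x, x) gives the index of the first occurrence of each value,
# i.e. the row holding the whole protein for every row
parent_row <- match(keyplayers$geneid, keyplayers$geneid)
full_seq   <- keyplayers$sequence[parent_row]

# First exact (fixed-string) match of each sequence inside its whole protein;
# regexpr() returns -1 when there is no match
start_pos <- vapply(seq_len(nrow(keyplayers)), function(k) {
  as.integer(regexpr(keyplayers$sequence[k], full_seq[k], fixed = TRUE))
}, integer(1))
start_pos[start_pos == -1L] <- NA_integer_

keyplayers$start <- start_pos
keyplayers$end   <- start_pos + nchar(keyplayers$sequence) - 1L
```

Whole-protein rows come out with start 1 and end equal to their own length. Set them to `NA` afterwards if only subregions should be filled in. Unlike a `while` loop over neighbouring rows, this does not require the rows of one geneID to be adjacent.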
This is the code I have so far. The problem is that it gets stuck on the first iteration, and I'm not sure what I'm doing wrong:
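```r
library(Biostrings)  # provides matchPattern()

for (i in 1:length(keyplayers$geneid)) {
  id <- keyplayers$geneid[[i]]
  a <- i + 1
  # repeat while row a has the same geneid as row i
  while (keyplayers$geneid[[a]] == keyplayers$geneid[[i]]) {
    # align row a's sequence (subregion) against row i's sequence (whole protein)
    pat <- matchPattern(keyplayers$sequence[[a]], keyplayers$sequence[[i]])
    keyplayers$start[a] <- start(pat)
    keyplayers$end[a] <- end(pat)
  }
}
```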
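The loop itself points to a likely cause of the hang. `a` is set to `i + 1` once but never changed inside the `while`. Whenever row `i + 1` has the same geneID as row `i`, which is true for the first protein, the condition therefore stays `TRUE` forever. Incrementing `a` then exposes two more issues. First, `a` can index past the last row, where `[[a]]` errors. Second, the outer `for` also treats subregion rows as if they were whole proteins. Below is a minimal sketch of a corrected loop. It assumes rows of the same geneID are contiguous and uses a hypothetical toy data frame. It swaps in base R `regexpr(fixed = TRUE)` for `Biostrings::matchPattern()` so it runs without Bioconductor, and records only the first exact match.

```r
# Hypothetical toy data; rows of the same geneid are assumed to be contiguous.
keyplayers <- data.frame(
  geneid   = c("g1", "g1", "g1", "g2", "g2"),
  sequence = c("MKTAYIAKQR", "TAYIA", "KQR", "MSEQNNTEM", "QNN"),
  stringsAsFactors = FALSE
)
keyplayers$start <- NA_integer_
keyplayers$end   <- NA_integer_

n <- nrow(keyplayers)
for (i in seq_len(n)) {
  # Only the first row of a geneid is a whole protein; skip subregion rows
  if (i > 1 && keyplayers$geneid[i] == keyplayers$geneid[i - 1]) next
  a <- i + 1
  # Check a <= n first so the loop stops after the last row instead of erroring
  while (a <= n && keyplayers$geneid[a] == keyplayers$geneid[i]) {
    # regexpr() stands in for matchPattern(); -1 means no match
    pos <- as.integer(regexpr(keyplayers$sequence[a], keyplayers$sequence[i],
                              fixed = TRUE))
    if (pos > 0) {
      keyplayers$start[a] <- pos
      keyplayers$end[a]   <- pos + nchar(keyplayers$sequence[a]) - 1L
    }
    a <- a + 1  # the missing step: without it the while condition never changes
  }
}
```

If you keep `matchPattern()`, note that it can return zero or several matches, so `start(pat)` is not guaranteed to have length 1. Checking `length(pat)` before assigning avoids a "replacement has length zero" error when a subregion is not found.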