I have two vectors that initially have the same length. The first, primarySite, holds protein modification sites, e.g. "E123". The second, sitePMID, holds a unique code for the literature reference that reports each site. I need to go through these vectors and remove repeated references to the same site from the same paper. That is, if primarySite[i] == primarySite[j] && sitePMID[i] == sitePMID[j] for some i < j, element j is a duplicate and has to be removed from both vectors. The problem is that when I loop through the data with for loops, removing elements changes the lengths of the vectors, so the indices I am using may no longer be correct.
As soon as I have removed a single element, the upper bound of the loop, length(primarySite), is too high (it was evaluated only once, when the loop started), and the code crashes.
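One common way around the changing-length problem is to not delete anything while looping: first flag the duplicate positions in a logical vector of fixed length, then subset both vectors once at the end. The sketch below assumes both inputs are plain vectors of equal length with no NA values; `drop_repeat_pairs` is a hypothetical name, not an existing function.

```r
# Hypothetical helper: mark duplicates first, subset once at the end,
# so neither vector changes length while the loops are running.
drop_repeat_pairs <- function(site, pmid) {
  n <- length(site)
  keep <- rep(TRUE, n)                  # TRUE = keep this position
  for (i in seq_len(n)) {
    if (!keep[i]) next                  # already flagged as a duplicate
    # positions after i; empty when i == n, so no out-of-range index
    for (j in seq_len(n)[-seq_len(i)]) {
      if (keep[j] && site[i] == site[j] && pmid[i] == pmid[j]) {
        keep[j] <- FALSE                # same site from the same paper
      }
    }
  }
  list(site = site[keep], pmid = pmid[keep])
}
```

Both loops still compare every pair, so this is quadratic in the vector length, but the indices always refer to the original positions.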
Here is an example of the first 10 values from these two vectors:
primarySite[1:10]
[1] "" "" "D248" "E241" "E242" "E241" "E242" "D244" "D244" "E241"
sitePMID[1:10]
[1] 24641686 24055347 23955771 23955771 23955771 23955771 23955771 23955771 23955771 23955771
Desired Output:
primarySite[1:6]
[1] "" "" "D248" "E241" "E242" "D244"
sitePMID[1:6]
[1] 24641686 24055347 23955771 23955771 23955771 23955771
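For this particular task an explicit loop may not be needed at all. A sketch using base R's `duplicated()`, which, when given a data frame, flags every row that is identical to an earlier row. It assumes the PMIDs are stored as numbers; character or integer codes would work the same way. On the 10 sample values it keeps positions 1, 2, 3, 4, 5 and 8, which is the desired output above.

```r
# Sample data: the first 10 values shown above
primarySite <- c("", "", "D248", "E241", "E242", "E241", "E242", "D244", "D244", "E241")
sitePMID <- c(24641686, 24055347, rep(23955771, 8))

# duplicated() on a data frame compares whole rows, so a position is flagged
# only when the same (site, PMID) pair has already appeared earlier
keep <- !duplicated(data.frame(primarySite, sitePMID))

# subset both vectors with the same logical index so they stay aligned
primarySite <- primarySite[keep]
sitePMID <- sitePMID[keep]
```

Because the first occurrence of each pair is kept, the original order of the remaining sites is preserved.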
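Here is my current attempt:

for(i in 1:length(primarySite)){
  # when i == length(primarySite), (i+1):length(primarySite) counts DOWN to
  # c(n+1, n), so j can point past the end even before anything is removed
  for(j in (i+1):length(primarySite)){
    if(primarySite[i] == primarySite[j] && sitePMID[i] == sitePMID[j]){
      # removing j shortens both vectors, but both loop ranges were fixed when
      # each loop started; later j values can run past the end (giving NA),
      # and the element that shifts into position j is never compared
      primarySite <- primarySite[-j]
      sitePMID <- sitePMID[-j]
    }
  }
}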
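If the in-place loop is to be kept, one option is to replace both `for` loops with `while` loops, whose conditions re-read `length()` on every pass, and to advance `j` only when nothing was removed. A minimal sketch, assuming equal-length vectors without NAs; `dedupe_pairs_while` is a hypothetical name.

```r
# Hypothetical helper: the nested loop above rewritten with while loops so
# the bounds are checked against the current length on every iteration
dedupe_pairs_while <- function(site, pmid) {
  i <- 1
  while (i < length(site)) {
    j <- i + 1
    while (j <= length(site)) {
      if (site[i] == site[j] && pmid[i] == pmid[j]) {
        # drop element j from both vectors; the next element shifts into
        # position j, so j is deliberately not incremented here
        site <- site[-j]
        pmid <- pmid[-j]
      } else {
        j <- j + 1
      }
    }
    i <- i + 1
  }
  list(site = site, pmid = pmid)
}
```

Each `x[-j]` copies the whole vector, so for long inputs the flag-then-subset or `duplicated()` approaches are likely to be faster.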