I am trying to look at protein sequence homology using R, and I'd like to go through a data frame looking for identical pairs of Position and Letter. The data look similar to the frame below:
Letter <- c("A", "B", "C", "D", "D", "E", "G", "L")
Position <- c(1, 2, 3, 4, 4, 5, 6, 7)
data.set <- cbind(Position, Letter)
Which yields:
Position Letter
[1,] "1" "A"
[2,] "2" "B"
[3,] "3" "C"
[4,] "4" "D"
[5,] "4" "D"
[6,] "5" "E"
[7,] "6" "G"
[8,] "7" "L"
I'd like to loop through and find all identical observations (in this case, observations 4 and 5), but I'm having difficulty in discovering the best way to do it.
I'd like the resultant data frame to look like:
Position Letter
[1,] "4" "D"
[2,] "4" "D"
The ways I've tried to do this ended up yielding this code, but unfortunately it returns one value of TRUE because I realized that I am comparing two identical data frames:
> identical(data.set[1:nrow(data.set),1:2], data.set[1:nrow(data.set),1:2])
[1] TRUE
I'm not sure if looping through using the identical() function would be the best way? I'm sure there's a more elegant solution that I am missing.
Thanks for any help!