I have data like this:
df <- structure(list(string = structure(c(6L, 12L, 8L, 7L, 2L, 1L,
6L, 12L, 9L, 5L, 11L, 6L, 10L, 3L, 4L, 4L), .Label = c("CGSKDNIKHVPGGGSVQIVYKPVDLSK",
"ESPLQTPTEDGSEEPGSETSDAK", "HVPGGGSVQIVYKPVDLSKVTSK", "KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK",
"QEFEVMEDHAGTYGLGDR", "SKDGTGSDDKK", "SPSSAKSRLQTAPVPMPDLKNVK",
"SRLQTAPVPMPDLK", "SRLQTAPVPMPDLKNVKSK", "SRLQTAPVPMPDLKNVKSKIGSTENLK",
"STPTAEDVTAPLVDEGAPGK", "VQIINKKLDLSNVQSK"), class = "factor"),
key = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L), .Label = c("Mys: G52: ru1", "Mys: G52: ru2",
"Mys: G52: ru3"), class = "factor"), val = structure(c(3L,
15L, 2L, 11L, 9L, 5L, 13L, 6L, 1L, 7L, 8L, 16L, 12L, 4L,
10L, 14L), .Label = c("1442983324", "1451319531", "1512864.443",
"1612410048", "16349475.63", "1784901841", "30553282.01",
"317403612.9", "3612004.547", "3686081.063", "39135868.44",
"43701608", "64223793.8", "64959501.42", "775987137.8", "9767666215"
), class = "factor")), .Names = c("string", "key", "val"), class = "data.frame", row.names = c(NA,
-16L))
I am trying to keep only the strings that appear under 2 or more different values of the second column (key).
For example, in the data above only the following can be kept:
SKDGTGSDDKK appears under 3 keys (ru1, ru2 and ru3)
VQIINKKLDLSNVQSK appears under 2 keys (ru1 and ru2)
Every other string occurs under only one key (KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK has two rows, but both are in ru3),
so the output would be:
string            key            val
SKDGTGSDDKK       Mys: G52: ru1  1512864.443
SKDGTGSDDKK       Mys: G52: ru2  64223793.8
SKDGTGSDDKK       Mys: G52: ru3  9767666215
VQIINKKLDLSNVQSK  Mys: G52: ru1  775987137.8
VQIINKKLDLSNVQSK  Mys: G52: ru2  1784901841
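
To make the rule concrete, here is a minimal base R sketch of one way it could be expressed: count the distinct keys per string and keep the strings with a count of 2 or more. It is only an illustration, not necessarily the best approach. `df_small`, `pairs`, `key_counts`, `keep` and `result` are names made up for this example, and `df_small` is a cut-down copy of eight rows from the data above (with plain character columns instead of factors) so that the snippet runs on its own.

```r
# Cut-down copy of eight rows from the data above (assumption: character
# columns instead of factors, so the example is self-contained).
df_small <- data.frame(
  string = c("SKDGTGSDDKK", "VQIINKKLDLSNVQSK", "SRLQTAPVPMPDLK",
             "SKDGTGSDDKK", "VQIINKKLDLSNVQSK",
             "SKDGTGSDDKK",
             "KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK",
             "KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK"),
  key = c("Mys: G52: ru1", "Mys: G52: ru1", "Mys: G52: ru1",
          "Mys: G52: ru2", "Mys: G52: ru2",
          "Mys: G52: ru3", "Mys: G52: ru3", "Mys: G52: ru3"),
  val = c("1512864.443", "775987137.8", "1451319531",
          "64223793.8", "1784901841",
          "9767666215", "3686081.063", "64959501.42"),
  stringsAsFactors = FALSE
)

# Drop duplicate (string, key) pairs so that a string repeated within
# a single key is counted only once for that key.
pairs <- unique(df_small[c("string", "key")])

# Number of distinct keys each string appears under.
key_counts <- table(pairs$string)

# Strings seen under at least 2 distinct keys.
keep <- names(key_counts)[key_counts >= 2]

# Keep every row belonging to those strings, sorted by string then key.
result <- df_small[df_small$string %in% keep, ]
result <- result[order(result$string, result$key), ]
print(result, row.names = FALSE)
```

The same steps should work on the full `df` as well, since `%in%` compares factor columns by their labels. Deduplicating the (string, key) pairs first is what keeps a string with two rows under the same key from being counted twice.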