How can I subset the dataframe below to show only the rows were columns AAA:CCC
all have the same value and retain the IndID
field?
Dat <- data.frame(IndID = LETTERS[seq(1,10)],
AAA = c(1,5,3,2,3,1,5,4,6,2),
BBB = c(1,8,5,2,5,4,8,4,4,5),
CCC = c(1,5,3,2,3,5,5,4,6,5))
> Dat
IndID AAA BBB CCC
1 A 1 1 1
2 B 5 8 5
3 C 3 5 3
4 D 2 2 2
5 E 3 5 3
6 F 1 4 5
7 G 5 8 5
8 H 4 4 4
9 I 6 4 6
10 J 2 5 5
I would like to return the following result.
Result <- data.frame(IndID = c("A", "D", "H"),
AAA = c(1,2,4),
BBB = c(1,2,4),
CCC = c(1,2,4))
> Result
IndID AAA BBB CCC
1 A 1 1 1
2 D 2 2 2
3 H 4 4 4
I have found a number of related posts including Find duplicated rows (based on 2 columns) in Data Frame in R and Find duplicated elements with dplyr, among others, but have not been able to reproduce the desired result with three columns. For example, while close the code below shows all distinct
rows, but with the undesired result of including rows were only two of the values are equal.
Dat %>% distinct(AAA, BBB, CCC)
I suspect the solution involves filter
but an not sure how to obtain the desired result from the example mentioned above. A dplyr
solution is preferred.
Addition
I also wonder if this could be applied to multiple factors. For example with the data below, which contains an IndID
and three dates stored as factors, can a similar result be produced by either allowing factors or changing the factors to numeric values?
Dat <- structure(list(GenIndID = structure(c(1L, 2L, 6L, 7L, 3L, 4L,
8L, 5L), .Label = c("BHS_601", "BHS_603", "BHS_604", "BHS_605",
"BHS_631", "BHS_635", "BHS_636", "BHS_637"), class = "factor"),
CptrDate = structure(c(1L, 2L, 3L, 3L, 2L, 2L, 3L, 4L), .Label = c("2016-02-01",
"2016-02-02", "2016-02-04", "2016-12-11"), class = "factor"),
DtLastAlive = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 3L), .Label = c("2016-02-04",
"2017-07-13", "2017-08-27"), class = "factor"), DtFnlFate = structure(c(2L,
2L, 1L, 1L, 2L, 2L, 1L, 3L), .Label = c("2016-02-04", "2017-07-13",
"2017-08-27"), class = "factor")), .Names = c("GenIndID",
"CptrDate", "DtLastAlive", "DtFnlFate"), row.names = c(82L, 83L,
224L, 225L, 84L, 85L, 226L, 360L), class = "data.frame")
> Dat
GenIndID CptrDate DtLastAlive DtFnlFate
82 BHS_601 2016-02-01 2017-07-13 2017-07-13
83 BHS_603 2016-02-02 2017-07-13 2017-07-13
224 BHS_635 2016-02-04 2016-02-04 2016-02-04
225 BHS_636 2016-02-04 2016-02-04 2016-02-04
84 BHS_604 2016-02-02 2017-07-13 2017-07-13
85 BHS_605 2016-02-02 2017-07-13 2017-07-13
226 BHS_637 2016-02-04 2016-02-04 2016-02-04
360 BHS_631 2016-12-11 2017-08-27 2017-08-27
with the desired result being
> Dat[c(3, 4, 7),]
GenIndID CptrDate DtLastAlive DtFnlFate
224 BHS_635 2016-02-04 2016-02-04 2016-02-04
225 BHS_636 2016-02-04 2016-02-04 2016-02-04
226 BHS_637 2016-02-04 2016-02-04 2016-02-04