Apologies if this has been asked, but I couldn't find an existing solution.
Suppose a data frame d as follows:
+------------+------+------+
| date       | var1 | var2 |
+------------+------+------+
| 2019/01/01 |  100 | abc  |
| 2019/01/01 |  102 | def  |
| 2019/01/02 |   99 | ghi  |
| 2019/01/02 |   98 | jkl  |
| 2019/01/03 |  100 | mno  |
| 2019/01/04 |  105 | pqr  |
| 2019/01/04 |   98 | stu  |
| 2019/01/04 |  110 | vwx  |
+------------+------+------+
With associated dput()
d <- structure(list(date = structure(c(17897, 17897, 17898, 17898,
17899, 17900, 17900, 17900), class = "Date"), var1 = c(100, 102,
99, 98, 100, 105, 98, 110), var2 = structure(1:8, .Label = c("abc",
"def", "ghi", "jkl", "mno", "pqr", "stu", "vwx"), class = "factor")),
class = "data.frame", row.names = c(NA, -8L))
I want to remove records from d based on three requirements:
- Only one record shall remain for each unique date
- The record kept for each unique date is the one with max(var1) across all records of the same date in d
- I want to keep var2 (and any other columns within the real dataset)
Thus, the valid required output would be
+----------+------+------+
| Date     | var1 | var2 |
+----------+------+------+
| 01/01/19 |  102 | def  |
| 02/01/19 |   99 | ghi  |
| 03/01/19 |  100 | mno  |
| 04/01/19 |  110 | vwx  |
+----------+------+------+
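For reference, the three requirements can be sketched in base R as follows. This is only one possible approach, not necessarily the best one: it sorts by date and descending var1, then keeps the first row per date. The name result is made up for illustration, and d is rebuilt with data.frame() so the snippet runs on its own. It assumes that, if two records on the same date tie on var1, keeping whichever sorts first is acceptable.

```r
# Rebuild the example data so the snippet is self-contained
d <- data.frame(
  date = as.Date(c("2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02",
                   "2019-01-03", "2019-01-04", "2019-01-04", "2019-01-04")),
  var1 = c(100, 102, 99, 98, 100, 105, 98, 110),
  var2 = factor(c("abc", "def", "ghi", "jkl", "mno", "pqr", "stu", "vwx"))
)

# Sort by date, then by var1 descending, so the max var1 row comes first per date
d_sorted <- d[order(d$date, -d$var1), ]

# duplicated() flags every repeat of a date after its first occurrence;
# dropping those leaves one row per date with all other columns intact
result <- d_sorted[!duplicated(d_sorted$date), ]

print(result)
```

Because whole rows are subset rather than aggregated, var2 and any other columns in the real dataset are carried along unchanged.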
Thank you for any help. Please advise if the question could be worded better to make it more useful for others.
double standard/inconsistency, but what I understood from the discussion this week was that a duplicated question should always be marked as a duplicate. So the issue/mistake is answering (which I did here without thinking about it). – s_baldur Jan 11 '19 at 13:16