I've created the following MWE
test_data <- data.frame(ID = c(1,2,2,3,4,4,4,5),
Date_From = as.Date(c("2022-10-17", "2022-10-17", "2022-10-17", "2022-09-12", "2022-09-12",
"2022-09-12","2022-09-02","2022-09-02")),
Date_To = as.Date(c("2022-10-24", "2022-10-24", "2022-10-24", "2022-09-17", "2022-09-17",
"2022-09-17", "2022-09-07", "2022-09-07")),
Category = c("Cars", "Books", "Books", "Fruits", "Fish", "Fish", "Fish", "Fish"),
Value = c(10:3))
which outputs
ID Date_From Date_To Category Value
1 1 2022-10-17 2022-10-24 Cars 10
2 2 2022-10-17 2022-10-24 Books 9
3 2 2022-10-17 2022-10-24 Books 8
4 3 2022-09-12 2022-09-17 Fruits 7
5 4 2022-09-12 2022-09-17 Fish 6
6 4 2022-09-12 2022-09-17 Fish 5
7 4 2022-09-02 2022-09-07 Fish 4
8 5 2022-09-02 2022-09-07 Fish 3
I want to create a subset of that data frame based on the following condition:
If the ID, Date_From, Date_To and Category
column is the same for multiple rows, display only the first one of those and also display all other rows.
In this particular example,
- the rows 2-3 have the same values inside those columns and I want to display only the 2nd row.
- the rows 5-6 have the same values inside those columns and I want to display only the 5th row.
- also I want to display "all other rows" with non-matching columns => row 1,4,7,8
The final data frame should look like this
ID Date_From Date_To Category Value
1 1 2022-10-17 2022-10-24 Cars 10
2 2 2022-10-17 2022-10-24 Books 9
3 3 2022-09-12 2022-09-17 Fruits 7
4 4 2022-09-12 2022-09-17 Fish 6
5 4 2022-09-02 2022-09-07 Fish 4
6 5 2022-09-02 2022-09-07 Fish 3