For example, I have the following dataset (my real dataset has more than 100000 rows and 70 variables):
Country  Year  Flag
Norway   2018  drop: reason1
Norway   2018  drop: reason2
Sweden   2016  drop: reason3
France   2011  drop: reason2
France   2011  drop: reason3
France   2011  drop: reason4
First, I want to combine the Flag values within each Country and Year group, to get a table like this:
Country  Year  Flag
Norway   2018  drop: reason1, drop: reason2
Sweden   2016  drop: reason3
France   2011  drop: reason2, drop: reason3, drop: reason4
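A minimal sketch of this first step in data.table, for reference. It rebuilds the example data above; the names dt and grouped are chosen here for illustration only, and the separator ", " is taken from the desired output.

```r
library(data.table)

# Example data from the question; `dt` is an illustrative name.
dt <- data.table(
  Country = c("Norway", "Norway", "Sweden", "France", "France", "France"),
  Year    = c(2018L, 2018L, 2016L, 2011L, 2011L, 2011L),
  Flag    = c("drop: reason1", "drop: reason2", "drop: reason3",
              "drop: reason2", "drop: reason3", "drop: reason4")
)

# Collapse all Flag values of each Country/Year group into one string.
grouped <- dt[, .(Flag = paste(Flag, collapse = ", ")), by = .(Country, Year)]
print(grouped)
```

With by = .(Country, Year), data.table keeps the groups in order of first appearance, so the rows come out as Norway, Sweden, France.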
Secondly, if a group has more than one value in the Flag column, I want to keep only one, using the following logic: if drop: reason1 is present, keep it and remove the rest. If there is no drop: reason1, but there are both drop: reason2 and drop: reason3, keep only drop: reason2.
Finally, my dataset should look like this:
Country  Year  Flag
Norway   2018  drop: reason1
Sweden   2016  drop: reason3
France   2011  drop: reason2
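The selection rule could be sketched in data.table roughly as follows. This assumes the flags can be ranked in one priority order (reason1 before reason2 before reason3 before reason4), which is consistent with the rules and the expected output above but is not stated in full; the names dt, priority and result are illustrative.

```r
library(data.table)

# Example data from the question; `dt` is an illustrative name.
dt <- data.table(
  Country = c("Norway", "Norway", "Sweden", "France", "France", "France"),
  Year    = c(2018L, 2018L, 2016L, 2011L, 2011L, 2011L),
  Flag    = c("drop: reason1", "drop: reason2", "drop: reason3",
              "drop: reason2", "drop: reason3", "drop: reason4")
)

# ASSUMPTION: flags earlier in this vector win over later ones.
priority <- c("drop: reason1", "drop: reason2", "drop: reason3", "drop: reason4")

# match() gives each flag its rank in `priority`; which.min() picks the
# position of the best-ranked flag within each Country/Year group.
# Flags missing from `priority` get NA and are ignored; a group with
# no listed flag at all would produce no row.
result <- dt[, .(Flag = Flag[which.min(match(Flag, priority))]),
             by = .(Country, Year)]
print(result)
```

This works directly on the original long table, so the intermediate comma-separated column is not needed for the final result.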
I would like to implement this using data.table or base R.
I would be very grateful for any help, even if only with the first part of the question.