I have a data frame with multiple variables, and I would like to subset it so that only one row is kept for each set of duplicates.
> head(occurrence)
        userId occurrence profile.birthday profile.gender postDate count
1 100469891698          6               47         Female 583 days     0
2 100469891698          6               47         Female  55 days     0
3 100469891698          6               47         Female 481 days     0
4 100469891698          6               47         Female 583 days     0
5 100469891698          6               47         Female 583 days     0
6 100469891698          6               47         Female 583 days     0
Here you can see the dataframe. The 'occurrence' column counts how many times the same userId has occurred. I have tried the following code to remove duplicates:
occurrence <- occurrence[!duplicated(occurrence$userId),]
However, this keeps whichever row happens to appear first for each userId, so the duplicates that get removed are effectively arbitrary. I want to keep the oldest row by postDate instead. For example, the first row of the result should look something like this:
        userId occurrence profile.birthday profile.gender postDate count
1 100469891698          6               47         Female 583 days     0
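A minimal sketch of one possible approach, assuming postDate is a difftime (or numeric) column in which a larger value means an older post: sort the rows so the oldest post comes first within each userId, then apply the same duplicated() filter. The toy data and the names `ordered` and `oldest` are made up for illustration.

```r
# Toy data mirroring the structure above (values are illustrative only)
occurrence <- data.frame(
  userId   = c(100469891698, 100469891698, 100469891698,
               200000000001, 200000000001),
  postDate = as.difftime(c(55, 583, 481, 10, 120), units = "days")
)

# Sort so that, within each userId, the largest postDate (assumed oldest) comes first
ordered <- occurrence[order(occurrence$userId,
                            -as.numeric(occurrence$postDate)), ]

# duplicated() flags every row after the first one per userId,
# so only the first (oldest) row of each user is kept
oldest <- ordered[!duplicated(ordered$userId), ]
```

If a smaller postDate actually means an older post, dropping the minus sign in order() should reverse the choice.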
Thank you for your help!