I have a dataset that looks like this:
text id screenName retweetCount isRetweet retweeted longitude latitude
1 xx 778980737861062656 0504Traveller 0 FALSE FALSE <NA> <NA>
2 xx 778967536167559168 Iz_Azman 0 FALSE FALSE <NA> <NA>
3 yy 778962265298960384 Iz_Azman 0 FALSE FALSE <NA> <NA>
4 yy 778954988122939392 travelindtoday 2 FALSE FALSE <NA> <NA>
5 zz 778948691969224705 umtn 2 FALSE FALSE <NA> <NA>
6 zz 778942095843135493 flyinsider 0 FALSE FALSE <NA> <NA>
These are tweets retrieved with the twitteR package in R. Some tweets have exactly the same text but a different retweetCount. I want to keep only the unique tweets (by text), and among duplicates keep the one with the highest retweetCount. (In the example above, that means tweets 1, 4, and 5.)
How do I do that?
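One possible approach, sketched in base R: sort the rows so that the highest retweetCount comes first within each text, then drop every row whose text has already been seen. The data frame name tweets and the result name unique_tweets are assumptions made for illustration, and the toy data only reproduces the relevant columns from the example above.

```r
# Toy data mirroring the example above (names `tweets` and
# `unique_tweets` are hypothetical). IDs are kept as strings because
# they are too large to store exactly as doubles.
tweets <- data.frame(
  text = c("xx", "xx", "yy", "yy", "zz", "zz"),
  id = c("778980737861062656", "778967536167559168",
         "778962265298960384", "778954988122939392",
         "778948691969224705", "778942095843135493"),
  screenName = c("0504Traveller", "Iz_Azman", "Iz_Azman",
                 "travelindtoday", "umtn", "flyinsider"),
  retweetCount = c(0, 0, 0, 2, 2, 0),
  stringsAsFactors = FALSE
)

# Sort by text, then by descending retweetCount, so the most
# retweeted copy of each text comes first. order() leaves unresolved
# ties in their original order.
ordered <- tweets[order(tweets$text, -tweets$retweetCount), ]

# duplicated() flags every occurrence after the first, so negating it
# keeps only the top row for each text.
unique_tweets <- ordered[!duplicated(ordered$text), ]

print(unique_tweets)
```

When two duplicates have the same retweetCount (as with tweets 1 and 2 here), this keeps whichever appeared first in the original data. The original row names are preserved, so they can be used to check which tweets survived.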