I need to remove redundant records from a file, but these redundant records don't look like standard duplicates. My data frame, have, holds the number of school projects that the characters of the TV show Recess have worked on together. There are 7,000 observations.
head(have)
obs  authA            authB            n_projects
1    TJ.DETWEILER     GRETCHEN.WILSON  11
2    TJ.DETWEILER     KING.BOB         2
3    TJ.DETWEILER     ASHLEY.SPINELLI  1
4    TJ.DETWEILER     VINCE.LASALLE    3
5    GRETCHEN.WILSON  TJ.DETWEILER     11
6    GRETCHEN.WILSON  ASHLEY.SPINELLI  7
…    …                …                …
There is one redundant record shown: the 1st observation contains the same information as the 5th observation. The author order (i.e., who is listed as authA or authB) doesn't matter. I need to remove one of these observations; it doesn't matter which. The new data frame, want, could look like this:
obs  authA            authB            n_projects
1    TJ.DETWEILER     GRETCHEN.WILSON  11
2    TJ.DETWEILER     KING.BOB         2
3    TJ.DETWEILER     ASHLEY.SPINELLI  1
4    TJ.DETWEILER     VINCE.LASALLE    3
6    GRETCHEN.WILSON  ASHLEY.SPINELLI  7
…    …                …                …
Removing the 1st observation instead of the 5th would also be fine.
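
To make the goal concrete, here is a minimal base R sketch of one possible approach: build an order-independent key for each pair and drop rows whose key has already been seen. It rests on assumptions not stated above: that authA and authB can be treated as character strings, and that both orderings of a pair always carry the same n_projects. The small have built here is only a stand-in made from the six rows shown, and the helper names first, second, and is_repeat are made up for illustration.

```r
# Stand-in for `have`, built from the six rows shown above (assumed column types).
have <- data.frame(
  obs = 1:6,
  authA = c("TJ.DETWEILER", "TJ.DETWEILER", "TJ.DETWEILER", "TJ.DETWEILER",
            "GRETCHEN.WILSON", "GRETCHEN.WILSON"),
  authB = c("GRETCHEN.WILSON", "KING.BOB", "ASHLEY.SPINELLI", "VINCE.LASALLE",
            "TJ.DETWEILER", "ASHLEY.SPINELLI"),
  n_projects = c(11, 2, 1, 3, 11, 7),
  stringsAsFactors = FALSE
)

# Order-independent key: within each row, put the alphabetically
# smaller name first and the larger name second.
# as.character() guards against the columns being stored as factors.
first  <- pmin(as.character(have$authA), as.character(have$authB))
second <- pmax(as.character(have$authA), as.character(have$authB))

# Flag rows whose unordered pair has already appeared earlier in the data.
is_repeat <- duplicated(data.frame(first, second))

# Keep the first occurrence of each unordered pair.
want <- have[!is_repeat, ]

want$obs  # 1 2 3 4 6
```

Because duplicated() flags the later occurrence, this keeps observation 1 and drops observation 5. Passing fromLast = TRUE to duplicated() would keep observation 5 instead.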