I want to partition a dataframe so that rows whose value in a certain column is unique are separated from rows whose value in that column is repeated. So the dataframe below would be split into two dataframes like so
id v1 v2
1 1 2 3
2 1 1 1
3 2 1 1
4 3 1 2
5 4 5 6
6 4 3 1
to
id v1 v2
1 2 1 1
2 3 1 2
and
id v1 v2
1 1 2 3
2 1 1 1
3 4 5 6
4 4 3 1
where they are split on the uniqueness of the id column. duplicated() doesn't work on its own in this situation because rows 1 and 5 in the top dataframe are not considered to be duplicates, i.e. duplicated() returns FALSE for the first occurrence of each value.
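To illustrate the problem, here is a minimal sketch that rebuilds the example data and shows what duplicated() returns for the id column. The variable names first_pass and last_pass are only for illustration; fromLast is a standard argument of base R's duplicated().

```r
# Example data copied from the tables above
df <- data.frame(id = c(1, 1, 2, 3, 4, 4),
                 v1 = c(2, 1, 1, 1, 5, 3),
                 v2 = c(3, 1, 1, 2, 6, 1))

# Scanning from the top, only the later occurrences are flagged,
# so rows 1 and 5 come back FALSE
first_pass <- duplicated(df$id)

# Scanning from the bottom, only the earlier occurrences are flagged,
# so rows 2 and 6 come back FALSE
last_pass <- duplicated(df$id, fromLast = TRUE)

print(first_pass)
print(last_pass)
```

Neither pass alone marks every row that shares its id with another row, which is why a single duplicated() call cannot do the split.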
EDIT
I went with
dups <- df[duplicated(df$id) | duplicated(df$id, fromLast=TRUE), ]
uniq <- df[!duplicated(df$id) & !duplicated(df$id, fromLast=TRUE), ]
which ran very quickly with my 250,000 row dataframe.
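For anyone who wants to try this on the example data, here is a self-contained sketch of the same approach. It assumes the dataframe is called df and computes the logical mask once (is_dup is a name chosen here, not from the original code) so that the two subsets are guaranteed to be complements of each other.

```r
# Example data copied from the question
df <- data.frame(id = c(1, 1, 2, 3, 4, 4),
                 v1 = c(2, 1, 1, 1, 5, 3),
                 v2 = c(3, 1, 1, 2, 6, 1))

# TRUE for every row whose id appears more than once:
# combine a forward scan and a backward scan
is_dup <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)

dups <- df[is_dup, ]   # rows with a repeated id
uniq <- df[!is_dup, ]  # rows with an id that occurs exactly once

# Optional: renumber the rows so the output matches the tables in the question
rownames(dups) <- NULL
rownames(uniq) <- NULL

print(uniq)
print(dups)
```

Negating the combined mask is equivalent to the !duplicated(...) & !duplicated(..., fromLast=TRUE) form above, and it avoids calling duplicated() four times.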