I want to partition a data frame so that rows whose value in a certain column is unique are separated from rows whose value is repeated. For example, the data frame below would be split into two data frames like so

    id   v1   v2
1    1    2    3
2    1    1    1 
3    2    1    1
4    3    1    2
5    4    5    6
6    4    3    1

to

    id   v1   v2
1    2    1    1
2    3    1    2

and

    id   v1   v2
1    1    2    3
2    1    1    1 
3    4    5    6
4    4    3    1

where they are split on the uniqueness of the id column. duplicated() on its own doesn't work in this situation because rows 1 and 5 in the top data frame are not considered duplicates, i.e. duplicated() returns FALSE for the first occurrence of each value.
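
To make the problem concrete, here is a small sketch of what duplicated() returns for the id column above, scanning from each end. The vector name `id` and the two result names are only illustrative.

```r
# The id column from the example data frame
id <- c(1, 1, 2, 3, 4, 4)

# Scanning from the top: the first occurrence of each value is FALSE
first_pass <- duplicated(id)
print(first_pass)
# FALSE  TRUE FALSE FALSE FALSE  TRUE

# Scanning from the bottom: the last occurrence of each value is FALSE
last_pass <- duplicated(id, fromLast = TRUE)
print(last_pass)
# TRUE FALSE FALSE FALSE  TRUE FALSE
```

Neither pass alone flags both rows of a repeated id, which is why a single call is not enough.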

EDIT

I went with

dups <- df[duplicated(df$id) | duplicated(df$id, fromLast=TRUE), ]
uniq <- df[!duplicated(df$id) & !duplicated(df$id, fromLast=TRUE), ]

which ran very quickly with my 250,000 row dataframe.
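
For reference, a minimal self-contained sketch of this two-pass duplicated() approach on the sample data. It assumes the data frame is called `df`; the helper vector `is_dup` is an illustrative name, introduced so the mask is computed only once.

```r
# Sample data from the question
df <- data.frame(
  id = c(1, 1, 2, 3, 4, 4),
  v1 = c(2, 1, 1, 1, 5, 3),
  v2 = c(3, 1, 1, 2, 6, 1)
)

# TRUE for every row whose id occurs more than once:
# a forward scan catches later occurrences, a backward scan catches earlier ones
is_dup <- duplicated(df$id) | duplicated(df$id, fromLast = TRUE)

dups <- df[is_dup, ]   # rows with a repeated id
uniq <- df[!is_dup, ]  # rows with an id that appears exactly once

print(uniq)
print(dups)
```

Note that the subsets keep the original row names, so call `rownames(uniq) <- NULL` if renumbered rows are wanted.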

mattdevlin
  • This _is_ a duplicate question. Use `duplicated( ) | duplicated(, fromLast=TRUE)`. Or just look at `methods(unique)` and see that there is a `unique.data.frame`, or you could work on the vector itself to build a logical index with `df$id %in% df$id[duplicated(df$id)]` – IRTFM Dec 10 '14 at 18:43
  • `dat[with(dat, ave(id, id, FUN = length)) == 1, ]; dat[with(dat, ave(id, id, FUN = length)) > 1, ]` – rawr Dec 10 '14 at 18:55
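
The ave() idea from the comments can be sketched as follows. This is only an illustration under assumptions: the data frame name `dat` follows the comment, the counts vector `n` is a made-up name, and seq_along() is used as the first argument so the sketch would also work when id is not numeric.

```r
# Sample data from the question
dat <- data.frame(
  id = c(1, 1, 2, 3, 4, 4),
  v1 = c(2, 1, 1, 1, 5, 3),
  v2 = c(3, 1, 1, 2, 6, 1)
)

# For every row, the number of rows sharing its id
n <- ave(seq_along(dat$id), dat$id, FUN = length)

uniq <- dat[n == 1, ]  # ids that occur once
dups <- dat[n > 1, ]   # ids that occur more than once
```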

1 Answer

I think the easiest way to approach this problem is with data.table: count the rows for each id, then split on whether that count is greater than 1.

Your data

data <- read.table(header=TRUE,text="
    id   v1   v2
    1    2    3
    1    1    1 
    2    1    1
    3    1    2
    4    5    6
    4    3    1
")

Code to split the data

library(data.table)
setDT(data)
data[, Count := .N, by=id]

Unique table by id

data[Count==1]

   id v1 v2 Count
1:  2  1  1     1
2:  3  1  2     1

Non-unique table by id

data[Count>1]

   id v1 v2 Count
1:  1  2  3     2
2:  1  1  1     2
3:  4  5  6     2
4:  4  3  1     2
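
If the extra Count column is not wanted, one possible variation is to filter whole groups directly. This is a sketch rather than part of the original answer: it assumes the data.table package is installed and relies on the grouped `if (...) .SD` idiom, where a group that fails the condition contributes no rows. The names `dt`, `uniq`, and `dups` are illustrative.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  id = c(1, 1, 2, 3, 4, 4),
  v1 = c(2, 1, 1, 1, 5, 3),
  v2 = c(3, 1, 1, 2, 6, 1)
)

# Keep a group's rows only when the group has exactly one row
uniq <- dt[, if (.N == 1) .SD, by = id]

# Keep a group's rows only when the group has more than one row
dups <- dt[, if (.N > 1) .SD, by = id]
```

Unlike the `:=` version, this leaves the original table unmodified, at the cost of evaluating the grouping twice.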
Mike.Gahan