Duplicate function produces a large factor

Question

I have a data frame with one column like this:

col1
line1
line1
line2

I tried to remove the duplicates using this:

df2 <- df[!duplicated(df), ]

but it produces a large factor instead of a data frame with the duplicates removed. The structure of the result looks something like this:

str(df2)
 Factor w/ 7472 levels

Welcome to Stack Overflow! For questions that involve troubleshooting code, we ask that you provide a reproducible example. You can use `dput()` to share the data. – Hack-R, Oct 15 '16 at 13:20
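As a rough illustration of the `dput()` suggestion, the sketch below uses a small stand-in data frame (the name `df` and its values are assumptions copied from the sample shown in the question, not the asker's real data). `dput()` prints R code that recreates the object, so its output can be pasted straight into a question.

```r
# Hypothetical stand-in for the asker's data, based on the sample in the question.
df <- data.frame(col1 = c("line1", "line1", "line2"))

# dput() prints R code that recreates the object; paste that output into the question.
dput(df)

# Round trip: evaluating the printed code rebuilds an equivalent data frame.
rebuilt <- eval(parse(text = capture.output(dput(df))))
```

For a large data frame, something like `dput(head(df, 20))` keeps the shared sample short.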

h3rm4n · Answer 1 · 2016-10-15T13:36:00.017

2

When your data frame has just one column, `[` simplifies the result to a vector (here a factor) by default, so you need drop = FALSE to get a data frame back:

df2 <- df[!duplicated(df), , drop = FALSE]

Another option is to use the unique function:

df2 <- unique(df)

The result of both approaches is the same:

> df2
   col1
1 line1
3 line2
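To see the difference directly, here is a minimal sketch that compares the three calls on a one-column data frame. The data is a stand-in built from the question's sample, and `stringsAsFactors = TRUE` is an assumption that mimics the default in R versions before 4.0, which is presumably why the asker ended up with a factor.

```r
# Illustrative data matching the question; stringsAsFactors = TRUE is an
# assumption that mimics the pre-4.0 default and yields a factor column.
df <- data.frame(col1 = c("line1", "line1", "line2"), stringsAsFactors = TRUE)

# Default indexing drops the single column to a plain factor.
dropped <- df[!duplicated(df), ]
class(dropped)   # "factor"

# drop = FALSE keeps the data frame structure.
kept <- df[!duplicated(df), , drop = FALSE]
class(kept)      # "data.frame"

# unique() on a data frame also returns a data frame with the same rows.
same <- identical(kept, unique(df))
```

With two or more columns the default indexing already returns a data frame, so `drop = FALSE` only changes the result in the one-column case.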

edited Oct 15 '16 at 13:36

answered Oct 15 '16 at 13:19

h3rm4n

You don't need drop except when there's only 1 column (and if there's only 1 column why would you want a data.frame?). – Hack-R Oct 15 '16 at 13:25
`drop = FALSE` is indeed only needed when you have one column in your dataframe (which is the case as OP described) – h3rm4n Oct 15 '16 at 13:28

Hack-R · Answer 2 · 2016-10-15T13:25:23.080

0

col1 <- c("line1",
          "line1",
          "line2")

df <- data.frame(col1=col1, x=c(1,2,3))

df1 <- df[!duplicated(df$col1),]
df1

   col1 x
1 line1 1
3 line2 3

class(df1)

[1] "data.frame"

edited Oct 15 '16 at 13:25

answered Oct 15 '16 at 13:21

Hack-R

This doesn't return a dataframe, which is what OP wants to my understanding – h3rm4n Oct 15 '16 at 13:24
@h3rm4n It returns a data.frame when there's more than 1 column. If there's 1 column you shouldn't use a data.frame. – Hack-R Oct 15 '16 at 13:27
yes, but OP only has one column – h3rm4n Oct 15 '16 at 13:28
you changed the example data, which isn't correct imo – h3rm4n Oct 15 '16 at 13:29
