How to delete a duplicate row in R

Question

I have the following data

x y z

1 2 a

1 2

data[2,3] is a factor, but nothing shows when it is printed. The data has many rows like this. How do I delete a row when z is empty? I mean deleting rows such as the second row.

output should be

x y z

1 2 a

Can you please post the output of `dput(head(yourdataframe))` to make your sample data easy to copy and paste. — A5C1D2H2I1M1N2O1R2T1, Jun 09 '13 at 06:42
Probably `data[ ! data$z=="" , ]`, but you didn't post your data as @AnandaMahto asked (by pasting the output from `dput(head(data))`), so we don't know. — Simon O'Hanlon, Jun 09 '13 at 07:06
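
For readers unsure what the `dput(head(...))` request in the comments means, here is a minimal sketch. The name `example_df` and its values are made up for illustration, loosely following the two rows in the question; it is not the asker's actual data.

```r
# Hypothetical stand-in for the asker's data (names and values are assumptions)
example_df <- data.frame(x = c(1, 1), y = c(2, 2), z = c("a", ""))

# dput() prints R code that rebuilds the object; head() limits it to the first rows
dput(head(example_df))

# The printed text can be pasted back into R by anyone to recreate the same data
pasted <- paste(capture.output(dput(head(example_df))), collapse = "\n")
rebuilt <- eval(parse(text = pasted))
```

Pasting that printed text into a question lets answerers see details such as whether `z` is a factor or a character column, which a plain printout hides.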

score 5 · Accepted Answer · answered Jun 09 '13 at 08:31

OK. Stabbing a little bit in the dark here.

Imagine the following dataset:

mydf <- data.frame(
  x = c(.11, .11, .33, .33, .11, .11),
  y = c(.22, .22, .44, .44, .22, .44),
  z = c("a", "", "", "f", "b", ""))
mydf
#      x    y z
# 1 0.11 0.22 a
# 2 0.11 0.22  
# 3 0.33 0.44  
# 4 0.33 0.44 f
# 5 0.11 0.22 b
# 6 0.11 0.44

From the combination of your title and your description (neither of which fully describes your problem), I gather that you want to drop rows 2 and 3, but not row 6. In other words, you want to first check whether a row is duplicated (presumably on the first two columns only) and then drop it if the third column is empty. By those rules, row 5 should remain (column "z" is not blank) and row 6 should remain (its combination of columns 1 and 2 is not a duplicate).

If that's the case, here's one approach:

# Copy the data.frame, sorted by column "z" in reverse order so blank values come last
mydf2 <- mydf[rev(order(mydf$z)), ]
# Keep only rows that repeat an earlier x/y pair and have a blank "z"
mydf2 <- mydf2[duplicated(mydf2[1:2]) & mydf2$z %in% "", ]
mydf2
#      x    y z
# 3 0.33 0.44  
# 2 0.11 0.22

^^ Those are the rows that we want to remove. One way to remove them is to use setdiff on the row names of each dataset:

mydf[setdiff(rownames(mydf), rownames(mydf2)), ]
#      x    y z
# 1 0.11 0.22 a
# 4 0.33 0.44 f
# 5 0.11 0.22 b
# 6 0.11 0.44
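
As a hedged alternative to the sort-then-`setdiff` steps above, the same rows can be selected with a single logical index. This sketch is not the answerer's method, and the names `key`, `shared_pair`, and `result` are introduced here for illustration. It assumes that a blank-`z` row should be dropped whenever any other row has the same x/y pair. That differs from the approach above in one case: if every row in a group has a blank `z`, this version drops all of them, whereas the sorted version keeps one.

```r
# Same sample data as in the answer above
mydf <- data.frame(
  x = c(.11, .11, .33, .33, .11, .11),
  y = c(.22, .22, .44, .44, .22, .44),
  z = c("a", "", "", "f", "b", ""))

# TRUE for every row whose x/y pair occurs more than once, scanning both directions
key <- mydf[c("x", "y")]
shared_pair <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Drop rows that share their pair with another row and have a blank z
result <- mydf[!(shared_pair & mydf$z %in% ""), ]
result
```

On this sample data the result keeps rows 1, 4, 5 and 6, matching the output shown above.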

@Dryad, if this is what you're looking for, do consider up-voting or accepting the answer. Also, welcome to SO, but please note that in order to get better quality answers--there are a lot of folk here more than happy to help out!--please take time to frame a proper question that highlights all dimensions of your problem. Also, be sure to read [how to make a great reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — A5C1D2H2I1M1N2O1R2T1, Jun 09 '13 at 09:00

score 0 · Answer 2 · Paul Hiemstra · 2013-06-09T08:09:09.787

Some example data:

# No seed is set, so the values printed below will differ from run to run
df = data.frame(x = runif(100),
                y = runif(100),
                z = sample(c(letters[1:10], ""), 100, replace = TRUE))

> head(df)
          x          y z
1 0.7664915 0.86087017 a
2 0.8567483 0.83715022 d
3 0.2819078 0.85004742 f
4 0.8241173 0.43078311 h
5 0.6433988 0.46291916 e
6 0.4103120 0.07511076

Note row six, where z is empty. You can subset using a logical vector (TRUE/FALSE):

df[df$z != "",]

And as @AnandaMahto commented, you can even check against multiple values at once:

df[!df$z %in% c("", " "),]
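
One difference between the two forms shows up only when `z` contains `NA`, which the sample data here does not. The sketch below uses a small made-up data frame (`toy` and the result names are hypothetical) to illustrate how each form treats a missing value.

```r
# Hypothetical data with one missing and one blank value in z
toy <- data.frame(x = 1:3, z = c("a", NA, ""), stringsAsFactors = FALSE)

# `!=` returns NA for the missing value, and an NA index yields an all-NA row
by_neq <- toy[toy$z != "", ]

# `%in%` never returns NA, so the row with the missing z is kept as it is
by_in <- toy[!toy$z %in% "", ]

# To drop both blank and missing values, list NA among the values to match
no_blank_or_na <- toy[!toy$z %in% c("", NA), ]
```

Which behaviour is wanted depends on whether a missing `z` should count as "nothing" in the sense of the question.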

edited Jun 09 '13 at 08:09

answered Jun 09 '13 at 07:25


I might upvote if you remove your recommendation to `-which`. Also, I'm still not convinced this covers the possible complexity of the (poorly presented) question. – A5C1D2H2I1M1N2O1R2T1 Jun 09 '13 at 07:37
This is my second time asking a question here, so I don't understand dput(head(data)). – Dryad Jun 09 '13 at 07:51
@AnandaMahto what is your issue with `-which`? It is kind of superfluous, is that it? – Paul Hiemstra Jun 09 '13 at 07:56
It probably isn't a problem here, but try something like `df[-which(df$z == " "), ]` (where a space doesn't exist as a possible value, as in your dataset) compared to `df[!df$z %in% c(" "), ]`. I've used `c()` in the `%in%` example to also show that multiple values can be checked against, unlike when using `==`. – A5C1D2H2I1M1N2O1R2T1 Jun 09 '13 at 08:01
This is my second time asking a question here, so I don't understand dput(head(data)). My data: x y z; 1 0.11 0.22 a; 2 0.11 0.22; 3 0.33 0.44; 4 0.33 0.44 f. I just want to delete row 2 when data[1,1] == data[2,1] && data[1,2] == data[2,2] && data[2,3] == "", and delete row 3 under the same condition. – Dryad Jun 09 '13 at 08:02
@Dryad, please don't add sample data in comments. EDIT your original question and add it there, properly formatted. – A5C1D2H2I1M1N2O1R2T1 Jun 09 '13 at 08:03
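
The `-which` pitfall discussed in the comments above can be reproduced with a small made-up data frame. This is an illustrative sketch; `toy`, `hits`, and the result names are not from the thread.

```r
# Hypothetical data in which no z value is a single space
toy <- data.frame(x = 1:3, z = c("a", "", "b"), stringsAsFactors = FALSE)

# Nothing matches, so which() returns integer(0)
hits <- which(toy$z == " ")

# Negating an empty index is still an empty index, so every row is dropped
with_which <- toy[-hits, ]

# The logical form keeps all rows when nothing matches
with_in <- toy[!toy$z %in% c(" "), ]
```

Here `with_which` has zero rows while `with_in` keeps all three, which is why the logical form is the safer default when a match is not guaranteed.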
