I'm currently trying to subset data to a smaller size and I'm stuck on the coding part, as I'm a complete newbie in coding.

I'm trying to get rid of all rows with identical entries. For example, the code should eliminate every row whose value in column 3 ("var 2") also appears in another row. The duplicated function would only get rid of the second entry with "0", but I'd like to get rid of both entries with "0".
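
To make the problem concrete, here is a small sketch on made-up data. The data frame `toy` and its columns `id` and `var2` are hypothetical stand-ins for illustration, not the data in the screenshot.

```r
# Hypothetical toy data: the value 0 appears twice in var2.
toy <- data.frame(id = 1:5, var2 = c(0, 3, 0, 7, 9))

# duplicated() flags only the second and later occurrences of a value,
# so the first row with var2 == 0 survives the filter.
kept <- toy[!duplicated(toy$var2), ]
print(kept$id)  # [1] 1 2 4 5

# The result wanted here would contain only the rows with id 2, 4 and 5.
```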

Appreciate your help! https://i.stack.imgur.com/esfSB.jpg

fabiusw
  • Show us the expected output please. – Pankaj Kaundal Jul 20 '16 at 11:25
  • Do not post your data as an image; please learn how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Jul 20 '16 at 11:28

3 Answers

You could use the dplyr library for this kind of data manipulation; it's a neat and very helpful package. Assuming the data frame is stored in a variable called data_frame, the following keeps only the rows whose var2 value occurs exactly once:

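library(dplyr)

data_frame <- data_frame %>%
              group_by(var2) %>%    # one group per distinct value of var2
              filter(n() == 1) %>%  # keep only groups that contain a single row
              ungroup()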

I am storing the result in the same variable. You could use another variable name to keep the original data frame intact.
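
As a quick check of the grouping approach, here is a minimal sketch on made-up data. The data frame `toy` and the name `result` are hypothetical, and the result goes into a new variable so the input stays intact.

```r
library(dplyr)

# Hypothetical toy data: the value 0 appears twice in var2.
toy <- data.frame(id = 1:5, var2 = c(0, 3, 0, 7, 9))

result <- toy %>%
  group_by(var2) %>%    # one group per distinct value of var2
  filter(n() == 1) %>%  # drop every group with more than one row
  ungroup()

print(result$id)  # [1] 2 4 5
```

Both rows with var2 equal to 0 are gone, while `toy` itself still has all five rows.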

Kashyap

Here we use table to count how often each value occurs, then keep only the rows whose value is not among those that occur more than once.

counts <- table(data$Var2)  # occurrences of each value in Var2
# as.numeric() assumes Var2 is numeric
data[!data$Var2 %in% as.numeric(names(counts[counts > 1])), ]
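
For illustration, the same idea step by step on made-up data. The names `toy`, `counts`, `repeated` and `unique_rows` are hypothetical, and the sketch assumes the column is numeric.

```r
# Hypothetical toy data: the value 0 appears twice in Var2.
toy <- data.frame(id = 1:5, Var2 = c(0, 3, 0, 7, 9))

# Count the occurrences of each distinct value.
counts <- table(toy$Var2)

# The names of the table are the values as strings; convert the
# repeated ones back to numbers (assumes Var2 is numeric).
repeated <- as.numeric(names(counts[counts > 1]))

# Keep only rows whose value is not among the repeated ones.
unique_rows <- toy[!toy$Var2 %in% repeated, ]
print(unique_rows$id)  # [1] 2 4 5
```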
catastrophic-failure

We can also combine duplicated with a second call using fromLast=TRUE, so that every occurrence of a repeated value is flagged and all of those rows are removed.

df1[with(df1, !(duplicated(var2) | duplicated(var2, fromLast = TRUE))), ]
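
To see why both directions are needed, here is a small sketch on made-up data. The names `toy`, `from_first`, `from_last` and `result` are hypothetical.

```r
# Hypothetical toy data: the value 0 appears twice in var2.
toy <- data.frame(id = 1:5, var2 = c(0, 3, 0, 7, 9))

# Scanning from the top flags the later occurrence of a repeated value...
from_first <- duplicated(toy$var2)                   # FALSE FALSE TRUE FALSE FALSE
# ...and scanning from the bottom flags the earlier occurrence.
from_last  <- duplicated(toy$var2, fromLast = TRUE)  # TRUE FALSE FALSE FALSE FALSE

# A row is dropped if either scan flags it.
result <- toy[!(from_first | from_last), ]
print(result$id)  # [1] 2 4 5
```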
akrun