I need to remove rows that are duplicates in every column except two. How do I go about this?
The solutions posted so far look like this:
df[!duplicated(df[ , c("x","y")]),]
Here a row is flagged as a duplicate only when its values in the listed columns (x and y in the example above) already appeared in an earlier row, and all other columns are ignored. Therefore the answer suggested by the person who closed my question (R dataframe: drop duplicates based on certain columns [duplicate]) does not help: it names the columns to compare, whereas I want to name the columns to ignore.
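For reference, here is a minimal sketch of how that pattern behaves, using a made-up data frame with columns x, y and z (none of these come from my real data). duplicated() keeps the first occurrence and flags every later row whose x and y values were already seen, no matter what z contains.

```r
# Toy data: rows 1 and 2 share x and y but differ in z
toy <- data.frame(x = c(1, 1, 2),
                  y = c("a", "a", "b"),
                  z = c(10, 20, 30),
                  stringsAsFactors = FALSE)

# Only x and y are compared; z plays no part in the check
flags <- duplicated(toy[, c("x", "y")])
flags  # FALSE TRUE FALSE

# Dropping the flagged rows keeps rows 1 and 3
toy[!flags, ]
```

Passing fromLast = TRUE to duplicated() would keep the last occurrence of each group instead of the first.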
Some of my rows are identical in all 40 columns except two, so I want those two columns to be the ones ignored when checking for duplicates. I can already remove duplicates while ignoring a single column, as in the example below, so the second suggested duplicate (Select all rows which are duplicates except for one column) does not solve my problem either.
df[!duplicated(df[, -5]), ]
This excludes the fifth column by its position, but I can't get it to work for two columns.
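A negative index in R can be a vector, so one possible extension of the one-column version is -c(5, 6). The sketch below tries it on a hypothetical three-row frame that borrows the column names x1 to x6 from the example further down; the values are invented for illustration.

```r
# Hypothetical frame: rows 1 and 2 match in x1-x4 and differ only in x5 and x6
toy <- data.frame(x1 = c("1", "1", "2"),
                  x2 = c("A", "A", "B"),
                  x3 = c("A2", "A2", "B1"),
                  x4 = c("33", "33", "14"),
                  x5 = c("3", "4", "7"),
                  x6 = c("XC", "XD", "XG"),
                  stringsAsFactors = FALSE)

# -c(5, 6) drops columns 5 and 6 before duplicated() compares the rows
deduped <- toy[!duplicated(toy[, -c(5, 6)]), ]
deduped$x5  # "3" "7"
```

Writing -(5:6) is equivalent; -5:6 is not, because unary minus binds more tightly than : and produces the sequence -5 to 6, which mixes negative and positive subscripts.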
Here is an example data frame in which rows 4, 10 and 16 (x5 holds the row number) need to be removed, because each of them duplicates the row above it in every column except x5 and x6.
df1 <- data.frame(x1 = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3"),
x2 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"),
x3 = c("A1", "A1", "A2", "A2", "A3", "A3", "B1", "B1", "B2", "B2", "B3", "B3", "C1", "C1", "C2", "C2", "C3", "C3"),
x4 = c("35", "43", "33", "33", "63", "24", "14", "25", "77", "77", "94", "51", "34", "36", "55", "55", "73", "72"),
x5 = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18"),
x6 = c("XA", "XB", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XK", "XL", "XM", "XN", "XO", "XP", "XQ", "XR"))
This is my desired outcome.
df2 <- data.frame(x1 = c("1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3"),
x2 = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C"),
x3 = c("A1", "A1", "A2", "A3", "A3", "B1", "B1", "B2", "B3", "B3", "C1", "C1", "C2", "C3", "C3"),
x4 = c("35", "43", "33", "63", "24", "14", "25", "77", "94", "51", "34", "36", "55", "73", "72"),
x5 = c("1", "2", "3", "5", "6", "7", "8", "9", "11", "12", "13", "14", "15", "17", "18"),
x6 = c("XA", "XB", "XC", "XE", "XF", "XG", "XH", "XI", "XK", "XL", "XM", "XN", "XO", "XQ", "XR"))
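As a check against the example data, here is a sketch that selects the columns to compare by name with setdiff(), so the two ignored columns do not have to sit at positions 5 and 6. It repeats df1 so the block runs on its own; the helper name ignore is my own choice, and stringsAsFactors = FALSE is spelled out only because it is not the default before R 4.0.0.

```r
# Example data from above
df1 <- data.frame(x1 = c("1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3"),
                  x2 = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C"),
                  x3 = c("A1", "A1", "A2", "A2", "A3", "A3", "B1", "B1", "B2", "B2", "B3", "B3", "C1", "C1", "C2", "C2", "C3", "C3"),
                  x4 = c("35", "43", "33", "33", "63", "24", "14", "25", "77", "77", "94", "51", "34", "36", "55", "55", "73", "72"),
                  x5 = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18"),
                  x6 = c("XA", "XB", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XK", "XL", "XM", "XN", "XO", "XP", "XQ", "XR"),
                  stringsAsFactors = FALSE)

# Columns to leave out of the duplicate check
ignore <- c("x5", "x6")

# TRUE for each row whose remaining columns already appeared in an earlier row
dups <- duplicated(df1[, setdiff(names(df1), ignore)])
which(dups)  # 4 10 16

# Keep the first occurrence of each group
result <- df1[!dups, ]

# Same rows as removing 4, 10 and 16 by hand
identical(result, df1[-c(4, 10, 16), ])  # TRUE

# df2 above is numbered 1 to 15, so reset the row names to match that layout
rownames(result) <- NULL
```

Selecting by name keeps working if the 40-column data set is reordered or the two ignored columns are somewhere other than positions 5 and 6.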