I need to remove duplicate combinations of two columns (feedID and feedID2) within groups (ID), while keeping a large number of other columns in the data set. A row counts as a duplicate whether the pair appears as A in feedID and B in feedID2 or vice versa, and only one row per pair should remain (see the desired output below). Additionally, I would like to remove all rows where both columns hold the same value (for example A in both), and all rows where one of the two columns is NA. I cannot sort the data between columns, i.e. if A is in feedID, it should remain in feedID.
I know this might come across as a duplicate question, but none of the other answers seem to work with my data set or ask for quite the same thing, e.g. "Finding unique combinations irrespective of position" and "Removing duplicate combinations in R (irrespective of order)".
test <- data.frame(ID = c("49V", "49V", "49V", "49V", "49V", "52V", "52V", "52V"),
                   feedID = c("A1", "A1", "G2", "A1", "G2", "B1", "D1", "D2"),
                   feedID2 = c("A1", "G2", "A1", "G2", "NA", "D1", "D2", "NA"))
desiredoutput <- data.frame(ID = c("49V", "52V", "52V"),
                            feedID = c("A1", "B1", "D1"),
                            feedID2 = c("G2", "D1", "D2"))
The following code does not remove duplicates when the two values appear in swapped columns:
test2 <- test[!duplicated(test[, c("ID", "feedID", "feedID2")]), ]
This code does not do anything at all, but throws no error:
test2 <- test %>% distinct(1, 2, 3) # where the numbers refer to the columns
This code produces an error about dimnames on my real data, and I am not sure what that means. I do not get the error with my test data, so I cannot reproduce it here:
indx <- !duplicated(t(apply(test, 1, sort))) # finds non-duplicates in sorted rows
test[indx, ]
Any ideas?
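One possible approach, as a minimal sketch in base R rather than a definitive solution: build an order-independent key for each row with pmin() and pmax(), drop rows that have a missing value or the same value in both columns, and then keep the first row per ID and key. The values are never moved between columns; the key is only used to detect duplicates. The helper as_missing() and the names lo, hi, keep, valid, key and result are made up for illustration. The sketch assumes that the literal string "NA" in the example data stands for a real missing value.

```r
# Example data copied from the question; "NA" is a literal string here.
test <- data.frame(
  ID      = c("49V", "49V", "49V", "49V", "49V", "52V", "52V", "52V"),
  feedID  = c("A1", "A1", "G2", "A1", "G2", "B1", "D1", "D2"),
  feedID2 = c("A1", "G2", "A1", "G2", "NA", "D1", "D2", "NA"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: treat the string "NA" as a real missing value
# (assumption about the data; real NA values pass through unchanged).
as_missing <- function(x) {
  x <- as.character(x)
  x[x %in% "NA"] <- NA
  x
}

a <- as_missing(test$feedID)
b <- as_missing(test$feedID2)

# Order-independent key: (lo, hi) is the same for A/B and B/A.
lo <- pmin(a, b)
hi <- pmax(a, b)

# Drop rows with a missing value or with the same value in both columns.
keep <- !is.na(a) & !is.na(b) & a != b

# Among the remaining rows, keep the first occurrence per ID and key.
valid  <- test[keep, ]
key    <- data.frame(ID = valid$ID, lo = lo[keep], hi = hi[keep])
result <- valid[!duplicated(key), ]

print(result)
```

Because result is a subset of the rows of test, any other columns in the real data set are carried along unchanged. If the real data already uses proper NA values, the as_missing() step is harmless and could be dropped.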