Conditionally removing duplicates based on several conditions in R

Question

Right now I have the following table hypothetically:

.1 .2 .3
.2 1 C
.2 1 C
.2 1 C
.3 1 N
.3 1 N
.4 1 N
.4 1 N
.4 1 N
.4 1 N

We only want to keep the rows that contain a C for column 3 given that column 2 is the same AND the values for column 1 are different. This should result in the following table:

.1 .2 .3
.2 1 C
.2 1 C
.2 1 C

I already looked at the following questions:
Remove duplicates based on 2nd column condition
R, conditionally remove duplicate rows
Conditionally removing duplicates in R

Do you have any idea how to achieve this?

Column 1 is the one with values 2,3 and 4. The first line are the "headers" — S Agzennay, Feb 06 '20 at 15:13

score 1 · Answer 1 · answered Feb 06 '20 at 15:25

I am not sure I quite understand what you need to do but here is an attempt with a simple If statement that checks the variance of the two columns, i.e.

if (var(dd3$X1) != 0 & var(dd3$X2) == 0) { dd3 <- subset(dd3, X3 == 'C')}
dd3
#  X1 X2 X3
#1  2  1  C
#2  2  1  C
#3  2  1  C

where,

dput(dd3)
structure(list(X1 = c(2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L), X2 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X3 = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", "N"), class = "factor")), class = "data.frame", row.names = c(NA, -9L))

score 0 · Answer 2 · answered Feb 06 '20 at 15:33

0

I do not entirely understand what you mean with "given that column 2 is the same". You could use subset

subset(df, df$col3 == "C" & df$col1 != df$col2)

where i used col1, col2 and col3 for the headers

answered Feb 06 '20 at 15:33

maarvd

1,254
1
4
14

score 0 · Answer 3 · answered Feb 06 '20 at 15:43

Maybe you can try the following base R code with ave, i.e.,

dfout <- subset(df,as.logical(ave(X3,X1,X2,FUN = function(v) v=="C")))

# > dfout
#    X1 X2 X3
# 1  2  1  C
# 2  2  1  C
# 3  2  1  C

DATA

df <- structure(list(X1 = c(2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L), X2 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X3 = c("C", "C", "C", "N", "N", 
"N", "N", "N", "N")), row.names = c(NA, -9L), class = "data.frame")

Conditionally removing duplicates based on several conditions in R

3 Answers3