0

Right now I have the following table hypothetically:

.1 .2 .3
.2 1 C
.2 1 C
.2 1 C
.3 1 N
.3 1 N
.4 1 N
.4 1 N
.4 1 N
.4 1 N

We only want to keep the rows that contain a C for column 3 given that column 2 is the same AND the values for column 1 are different. This should result in the following table:

.1 .2 .3
.2 1 C
.2 1 C
.2 1 C

I already looked at the following questions:
Remove duplicates based on 2nd column condition
R, conditionally remove duplicate rows
Conditionally removing duplicates in R

Do you have any idea how to achieve this?

3 Answers3

1

I am not sure I quite understand what you need to do but here is an attempt with a simple If statement that checks the variance of the two columns, i.e.

if (var(dd3$X1) != 0 & var(dd3$X2) == 0) { dd3 <- subset(dd3, X3 == 'C')}
dd3
#  X1 X2 X3
#1  2  1  C
#2  2  1  C
#3  2  1  C

where,

dput(dd3)
structure(list(X1 = c(2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L), X2 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X3 = structure(c(1L, 1L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("C", "N"), class = "factor")), class = "data.frame", row.names = c(NA, -9L))
Sotos
  • 51,121
  • 6
  • 32
  • 66
0

I do not entirely understand what you mean with "given that column 2 is the same". You could use subset

subset(df, df$col3 == "C" & df$col1 != df$col2)

where i used col1, col2 and col3 for the headers

maarvd
  • 1,254
  • 1
  • 4
  • 14
0

Maybe you can try the following base R code with ave, i.e.,

dfout <- subset(df,as.logical(ave(X3,X1,X2,FUN = function(v) v=="C")))

# > dfout
#    X1 X2 X3
# 1  2  1  C
# 2  2  1  C
# 3  2  1  C

DATA

df <- structure(list(X1 = c(2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L), X2 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), X3 = c("C", "C", "C", "N", "N", 
"N", "N", "N", "N")), row.names = c(NA, -9L), class = "data.frame")
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81