
I am trying to write a nested ifelse statement in R that deletes a row if it meets two conditions and keeps it otherwise. However, I need to compare across two data frames, and I couldn't find a previous question that covers this.

I am trying to essentially say this:

    ifelse(D1$A == D2$A,
           ifelse(D1$B == D2$B,
                  "Delete Row",
                  "Keep Row"),
           "Keep Row")

I have a data frame that looks like this:

    D1
    A     B     C
    123   10    Blue
    123   12    Blue
    100   7     Blue

and

    D2
    A     B     C
    123   10    Red
    123   12    Red
    115   7     Red

To clarify, I want to delete the "Blue" rows in D1 that have the same A and B as a row in D2. Rows in D2 have been classified as Red by other functions, so any row in D1 with the same A and B as a D2 row is wrongly labeled Blue. When I bind the two data frames together, I get both "123,10,Red" and "123,10,Blue" when only the Red row should remain.

I want to keep the unique rows of both data frames, but drop "123,10" from D1 if it is also in D2. The problem arose because I filtered rows out of D1 and put them into D2, and now rbind produces duplicates. I tried deleting the duplicates after binding, but that is not working for some reason.

I tried this:

    D2 <- D2[!(D2$A %in% D1$A & D2$B %in% D1$B),]

but I am not getting the right number of observations: it deletes 20 more rows than it should. Thank you!

Reagan
  • You can try something like `which(D1$A == D2$A)`, which returns the rows where the condition is met – cirofdo Jul 16 '18 at 18:26
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Seems like you probably want a `merge()` (or really a `dplyr::semi_join`) and avoid the loop. – MrFlick Jul 16 '18 at 18:27
  • I second @MrFlick. The right kind of join should help. Check out [this](http://r4ds.had.co.nz/relational-data.html#filtering-joins) from R4DS by Hadley Wickham. – hello_world Jul 16 '18 at 18:46
  • @hello_world, I read the link you sent. The problem is that I want to keep every full row of D2 and add in the rows from D1 that meet the criteria. The rows in D1 must stay intact. – Reagan Jul 16 '18 at 18:51
  • So what's the desired output here? if the values are the same, how can you tell if you are keeping the one from DF1 or DF2? Does it really matter? – MrFlick Jul 16 '18 at 18:55
  • I apologize, I have edited the original question to clarify more. – Reagan Jul 16 '18 at 19:07

2 Answers

1

I created some data that will hopefully mimic what you are trying to do. If not, hopefully you can adapt it to get close to what you are going for:

    library(dplyr)

    # Toy data: two numeric columns
    helper <- data.frame(first = 1:10, second = 5:14)

    # Keep rows where first < 5 or second > 10
    helper %>%
        filter(first < 5 | second > 10)
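
Applied to the sample data in the question, a filtering join (as suggested in the comments) might look like the sketch below. It assumes A and B are the only key columns and that the goal is to drop the D1 rows whose (A, B) pair also appears in D2; the names `D1_only` and `combined` are made up for illustration.

```r
library(dplyr)

# Sample data copied from the question
D1 <- data.frame(A = c(123, 123, 100), B = c(10, 12, 7), C = "Blue")
D2 <- data.frame(A = c(123, 123, 115), B = c(10, 12, 7), C = "Red")

# anti_join keeps the rows of D1 that have no match in D2 on (A, B)
D1_only <- anti_join(D1, D2, by = c("A", "B"))

# Stack the surviving Blue rows on top of all Red rows
combined <- bind_rows(D1_only, D2)
```

Because the join matches on both columns at once, a row is dropped only when the whole (A, B) pair is shared.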
Dusty
0

It seems that you want to return a set of TRUE/FALSE. Is this what you are looking for?

    D2[D2$A %in% D1$A != D2$B %in% D1$B,]
        A B
    3 115 7
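
If the goal is instead to drop the rows of D1 whose (A, B) pair also appears in D2, one possible reason for deleting too many rows is that two separate `%in%` tests check each column independently rather than as a pair. Below is a minimal base R sketch, assuming A and B are the only key columns. The extra row (123, 7) and the names `independent`, `paired`, `key1`, `key2`, `D1_clean` and `combined` are hypothetical and added only to show the difference.

```r
# Sample data from the question, plus one hypothetical extra row (123, 7)
# added to D1 only to show how the two tests differ.
D1 <- data.frame(A = c(123, 123, 100, 123),
                 B = c(10, 12, 7, 7),
                 C = "Blue")
D2 <- data.frame(A = c(123, 123, 115),
                 B = c(10, 12, 7),
                 C = "Red")

# Independent column tests: (123, 7) is flagged because 123 occurs in D2$A
# and 7 occurs in D2$B, even though that pair is not a row of D2.
independent <- D1$A %in% D2$A & D1$B %in% D2$B

# Pairwise test: build one key per row and compare the keys.
key1 <- paste(D1$A, D1$B, sep = "_")
key2 <- paste(D2$A, D2$B, sep = "_")
paired <- key1 %in% key2

# Drop only the D1 rows whose (A, B) pair is in D2, then bind.
D1_clean <- D1[!paired, ]
combined <- rbind(D1_clean, D2)
```

Here `independent` flags three rows of D1 while `paired` flags only two, so the independent test would remove one row too many.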
akash87