How to delete rows in one column that do not match the second column?

Question

I have the following problem and I don't know where to start in R: I have two columns with the same information, but one column contains some additional information. I want both columns to be exactly the same. Here is an example:

Thus, some numbers in the second column must be deleted so that both columns have equal length and the same number in each row. I guess I could construct a loop that tells R to delete numbers from the second column until column1 = column2, but I don't know where to start. Can R even compare the two columns automatically and delete the rows that don't match?

Could you write a bit more about your problem? What are these numbers? Two separate vectors? How do you create them? Can't you just copy the first one (first column in your post) and merge the original and copy into an array? I don't really get the gist of your question. — toniedzwiedz, May 24 '12 at 13:04

score 3 · Accepted Answer · edited May 23 '17 at 11:51

Taking your question at face value, this returns only the rows where column 1 == column 2; rows containing NA are removed as well. If this isn't the output you expect, please clarify your question further, preferably with a reproducible example.

> dat <- read.table(text = "1   1
+ 1   1
+ 2   1
+ 2   2
+ 3   2
+ 3   2
+ 4   2
+ 4   3
+ 5   3
+ 5   3
+ NA    4
+ NA    4
+ NA    4
+ NA    5
+ NA    5
+ NA    5
+ NA    5
+ NA    5", header = FALSE)


> dat[dat$V1 == dat$V2 & complete.cases(dat),]
  V1 V2
1  1  1
2  1  1
4  2  2
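
The `complete.cases()` term matters because comparing a value with `NA` gives `NA`, and a logical index that contains `NA` returns a row filled with `NA` instead of dropping it. A minimal sketch on a smaller, made-up data frame (the name `small` and its values are illustrative, not the question's data); `which()` is shown as one alternative guard:

```r
# Illustrative data, not the question's: V1 is shorter and padded with NA
small <- data.frame(V1 = c(1, 2, 3, NA), V2 = c(1, 1, 3, 4))

# Comparing against NA yields NA, not FALSE
eq <- small$V1 == small$V2

# Indexing with a logical vector that contains NA adds an all-NA row
with_na <- small[eq, ]

# Either guard drops the rows whose comparison is NA
kept_cc    <- small[eq & complete.cases(small), ]
kept_which <- small[which(eq), ]
```

Both `kept_cc` and `kept_which` hold only the rows where the two columns agree, while `with_na` carries one extra row of `NA` values.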

answered May 24 '12 at 14:31

Chase

@David - it's a *relatively* new argument, added in R 2.14.xx I believe. – Chase May 24 '12 at 14:51

score 0 · Answer 2 · answered May 24 '12 at 14:33

First, let's make some R objects that illustrate your problem:

a <- c(1,1,2,2,3,3,4,4,5,5)
b <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,5)

From the question, it sounds like you have them in the same object:

c <- cbind(a,b)
Warning message:
In cbind(a, b) :
  number of rows of result is not a multiple of vector length (arg 1)

But this does not pad a with missing values. It recycles a, appending its first length(b) - length(a) elements to the end so that it is as long as b.
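
The recycling described above can be checked directly. A small sketch, redefining `a` and `b` so it runs on its own; the names `m` and `recycled` are illustrative:

```r
a <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
b <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5)

# cbind() warns here; suppressWarnings() only silences the message
m <- suppressWarnings(cbind(a, b))

# Rows 11 to 16 of column a are the first six values of a, reused
recycled <- unname(m[11:16, "a"])
identical(recycled, a[seq_len(length(b) - length(a))])
```

The final comparison is `TRUE`, so the extra rows are silently filled with real-looking values, not with `NA`.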

You could instead fill in the missing values of a first:

 a2 <- append(a, rep(NA, 6))

Now you can bind them together:

 c <- cbind(a2, b)
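
The padding count of 6 does not have to be hard-coded. A sketch of two equivalent ways to pad, with `a` and `b` redefined so the block runs on its own; the names `a3` and `padded` are illustrative:

```r
a <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
b <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5)

# Compute the number of NAs from the lengths instead of typing 6
a2 <- c(a, rep(NA, length(b) - length(a)))

# Extending a vector through length<- also pads with NA
a3 <- a
length(a3) <- length(b)

# No recycling warning now, since both columns have 16 elements
padded <- cbind(a2, b)
```

Using a name such as `padded` also avoids reusing `c`, which is the name of the base function for building vectors.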

But now it sounds like you want to remove the elements of b that do not match a. You propose a loop. A for loop gets messy and quickly shows that the task is poorly defined. A while loop might be more appropriate, but again it quickly becomes apparent, as mentioned in the comment by @user1407656, that you could get the desired result by just binding a to a copy of itself:

 d <- cbind(a,a)
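
To see why the loop is unnecessary, here is one possible reading of the task as a sketch: walk through b in order and keep an element only when it equals the next unmatched element of a. This greedy rule is an assumption, since the question does not define the matching precisely, and the names `keep` and `b_trimmed` are illustrative:

```r
a <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
b <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5)

keep <- logical(length(b))  # starts as all FALSE
i <- 1                      # position of the next unmatched element of a

for (j in seq_along(b)) {
  if (i > length(a)) break  # every element of a has been matched
  if (b[j] == a[i]) {
    keep[j] <- TRUE         # b[j] lines up with a[i], so keep it
    i <- i + 1
  }
}

b_trimmed <- b[keep]
identical(b_trimmed, a)
```

Under this reading the trimmed b is identical to a, which is exactly what `cbind(a, a)` produces without any loop.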
