Comparing two columns

Question

I am new to R and I am having trouble with something I did all the time in Python.

I have two data frames (database and creditIDs), and I want to compare one column in database with one column in creditIDs. More specifically, if a value in database[,5] does not exist in creditIDs[,1], I want to delete that entire row from database. Here is the code:

for (i in 1:lengthColumns) {
    if (!(database$credit_id[i] %in% creditosVencidos)) {
        database[i, ] <- database[-i, ]
    }
}

But I keep on getting this error:

50: In `[<-.data.frame`(`*tmp*`, i, , value = structure(list( ... :
replacement element 50 has 9696 rows to replace 1 rows

Could someone explain why this is happening? Thanks!

Please include a reproducible example. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Gabor Csardi, Jul 24 '14 at 19:14
Also please clarify if creditosVencidos is the same as creditIDs (or what it is) and if database[,5] is the same as database$credit_id, which I assume it is — Hack-R, Jul 24 '14 at 19:25
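The message can be reproduced on a small, hypothetical data frame (the column names and values below are made up for illustration). `database[-i, ]` is the whole data frame minus row i, so each loop iteration tries to fit thousands of rows into the single row `database[i, ]`. R warns about the size mismatch and deletes nothing:

```r
# Hypothetical 3-row data frame standing in for `database`
df <- data.frame(credit_id = c("a", "b", "c"), amount = c(10, 20, 30))

# df[-1, ] is every row EXCEPT row 1, i.e. a 2-row data frame
nrow(df[-1, ])

# Assigning those 2 rows into the single row df[1, ] is a size mismatch.
# Collect the resulting warnings instead of letting them print.
caught <- character(0)
withCallingHandlers(
  df[1, ] <- df[-1, ],
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(caught)

# No row was removed: df still has 3 rows
nrow(df)
```

Even written as `database <- database[-i, ]`, deleting inside the loop would misbehave, because each deletion shifts the row numbers of every later row. Filtering once with a vectorized condition avoids both problems.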

score 0 · Answer 1 · answered Jul 24 '14 at 19:24

The which() function returns the positions at which a logical vector is TRUE, so applied to a condition on a column it gives the matching row numbers, much like numpy.where() in Python. Using $ after a data frame with a column name gives you that column as a vector; alternatively, you can use d[,column_number].

In this example I create x and y columns that share their first five values, and use which() to keep only the rows where x equals y:

L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)  # random, so your letters will differ
(d <- data.frame(x = rep(1:5, 2), y = 1:10, fac = fac))

# keep only the rows where x and y are equal
d <- d[which(d$x == d$y), ]
d

  x y fac
1 1 1   A
2 2 2   B
3 3 3   C
4 4 4   B
5 5 5   B
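Applied to the question's goal, a minimal sketch (with hypothetical stand-ins for `database` and `creditIDs`) negates the `%in%` test and drops the rows that which() reports. One caveat worth knowing: when nothing matches, which() returns `integer(0)`, and negative indexing with it selects zero rows rather than all of them, so indexing with the logical vector directly is the safer form:

```r
# Hypothetical stand-ins for the question's `database` and `creditIDs`
database <- data.frame(credit_id = c("896-19", "camel", "899-1"),
                       amount = c(1, 2, 3))
creditIDs <- data.frame(ID = c("896-19", "899-1", "899-5"))

# Row numbers whose credit_id does NOT appear in creditIDs$ID
to_drop <- which(!(database$credit_id %in% creditIDs$ID))
to_drop

# Negative indices remove those rows
kept <- database[-to_drop, ]

# Pitfall: when nothing matches, which() returns integer(0), and
# database[-integer(0), ] selects ZERO rows instead of all of them
none <- which(database$credit_id == "no such id")
nrow(database[-none, ])

# Indexing with the logical vector itself has no such edge case
kept2 <- database[database$credit_id %in% creditIDs$ID, ]
```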

score 0 · Answer 2 · answered Jul 24 '14 at 19:31

You will need to adjust this for your column names/numbers.

# Create two example data.frames
creditID <- data.frame(ID = c("896-19", "895-8", "899-1", "899-5"))
database <- data.frame(ID = c("896-19", "camel", "899-1", "goat", "899-1"))

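# Method 1: logical indexing; returns the rows whose ID appears in creditID
database[database$ID %in% creditID$ID, ]

# Method 2: subset() does the same filtering; here the result overwrites database
database <- subset(database, ID %in% creditID$ID)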
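A small extension of Method 1, sketched on the same example data: it shows which rows would be discarded before overwriting anything. Note that with a one-column data frame like this example, `database[rows, ]` simplifies the result to a plain vector; passing `drop = FALSE` keeps it a data frame (subset() does not have this issue). The variable names `keep`, `dropped` and `as_vector` are illustrative only:

```r
creditID <- data.frame(ID = c("896-19", "895-8", "899-1", "899-5"))
database <- data.frame(ID = c("896-19", "camel", "899-1", "goat", "899-1"))

# TRUE where the row's ID is present in creditID
keep <- database$ID %in% creditID$ID

# Without drop = FALSE, a single-column result collapses to a vector
as_vector <- database[keep, ]
class(as_vector)

# Inspect the rows that would be removed
dropped <- database[!keep, , drop = FALSE]

# Method 1 only returns a new object; assign it to keep the result
database <- database[keep, , drop = FALSE]
```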
