
I am working on my first real project within R and ran into a problem. I am trying to compare 2 columns within 2 different data.frames. I tried running the code,

matrix1 = matrix
for (i in 1:2000){
  if(data.QW[i,1] == data.RS[i,1]){
    matrix1[i,1]== "True"
  }
  else{
    matrix1[i,1]== "False"
  }
}

I got this error:

Error in Ops.factor(data.QW[i,1], data.RS[i,1]) : 
  level sets of factors are different

I think this may be because QW and RS have different row lengths. But I am trying to see where these errors might be within the different data.frames and fix them according to the source document.

I am also unsure if matrix will work for this or if I need to make it into a vector and rbind it into the matrix every time.

Any good readings on this would also be appreciated.

Jaap
user2920249
  • you could look into `merge` - you can set `all = TRUE` to merge data.frames and fill in the missing values with NA, or maybe `match` would be more suited. Also, what sort of data is in this matrix? These solutions depend on it being some sort of unique values. – blep Sep 21 '15 at 23:29
  • That would work except I am unsure of where the missing values are. One row could be missing on one data.frame in one place and another could be missing within the other row, etc. – user2920249 Sep 21 '15 at 23:34
  • There is only one row being compared (row 1) so I'm not sure what you mean. – Pierre L Sep 21 '15 at 23:34
  • Oh, I missed the error - it's actually a different issue, not the dimensions, see: http://stackoverflow.com/questions/24594981/getting-the-error-level-sets-of-factors-are-different-when-running-a-for-loop – blep Sep 21 '15 at 23:35
  • The title says you are comparing two columns. But that is not what your loop is doing. It is comparing row 1 and columns 1 through 2000. – Pierre L Sep 21 '15 at 23:36
  • @pierreLafortune Ah, I see I had those mixed up. I fixed the original code. I am still getting the same error, however. – user2920249 Sep 21 '15 at 23:42
  • In the first line, is `matrix` an object you created or is it just the basic matrix function? – Pierre L Sep 21 '15 at 23:45
  • @dd3 I saw that and it's very similar to my problem but (I may be reading this wrong) that appeared to say that they had different types like a string or an int. I am pretty sure mine are both the same. Is there a way to check this within R? – user2920249 Sep 21 '15 at 23:48
  • @PierreLafortune it is the basic matrix function – user2920249 Sep 21 '15 at 23:48
  • try `levels(data.QW[,1])` and `levels(data.RS[,1])` - you may want to read about factors in R. – blep Sep 21 '15 at 23:53
  • My crystal ball is broken. Can you post `dput(head(data.QW))` and the same for `data.RS`? – Pierre L Sep 21 '15 at 23:55
  • Is that the matrix or the dataframe? – MAPK Sep 22 '15 at 01:29

3 Answers


As mentioned in the comments, providing a reproducible example with the contents of the dataframe will be helpful.

Judging by the question title, you want to compare column 1 of data frame A against column 1 of data frame B and store the result in a logical vector. If that summary is accurate, please take a look here.
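
A minimal sketch of that idea, assuming the columns hold character data (not factors). The two small data frames are hypothetical stand-ins for data.QW and data.RS, and the row counts are deliberately different to show one way of guarding against that:

```r
# Hypothetical stand-ins for the asker's data.QW and data.RS
data.QW <- data.frame(id = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
data.RS <- data.frame(id = c("A", "X", "C"), stringsAsFactors = FALSE)

# Compare position by position, only over the rows both data frames have
n   <- min(nrow(data.QW), nrow(data.RS))
idx <- seq_len(n)
same <- data.QW[idx, 1] == data.RS[idx, 1]   # logical vector: TRUE FALSE TRUE

# Row numbers where the two columns disagree
mismatch_rows <- which(!same)                # 2

# Values that appear in one column but not the other, regardless of position
only_in_QW <- setdiff(data.QW[, 1], data.RS[, 1])   # "B" "D"
only_in_RS <- setdiff(data.RS[, 1], data.QW[, 1])   # "X"
```

The result is a plain logical vector, so there is no need to pre-allocate a matrix or rbind inside a loop. If rows are missing in different places in each data frame, the position-independent setdiff calls are likely to be more informative than the row-by-row comparison.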

Community
Chaos

Too long for a comment.

Some observations:

  1. Your columns, data.QW[,1] and data.RS[,1] are almost certainly factors.
  2. The factors almost certainly have different sets of levels (it's possible that one of the factors has a subset of the levels in the other factor). When this happens, comparisons using == will not work.
  3. If you read your data into these data.frames using something like read.csv(...), any columns containing character data were converted to factors by default (this was the default before R 4.0.0). You can change that behavior by setting stringsAsFactors=FALSE in the call to read.csv(...). This is a very common problem.
  4. Once you've sorted out the factors/levels problem, you can avoid the loop by simply using data.QW[1:2000,1]==data.RS[1:2000,1]. This creates a logical vector of length 2000 containing all the comparisons. No loop needed. Of course, this assumes that both data.frames have at least 2000 rows.

Here's an example of item 2:

x <- as.factor(rep(LETTERS[1:5],3))   # has levels: A, B, C, D, E
y <- as.factor(rep(LETTERS[1:3],5))   # has levels: A, B, C
y==x
# Error in Ops.factor(y, x) : level sets of factors are different
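
One possible way around this, sketched here as an assumption rather than the only fix, is to check whether the columns are factors, see which levels differ, and then compare the values as character strings. The variable names extra_levels and same are introduced purely for illustration:

```r
x <- as.factor(rep(LETTERS[1:5], 3))   # has levels: A, B, C, D, E
y <- as.factor(rep(LETTERS[1:3], 5))   # has levels: A, B, C

# Diagnose: are these factors, and which levels exist in x but not in y?
is.factor(x)                                       # TRUE
extra_levels <- setdiff(levels(x), levels(y))      # "D" "E"

# Work around the error: compare the labels as character strings
same <- as.character(y) == as.character(x)
which(same)                                        # 1 2 3
```

Converting with as.character() leaves the original factors untouched, so this can be applied to data.QW[,1] and data.RS[,1] without re-reading the files.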
jlhoward

The function compare below compares data.frames or matrices a and b to find row matches of a in b. It returns the first row position in b that matches (after some internal sorting to speed things up). Rows in a that have no match in b get a return value of 0. It should handle numeric, character and factor column types and mixtures thereof (the latter for data.frames only). See the example below the function definition.

compare<-function(a,b){

    #################################################
    if(dim(a)[2]!=dim(b)[2]){
        stop("\n Matrices a and b have different number of columns!")
    }
    if(!all(sapply(a, class)==sapply(b, class))){
        stop("\n Matrices a and b have incomparable column data types!")    
    }
    #################################################
    if(is.data.frame(a)){
        i <- sapply(a, is.factor)
        a[i] <- lapply(a[i], as.character)
    }
    if(is.data.frame(b)){
        i <- sapply(b, is.factor)
        b[i] <- lapply(b[i], as.character)
    }
    len1<-dim(a)[1]
    len2<-dim(b)[1]
    ord1<-do.call(order,as.data.frame(a))
    a<-a[ord1,]
    ord2<-do.call(order,as.data.frame(b))
    b<-b[ord2,]     
    #################################################
    found<-rep(0,len1)  
    dims<-dim(a)[2]
    do_dims<-c(1:dim(a)[2]) 
    at<-1
    for(i in 1:len1){
        for(m in do_dims){
            while(b[at,m]<a[i,m]){
                at<-(at+1)      
                if(at>len2){break}              
            }
            if(at>len2){break}
            if(b[at,m]>a[i,m]){break}
            if(m==dims){found[i]<-at}
        }
        if(at>len2){break}
    }
    #################################################
    found<-found[order(ord1)]
    found<-ord2[found]
    return(found)

}
# example data sets:
ncols<-10
nrows<-1E4
a <- matrix(sample(LETTERS, size = (ncols*nrows), replace = TRUE), ncol = ncols, nrow = nrows)
b <- matrix(sample(LETTERS, size = (ncols*nrows), replace = TRUE), ncol = ncols, nrow = nrows)
b <- rbind(a,b) # example of b containing a
b <- b[sample(dim(b)[1], dim(b)[1], replace = FALSE),]
found<-compare(a,b)

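a<-as.data.frame(a, stringsAsFactors = TRUE) # conversion to factors (must be requested explicitly since R 4.0.0)
b<-as.data.frame(b, stringsAsFactors = TRUE) # conversion to factors (must be requested explicitly since R 4.0.0)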
found<-compare(a,b)
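
For small inputs, a result of this kind can be cross-checked with a different, base-R-only technique: paste each row into a single key and look the keys up with match(). This is a sketch of an alternative, not the algorithm used in compare above. The toy data frames and the names key_a and key_b are hypothetical, and the approach assumes that the separator "\r" never occurs in the data:

```r
# Small hypothetical data frames with mixed column types
a <- data.frame(k = c("x", "y", "z"), v = c(1, 2, 3), stringsAsFactors = FALSE)
b <- data.frame(k = c("z", "q", "x", "x"), v = c(3, 9, 1, 1), stringsAsFactors = FALSE)

# Build one string key per row by pasting all columns together
key_a <- do.call(paste, c(a, sep = "\r"))
key_b <- do.call(paste, c(b, sep = "\r"))

# First matching row position in b for every row of a; 0 means no match
found <- match(key_a, key_b, nomatch = 0)
found   # 3 0 1
```

Because paste() converts every column to text, factor and character columns are treated alike here, but numeric columns are compared by their printed representation, which may not suit floating-point data.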
martin