
Hopefully this is an easy one; I just can't seem to piece together an answer. I have a data frame in which some values in each row need to be changed to NA. The value to replace is not the same for every row: within each row, I want to set to NA every value that matches the value in a specified column of that row.

    mydata = as.data.frame(rbind(c("AA","CC","BB","DC","CC"),c("CC","CC","BB","DC","BB"),c("BB","BB","BB","DC","DC")))

    > mydata
      V1 V2 V3 V4 V5
    1 AA CC BB DC CC
    2 CC CC BB DC BB
    3 BB BB BB DC DC

    #for each row, replace values that match the value in column 5 with NA
    apply(mydata[,1:4], 1, function(x){
    x[x %in% x$V5]  = NA
    })

Desired output

    > mydata
      V1 V2 V3 V4 V5
    1 AA NA BB DC CC
    2 CC CC NA DC BB
    3 BB BB BB NA DC

Thanks!

----UPDATE----

Using the code below from arvi1000 works great for comparing values in a row to a single column of values. Is there a way to do something like this but comparing the values to 2 or more columns?

Current code

    mydata[,1:4][mydata[,1:4]==mydata[,5]] <- NA

Let's say I also have a column 6. By row, I want to change to NA every value in columns 1 to 4 that equals neither the value in column 5 nor the value in column 6.

    mydata = as.data.frame(rbind(c("AA","CC","BB","DC","CC","AA"),c("CC","CC","BB","DC","BB","CC"),c("BB","BB","BB","DC","DC","BB")),stringsAsFactors=F)

    > mydata
      V1 V2 V3 V4 V5 V6
    1 AA CC BB DC CC AA
    2 CC CC BB DC BB CC
    3 BB BB BB DC DC BB

Desired output

    > mydata
      V1 V2 V3 V4 V5 V6
    1 AA CC NA NA CC AA
    2 CC CC BB NA BB CC
    3 BB BB BB DC DC BB

I tried to do this, but received an error

    mydata[,1:4][mydata[,1:4]==mydata[,5]|mydata[,6]] <- NA
    Error in mydata[, 1:4] == mydata[, 5] | mydata[, 6] : 
      operations are possible only for numeric, logical or complex types
SC2

2 Answers


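Add stringsAsFactors=F to as.data.frame. This is key because, without it, each column becomes a factor with its own set of levels (the default before R 4.0.0), and factors with different level sets cannot be compared directly with ==.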

    mydata = as.data.frame(rbind(c("AA","CC","BB","DC","CC"),c("CC","CC","BB","DC","BB"),c("BB","BB","BB","DC","DC")),
                           stringsAsFactors=F)

Then:

    mydata[,1:4][mydata[,1:4]==mydata[,5]] <- NA

Voila:

      V1   V2   V3   V4 V5
    1 AA <NA>   BB   DC CC
    2 CC   CC <NA>   DC BB
    3 BB   BB   BB <NA> DC
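
To illustrate the factor issue in isolation, here is a minimal sketch. The vectors `a` and `b` are made up for this example and are not taken from the question. Comparing two factors whose level sets differ fails with an error, while comparing the same values as character strings works element by element.

```r
# Two hypothetical factors with different level sets
a <- factor(c("AA", "CC"))   # levels: AA, CC
b <- factor(c("CC", "CC"))   # levels: CC

# Comparing factors with different level sets raises an error;
# tryCatch captures the error message instead of stopping the script
factor_result <- tryCatch(a == b, error = function(e) conditionMessage(e))

# Comparing the same values as character strings works as expected
char_result <- as.character(a) == as.character(b)

print(factor_result)
print(char_result)
```

With `stringsAsFactors=F` the columns stay character vectors, so no such conversion is needed.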
arvi1000
  • Hi, this works great! Is there a way I could do this for comparing the data to values in 2 or more columns? I tried using a conditional (see my above edit) but that didn't work out so well. Thanks! – SC2 Nov 05 '14 at 14:29
  • You were close! `mydata[,1:4]==mydata[,5] | mydata[,1:4]==mydata[,6]` will do it – arvi1000 Nov 05 '14 at 15:29
  • That's great! Doesn't seem to work the same if I want to do != instead of == though. If I want to do != do I need to put together a different statement altogether? – SC2 Nov 05 '14 at 15:56
  • Troubleshooting logical operators is probably a different question. The basic idea though, is that you create a dataframe of true/false values of the same shape as mydata[,1:4] and use that to index which 'cells' you want to make NA. Look at the logical indexing data.frame by itself and try to get that sorted as a first step (i.e. look at mydata[,1:4]!=mydata[,5] and build from there; you probably want `&` not `|` to combine != statements) – arvi1000 Nov 05 '14 at 16:04
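
Following the hint in the comments, the two-column "not equal" case from the question's update could be sketched roughly as below. The names `cols` and `no_match` are introduced here for illustration only. The idea is to build one logical matrix per reference column with `!=`, combine them with `&`, and use the result to index the cells to blank out.

```r
mydata <- as.data.frame(rbind(c("AA","CC","BB","DC","CC","AA"),
                              c("CC","CC","BB","DC","BB","CC"),
                              c("BB","BB","BB","DC","DC","BB")),
                        stringsAsFactors = FALSE)

cols <- 1:4  # columns to check (hypothetical helper variable)

# TRUE where a cell differs from BOTH column 5 and column 6 of its row
no_match <- mydata[, cols] != mydata[, 5] & mydata[, cols] != mydata[, 6]

# Inspect the logical matrix first, then use it to set those cells to NA
print(no_match)
mydata[, cols][no_match] <- NA
print(mydata)
```

Note the `&`: a cell is replaced only when it matches neither reference column. Using `|` with `!=` would flag every cell that differs from at least one of the two columns, which is usually not what is wanted.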

Another way would be using apply:

    mydata = as.data.frame(rbind(c("AA","CC","BB","DC","CC"),c("CC","CC","BB","DC","BB"),c("BB","BB","BB","DC","DC")))

    # apply() passes each row to the function as a character vector
    mydata <- data.frame(t(apply(mydata,1,function(x) {
      # compare every column except the last with the last column
      for ( i in 1:(ncol(mydata)-1)){
        if ( x[i] == x[ncol(mydata)]) {
          x[i] <- NA
        }
      }
      return(x)
    })))

output:

    > mydata
      V1   V2   V3   V4 V5
    1 AA <NA>   BB   DC CC
    2 CC   CC <NA>   DC BB
    3 BB   BB   BB <NA> DC
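
A loop-free variant of the same idea might look like the sketch below. It works column by column with `lapply` and `replace` instead of row by row, and it assumes the columns are kept as character vectors. The names `ref` and `targets` are hypothetical and used only for this example.

```r
mydata <- as.data.frame(rbind(c("AA","CC","BB","DC","CC"),
                              c("CC","CC","BB","DC","BB"),
                              c("BB","BB","BB","DC","DC")),
                        stringsAsFactors = FALSE)

ref <- ncol(mydata)          # index of the reference column (the last one)
targets <- seq_len(ref - 1)  # all columns before it

# For each target column, set to NA the entries equal to the
# reference column's value in the same row
mydata[targets] <- lapply(mydata[targets], function(col) {
  replace(col, col == mydata[[ref]], NA)
})

print(mydata)
```

Because each column is handled as a whole vector, the data frame is never coerced to a matrix, so the column types stay as they were.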
LyzandeR