
I have a data frame in which I'd like to compare each row's Genotype value with two references, S288C and SK1. This comparison needs to be done across many rows (100+). Here are the first few lines of my data frame:

    Assay   Genotype S288C SK1
1   CCT6-002     G     A    G
2   CCT6-007     G     A    G
3   CCT6-013     C     T    C
4   CCT6-015     G     A    G
5   CCT6-016     G     G    T

As a final product, I'd like a character string of 1s (S288C) and 0s (SK1), depending on which of the references each data point matches. Thus, for the example above, I'd like the output 00001, since every row except the last matches SK1.

dpel
Sam Globus
  • What are you comparing, and how is the comparison to be made? Give us some sample output that goes along with your sample input. –  Oct 04 '11 at 19:40
  • I am comparing Genotype to each of the next two columns, S288C and SK1. When Genotype is identical to S288C I'd like to output a "1", and when it's identical to SK1 I'd like to output a "0". If it matches neither, "NA". I'd like this comparison to happen for each row. Does this help? – Sam Globus Oct 04 '11 at 19:47
  • The 7,000th R question?! – Jason B Oct 04 '11 at 20:09

1 Answer


A nested `ifelse` should do it (take a look at `help(ifelse)` for usage):

    ifelse(dat$Genotype == dat$S288C, 1, ifelse(dat$Genotype == dat$SK1, 0, NA))

With this test data:

    > dat
         Genotype S288C SK1
    [1,] "G"      "A"   "G"
    [2,] "G"      "A"   "G"
    [3,] "C"      "T"   "C"
    [4,] "G"      "A"   "G"
    [5,] "G"      "G"   "T"
    [6,] "G"      "A"   "A"
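That printout looks like a character matrix, but `$` indexing only works on data frames and lists. A minimal sketch of building the same six test rows as a data frame and applying the nested `ifelse` to them (the names `dat` and `calls` are only illustrative; `stringsAsFactors = FALSE` is spelled out because it was not the default before R 4.0.0):

```r
# Rebuild the six test rows shown above as a data frame.
# stringsAsFactors = FALSE keeps the allele columns as character vectors.
dat <- data.frame(
  Genotype = c("G", "G", "C", "G", "G", "G"),
  S288C    = c("A", "A", "T", "A", "G", "A"),
  SK1      = c("G", "G", "C", "G", "T", "A"),
  stringsAsFactors = FALSE
)

# 1 where Genotype matches S288C, 0 where it matches SK1, NA otherwise
calls <- ifelse(dat$Genotype == dat$S288C, 1,
                ifelse(dat$Genotype == dat$SK1, 0, NA))
print(calls)
```

The outer `ifelse` is checked first, so a row that happened to match both references would be called 1.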

We get:

    > ifelse(dat$Genotype == dat$S288C, 1, ifelse(dat$Genotype == dat$SK1, 0, NA))
    [1]  0  0  0  0  1 NA
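The question asks for a single string such as 00001 rather than a vector. A sketch that collapses the per-row calls with `paste(..., collapse = "")`, using the five rows from the question (`call_string` is a hypothetical name):

```r
# The five rows from the question, as character columns
dat <- data.frame(
  Genotype = c("G", "G", "C", "G", "G"),
  S288C    = c("A", "A", "T", "A", "G"),
  SK1      = c("G", "G", "C", "G", "T"),
  stringsAsFactors = FALSE
)

calls <- ifelse(dat$Genotype == dat$S288C, 1,
                ifelse(dat$Genotype == dat$SK1, 0, NA))

# Glue the per-row calls together with no separator
call_string <- paste(calls, collapse = "")
print(call_string)
```

A row that matches neither reference would show up as the two characters `NA` inside the string, so if such rows are possible you may want to recode them to a single placeholder character before collapsing.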

(Note: if you have trouble using this, make sure that the columns are character vectors and are not treated by R as factors. A simple for loop should do it: `for (i in 1:ncol(dat)) {dat[, i] = as.vector(dat[, i])}`.)
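Comparing two factor columns with `==` raises an error when their level sets differ, which is the case for the question's columns if they were read in as factors (the default for `data.frame()` and `read.table()` before R 4.0.0). As an alternative to the loop, a sketch that converts every column in one step; assigning to `dat[]` keeps `dat` a data frame while replacing its columns:

```r
# Simulate allele columns that were read in as factors
dat <- data.frame(
  Genotype = factor(c("G", "G", "C", "G", "G")),
  S288C    = factor(c("A", "A", "T", "A", "G")),
  SK1      = factor(c("G", "G", "C", "G", "T"))
)
# dat$Genotype == dat$S288C would fail here: the level sets differ.

# Convert every column to character, keeping the data.frame structure
dat[] <- lapply(dat, as.character)

calls <- ifelse(dat$Genotype == dat$S288C, 1,
                ifelse(dat$Genotype == dat$SK1, 0, NA))
print(calls)
```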

  • This worked perfectly. Thank you! If I wanted to add the results as a new column to the data frame (i.e. after the SK1 column), how would I do so? Sorry if this is basic, but I'm very new to R and programming in general. – Sam Globus Oct 04 '11 at 21:59
    The `cbind` function should work: `newdat<-as.data.frame(cbind(dat,ifelse(dat$Genotype==dat$S288C,1,ifelse(dat$Genotype==dat$SK1,0,NA))))`. –  Oct 04 '11 at 22:03
  • When I run the above cbind function, it gives me back what looks like a two-part data frame. The first part is the original data frame; the second is the calls made using the above code, with the corresponding row numbers. Is there a way to get these all to align within the same data frame? – Sam Globus Oct 06 '11 at 13:59
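The second block is most likely just how `print()` wraps a data frame that is wider than the console: with no name given, `cbind()` labels the new column with the whole `ifelse(...)` expression, which makes the table very wide. A sketch that assigns the calls to a short, named column instead (`Call` is a hypothetical column name; `cbind(dat, Call = ...)` would name it the same way):

```r
# The question's first five rows, including the Assay column
dat <- data.frame(
  Assay    = c("CCT6-002", "CCT6-007", "CCT6-013", "CCT6-015", "CCT6-016"),
  Genotype = c("G", "G", "C", "G", "G"),
  S288C    = c("A", "A", "T", "A", "G"),
  SK1      = c("G", "G", "C", "G", "T"),
  stringsAsFactors = FALSE
)

# Assigning to a new name appends the column after SK1
dat$Call <- ifelse(dat$Genotype == dat$S288C, 1,
                   ifelse(dat$Genotype == dat$SK1, 0, NA))
print(dat)
```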