
For the life of me, I can't figure out what is wrong here, so I suspect I'm doing something wrong:

As with my other questions, this one is related to genetics.

I'm trying to split genotype strings such as A/A into their two alleles using the following R code:

gt$A1 <- sapply(strsplit(as.character(gt[c(4)]), "/"), function(x) x[1])
gt$A2 <- sapply(strsplit(as.character(gt[c(4)]), "/"), function(x) x[2])

However, what comes out is

A1 = ' c("G '  
A2 = ' G", "G '

for every single variant, even when the genotype is not G/G.

An example of my file looks like this:

ID  MGT FGT CGT
001 A/A A/G G/A
002 T/C T/C C/C
003 T/C C/C T/C

Is there a reason this doesn't split cleanly? I suspect the length of the string might be causing the problem, but I'm not sure whether that is an issue in R.
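A minimal sketch that reproduces the symptom with the three example rows above. Loading them through `read.table(text = ...)` with `colClasses = "character"` is an assumption about how the real file was read. The key point is that `gt[c(4)]` is still a one-column data frame (a list), and `as.character()` on a list deparses the whole column into a single string.

```r
# Example rows from the question, read from inline text (assumed input format).
gt <- read.table(text = "
ID  MGT FGT CGT
001 A/A A/G G/A
002 T/C T/C C/C
003 T/C C/C T/C
", header = TRUE, colClasses = "character")

# gt[c(4)] keeps the data-frame structure: a one-column data frame, i.e. a list.
str(gt[c(4)])

# as.character() on a list deparses each element, so the whole column
# collapses into ONE string that looks like R source code.
collapsed <- as.character(gt[c(4)])
print(collapsed)

# Splitting that single string on "/" yields fragments of the deparsed text,
# the same kind of pieces ('c("G' and so on) reported in the question.
pieces <- strsplit(collapsed, "/")
print(pieces)

# sapply() over a one-element list returns a length-1 vector, which R
# recycles down every row when it is assigned to gt$A1.
a1 <- sapply(pieces, function(x) x[1])
print(a1)
```

This explains why every row receives the same value: the split runs once on one deparsed string, not once per genotype.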

user2726449
  • As a starting point, check the difference here: `as.character(gt[4])` vs. `as.character(gt[,4])` – thelatemail Mar 25 '14 at 00:09
  • Could you also post a `dput(gt)` or `dput(head(gt))`? – hrbrmstr Mar 25 '14 at 00:12
  • This is probably also duplicating this question: http://stackoverflow.com/questions/4350440/using-strsplit-with-data-frames-to-split-label-columns-into-multiple – thelatemail Mar 25 '14 at 00:15
  • `as.character(gt[,4])` lists the genotypes in quotes, which looks right. `as.character(gt[4])` does not; there the genotypes appear in the form `\"C/C\"` (the same entry in the list was `"C/C"`). – user2726449 Mar 25 '14 at 00:23
  • Yep, and your code will work fine if you make that replacement. – thelatemail Mar 25 '14 at 00:28
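Following the suggestion in the comments, here is a sketch of the original code with `gt[c(4)]` replaced by `gt[, 4]`, again using the example rows read from inline text (an assumption about the input). `gt[, 4]` drops to a plain character vector, so `strsplit()` returns one element per row; `gt[[4]]` would work the same way. `fixed = TRUE` just treats "/" as a literal string rather than a regular expression.

```r
# Example rows from the question, read from inline text (assumed input format).
gt <- read.table(text = "
ID  MGT FGT CGT
001 A/A A/G G/A
002 T/C T/C C/C
003 T/C C/C T/C
", header = TRUE, colClasses = "character")

# gt[, 4] is a character vector with one genotype per row,
# so strsplit() gives one c(allele1, allele2) pair per row.
parts <- strsplit(as.character(gt[, 4]), "/", fixed = TRUE)

# Take the first and second allele from each pair.
gt$A1 <- sapply(parts, function(x) x[1])
gt$A2 <- sapply(parts, function(x) x[2])

print(gt)
```

Splitting once into `parts` and reusing it also avoids running `strsplit()` twice on the same column.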
