I have this table (data1) with four columns
SNP rs6576700 rs17054099 rs7730126
sample1 G-G T-T G-G
I need to separate columns 2-4 into two columns each, so the new output have 7 columns. Like this :
SNP rs6576700 rs6576700 rs17054099 rs17054099 rs7730126 rs7730126
sample1 G G T T C C
With the following function I could split all columns at the time but the output is not what I need.
split <- function(x){
x <- as.character(x)
strsplit(as.character(x), split="-")
}
data2=apply(data1[,-1], 2, split)
data2
$rs17054099
$rs17054099[[1]]
[1] "T" "T"
$rs7730126
$rs7730126[[1]]
[1] "G" "G"
$rs6576700
$rs6576700[[1]]
[1] "C" "C"
In Stack Overflow I found a method to convert the output of strsplit to a dataframe but the rs numbers are in rows not in columns (I got a similar output with other methods in this thread strsplit by row and distribute results by column in data.frame)
> n <- max(sapply(data2, length))
> l <- lapply(data2, function(X) c(X, rep(NA, n - length(X))))
> data.frame(t(do.call(cbind, l)))
t.do.call.cbind..l..
rs17054099 T, T
rs7730126 G, G
rs2061700 C, C
If I do not use the function transpose (...(t(do.call...), the output is a list that I cannot write to a file.
I would like to have the solution in R to make it part of a pipeline.
I forgot to say that I need to apply this to a million columns.