I have the following data frame (the actual data has many more columns):
df <- data.frame(
l1=c(ind1='000000',ind2='100100'),
l2=c(ind1='200204',ind2='124124'),
l3=c(ind1='400204',ind2='124124'))
In R, I would like to split each column into two columns of three characters each. Column names don't matter as long as the original order is preserved, so my desired output is:
ind1 000 000 200 204 400 204
ind2 100 100 124 124 124 124
I found some pointers on how this could work, so I wrote a function based on one of the answers in this SO post:
# Split a single string into chunks of 3 characters
splitGT <- function(x) {
  return(strsplit(x, "(?<=.{3})", perl = TRUE)[[1]])
}
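For comparison, here is a minimal regex-free sketch of the same per-cell split. `splitFixed` is a hypothetical helper name, and it assumes each cell is a single non-empty string whose length is a multiple of the chunk width. It relies on `substring()` being vectorised over its start and stop positions.

```r
# Hypothetical regex-free helper: cut one string into fixed-width pieces.
# Assumes x is a single, non-empty string whose length is a multiple of width.
splitFixed <- function(x, width = 3) {
  stopifnot(length(x) == 1, nchar(x) > 0)
  starts <- seq(1, nchar(x), by = width)    # 1, 4, 7, ...
  substring(x, starts, starts + width - 1)  # one piece per start position
}

splitFixed("200204")  # returns c("200", "204")
```

It can be passed to the same `apply(df, c(1, 2), ...)` call in place of `splitGT`, so it changes how each cell is cut, not the shape of the result.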
This splits each string correctly, but applying it cell by cell with apply() returns a three-dimensional array with one slice per original column:
apply(df, c(1,2), splitGT)
, , l1

     ind1  ind2
[1,] "000" "100"
[2,] "000" "100"

, , l2

     ind1  ind2
[1,] "200" "124"
[2,] "204" "124"

, , l3

     ind1  ind2
[1,] "400" "124"
[2,] "204" "124"
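One possible way to flatten that array without plyr, offered as a sketch rather than a definitive answer: the array has dimensions piece x individual x original column, so moving the individual dimension to the front with `aperm()` and then reading the values in column-major order puts both pieces of each original column side by side, in the original column order. The object names `arr`, `arr2` and `res` are illustrative only.

```r
df <- data.frame(
  l1 = c(ind1 = "000000", ind2 = "100100"),
  l2 = c(ind1 = "200204", ind2 = "124124"),
  l3 = c(ind1 = "400204", ind2 = "124124"),
  stringsAsFactors = FALSE
)

splitGT <- function(x) {
  strsplit(x, "(?<=.{3})", perl = TRUE)[[1]]
}

# dim(arr) is c(2, n_rows, n_cols): piece x individual x original column
arr <- apply(df, c(1, 2), splitGT)

# Move the individual dimension first: individual x piece x original column
arr2 <- aperm(arr, c(2, 1, 3))

# Column-major flattening keeps piece 1 and piece 2 of each original column
# adjacent, and the original columns stay in their original order
res <- matrix(arr2, nrow = nrow(df), dimnames = list(rownames(df), NULL))
res
```

The result is a character matrix with one row per individual; wrap it in `as.data.frame()` if a data frame is needed.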
I got past this with adply from the plyr package, but that produced a data frame with two rows per individual and one column per original column. While this is closer to what I need, I feel like I am missing something obvious, because this seems far too complicated:
library(plyr)
adply(apply(df, c(1, 2), splitGT), c(1, 2))
  X1   X2  l1  l2  l3
1  1 ind1 000 200 400
2  2 ind1 000 204 204
3  1 ind2 100 124 124
4  2 ind2 100 124 124
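A possibly simpler route, sketched under the assumption that every cell is exactly six characters long: skip the per-cell function entirely and use `substr()`, which is vectorised, to take the first and second halves of all cells at once, then write them into alternating columns of a preallocated matrix. The names `m` and `out` are illustrative only.

```r
df <- data.frame(
  l1 = c(ind1 = "000000", ind2 = "100100"),
  l2 = c(ind1 = "200204", ind2 = "124124"),
  l3 = c(ind1 = "400204", ind2 = "124124"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)  # character matrix: one row per individual

# Empty result with two output columns per original column
out <- matrix("", nrow = nrow(m), ncol = 2 * ncol(m),
              dimnames = list(rownames(m), NULL))

# Odd output columns get characters 1-3, even ones get characters 4-6.
# Both sides are filled in column-major order, so original column j
# lands in output columns 2j - 1 and 2j.
out[, seq(1, ncol(out), by = 2)] <- substr(m, 1, 3)
out[, seq(2, ncol(out), by = 2)] <- substr(m, 4, 6)
out
```

Because the work is two vectorised `substr()` calls rather than one function call per cell, this approach may also scale better to a data frame with many columns.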