I am using R to analyze a data set produced by a colleague's Python script. The script returns the structure below, where 13 is the number of samples and 3128 is the number of trait observations per sample. Each trait is coded as a single digit, so every digit after the sample name represents one column, and its value is the code for that trait:
13 3128
>1062_0 0000000000[...]
>1066A_0 000001010[...]
>1067A_0 000002010[...]
>1067B_0 110013010[...]
>1067C_0 000024010[...]
>1067D_0 000024010[...]
>1084A_0 200100010[...]
>1084B_0 001005110[...]
>1084C_0 000000010[...]
>1086_0 0100002100[...]
>1087_0 3002040100[...]
>1088_0 0000060111[...]
>C105_0 0000050120[...]
I am trying to get these data into a data frame with 13 rows and 3,128 columns.
I have used the read.phylip function from the phylotools package to read in the file above, and I can get it into a data.frame:
SL_FFR_input <- read.phylip(fil = "matrix.phy")
SL_FFR_frame <- phy2dat(SL_FFR_input)
However, this results in a data frame with two columns: V1 holds the sample names, and V2 holds one string containing all of the single-digit codes for that sample.
The frame that would be useful is shown below, where the sample names form the row names and each value now has its own column.
>1062_0 0 0 0 0 0 0 0 0 0[...]
>1066A_0 0 0 0 0 0 1 0 1 0[...]
>1067A_0 0 0 0 0 0 2 0 1 0[...]
>1067B_0 1 1 0 0 1 3 0 1 0[...]
>1067C_0 0 0 0 0 2 4 0 1 0[...]
>1067D_0 0 0 0 0 2 4 0 1 0[...]
>1084A_0 2 0 0 1 0 0 0 1 0[...]
>1084B_0 0 0 1 0 0 5 1 1 0[...]
>1084C_0 0 0 0 0 0 0 0 1 0[...]
>1086_0 0 1 0 0 0 0 2 1 0[...]
>1087_0 3 0 0 2 0 4 0 1 0[...]
>1088_0 0 0 0 0 0 6 0 1 1[...]
>C105_0 0 0 0 0 0 5 0 1 2[...]
It would be a huge help if someone could point me in the right direction!
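For reference, here is a minimal base R sketch of the reshaping described above. It is only an illustration under some assumptions: `dat` is a small hypothetical stand-in for the two-column frame returned by phy2dat (columns named V1 and V2, as described), every string in V2 has the same length, and every trait really is a single digit. It has not been run against the real matrix.phy file.

```r
# Hypothetical stand-in for the two-column frame described above:
# V1 = sample names, V2 = one string of single-digit trait codes.
dat <- data.frame(
  V1 = c(">1062_0", ">1066A_0", ">1067B_0"),
  V2 = c("000000000", "000001010", "110013010"),
  stringsAsFactors = FALSE
)

# Split each string into its individual characters (one list element per sample).
chars <- strsplit(as.character(dat$V2), split = "")

# Stack the character vectors row-wise; this assumes all strings have equal length.
trait_matrix <- do.call(rbind, chars)

# Convert the character codes to integers while keeping the matrix shape.
storage.mode(trait_matrix) <- "integer"

# Build the data frame and use the sample names as row names.
trait_frame <- as.data.frame(trait_matrix)
rownames(trait_frame) <- dat$V1

print(trait_frame)
```

With the real data, `dat` would be replaced by SL_FFR_frame, and the result should then have one row per sample and one column per trait. If any trait is coded with more than one character, splitting on the empty string would no longer be appropriate.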