I have a column with 1000 rows. Each row has 5000 values, all separated by semicolons. I would like to turn this column into a 1000 x 5000 matrix. How can I do this in R?
Thanks, Aaron
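matrix(as.numeric(unlist(strsplit(as.character(A[, 1]), ";"))), nrow = nrow(A), byrow = TRUE)
if the data is in a data frame or array, or
matrix(as.numeric(unlist(strsplit(A, ";"))), nrow = length(A), byrow = TRUE)
if the data is a character vector. strsplit() returns a list, so it has to be unlisted before as.numeric() can be applied.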
This would be one way to do it.
my_data <- data.frame(one_column = c("5;6;5;6;7", "6;5;3;2;5"))
my_data
  one_column
1  5;6;5;6;7
2  6;5;3;2;5
# split each row by a semicolon
my_list <- apply(my_data, 1, strsplit, split = ";")
# unlist and turn it into a matrix
my_matrix <- matrix(as.numeric(unlist(my_list)), nrow = nrow(my_data),
                    ncol = length(my_list[[1]][[1]]), byrow = TRUE)
my_matrix
     [,1] [,2] [,3] [,4] [,5]
[1,]    5    6    5    6    7
[2,]    6    5    3    2    5
You could also specify the number of rows and columns explicitly if you would like. Here the number of columns is taken from the first row, so I am assuming that every row does indeed have the same number of values (5000 in your case).
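The assumption that every row holds the same number of values can be checked before building the matrix. A minimal sketch in base R, reusing the my_data example from above; n_fields is just an illustrative name, and lengths() needs R 3.2.0 or later:

```r
# Example data from above; stringsAsFactors = FALSE keeps the column as character
my_data <- data.frame(one_column = c("5;6;5;6;7", "6;5;3;2;5"),
                      stringsAsFactors = FALSE)

# Count the values in each row (n_fields is an illustrative name)
n_fields <- lengths(strsplit(my_data$one_column, ";", fixed = TRUE))

# Stop early if the rows do not all have the same number of values;
# otherwise the values would not line up with the intended columns
stopifnot(length(unique(n_fields)) == 1)
```

If the check fails, table(n_fields) shows which row lengths occur and how often.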
Let's start by creating an example data frame that contains one column of semicolon-separated values:
d <- data.frame(col1 = paste0(runif(1000), ";", runif(1000), ";", runif(1000)), stringsAsFactors = FALSE)
This is what the data frame looks like:
head(d)
col1
1 0.20093023753725;0.399978927802294;0.404899704502895
2 0.424904063809663;0.0908958145882934;0.340853224741295
3 0.850118306931108;0.518901573261246;0.364027933450416
4 0.0212442732881755;0.760768163017929;0.146710442844778
5 0.0331723166164011;0.153021499980241;0.449281473178416
6 0.733110485365614;0.0544649658259004;0.663546974770725
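The following lines of code produce what you want:
library(stringr)
parts <- str_split(d$col1, ";")   # list with one character vector per row
d2 <- do.call(rbind, parts)       # bind the vectors together as matrix rows
mode(d2) <- "numeric"             # the split values are strings; convert them
str_split() produces a list where each element contains the split values of one row. do.call(rbind, ...) binds them all together into a matrix, and the last line converts that character matrix to numeric. (The list is named parts rather than t so that it does not mask the base function t().)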
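If stringr is not installed, the same split-then-bind idea can probably be done with base R alone. The sketch below is not from the answers above: split_to_matrix is a hypothetical helper name, and it assumes every element contains the same number of values.

```r
# Hypothetical helper (not from the thread): split a character vector on `sep`
# and return a numeric matrix with one row per input element.
# Assumes every element holds the same number of values.
split_to_matrix <- function(x, sep = ";") {
  # fixed = TRUE treats sep as a literal string rather than a regular expression
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # fill row by row so that each input element becomes one matrix row
  matrix(as.numeric(unlist(parts)), nrow = length(parts), byrow = TRUE)
}

# Small made-up input to show the shape of the result
m <- split_to_matrix(c("1;2;3", "4;5;6"))
dim(m)         # 2 3
is.numeric(m)  # TRUE
```

For the data in the question, the call would be something like split_to_matrix(d$col1), which should give a 1000-row matrix with one column per value.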