I have a datatable that has a column which has a comma separated string of integers, of different lengths. I want each element into a separate column. I tried the stringr::str_extract
and base strsplit
; both do not work in this case. They vectorize vertically instead of horizontally. How do I vectorize horizontally, or for each element of the comma separated string?
Here is a replicable example.
Consider a simple datatable with 2 columns
set.seed(10)
r<-vector(mode = "character",length = 10)
for(i in 1:5) r[i]=paste(sample(10,4),collapse = ",")
set.seed(10)
for(i in 6:10) r[i]=paste(sample(10,2),collapse = ",")
dt1=data.table(sn=1:10,col=r)
dt1
sn col
1: 1 6,3,4,5
2: 2 1,3,9,2
3: 3 7,4,6,9
4: 4 2,6,3,4
5: 5 1,3,4,6
6: 6 6,3
7: 7 5,7
8: 8 1,3
9: 9 3,10
10: 10 7,4
I need a transformed datatable that has the first row as
1: sn=1, col="6,3,4,5", col1=6, col2=3,col3=4,col4=5
and 10th row as
10: sn=10,col="7,4",col1=7,col2=4,col3=NA,col4=NA
I tried,
data.table(sn=dt1$sn,col=dt1$col,col1=strsplit(dt1$col,","))
or,
dt1[,.(sn,col,col1=str_split(col,pattern = ","))]
or,
dt1[,.(sn,col,col1=str_extract_all(col,pattern = "\\d+")[1],col2=str_extract_all(col,pattern = "\\d+")[2])]
But all these are vectorized and I get comma separated strings in each row and each column.
My question is how do I separate the comma separated strings without using a for loop
and store them as different columns in the data.table? I will be happy even if you melt the data.table into multiple rows, one each for each element, storing the values in the same column name col
but on different rows.
I could not get any relevant question on SO or at least the search doesn't bring up results matching with my question.
PS. An extra upvote for anyone who can guide me to create the sample data.table without using 5 lines of code that I did. It looks atrociously long & it uses a for loop :-)