0

I have a datatable that has a column which has a comma separated string of integers, of different lengths. I want each element into a separate column. I tried the stringr::str_extract and base strsplit; both do not work in this case. They vectorize vertically instead of horizontally. How do I vectorize horizontally, or for each element of the comma separated string?

Here is a replicable example.

Consider a simple datatable with 2 columns

set.seed(10)
r<-vector(mode = "character",length = 10)
for(i in 1:5) r[i]=paste(sample(10,4),collapse = ",")
set.seed(10)
for(i in 6:10) r[i]=paste(sample(10,2),collapse = ",")
dt1=data.table(sn=1:10,col=r)
dt1
    sn     col
 1:  1 6,3,4,5
 2:  2 1,3,9,2
 3:  3 7,4,6,9
 4:  4 2,6,3,4
 5:  5 1,3,4,6
 6:  6     6,3
 7:  7     5,7
 8:  8     1,3
 9:  9    3,10
10: 10     7,4

I need a transformed datatable that has the first row as

 1: sn=1, col="6,3,4,5", col1=6, col2=3,col3=4,col4=5

and 10th row as

10: sn=10,col="7,4",col1=7,col2=4,col3=NA,col4=NA

I tried,

data.table(sn=dt1$sn,col=dt1$col,col1=strsplit(dt1$col,","))

or,

dt1[,.(sn,col,col1=str_split(col,pattern = ","))]

or,

dt1[,.(sn,col,col1=str_extract_all(col,pattern = "\\d+")[1],col2=str_extract_all(col,pattern = "\\d+")[2])]

But all these are vectorized and I get comma separated strings in each row and each column.

My question is how do I separate the comma separated strings without using a for loop and store them as different columns in the data.table? I will be happy even if you melt the data.table into multiple rows, one each for each element, storing the values in the same column name col but on different rows.

I could not get any relevant question on SO or at least the search doesn't bring up results matching with my question.

PS. An extra upvote for anyone who can guide me to create the sample data.table without using 5 lines of code that I did. It looks atrociously long & it uses a for loop :-)

Lazarus Thurston
  • 1,197
  • 15
  • 33

0 Answers0