
Is there a way to generalise this (https://stackoverflow.com/a/33127773/22295881) answer to the problem of splitting a string column into multiple columns with data.table?

It would be great to have a solution that works for any user-provided column, rather than only for the column named "type".

I tried looping over the column names, e.g.:

library(data.table)

dtToSplit = data.table(attr = c(1,30,4,6),
                       typeA=c('foo_and_bar','foo_and_bar_2'),
                       typeB=c('cat_and_dog', 'orange_and_apple'))
namesSpl <- c('typeA', 'typeB')
for (indN in namesSpl) {
  dtToSplit[, paste0(indN, 1:2) := tstrsplit(.(indN), "_and_")]
}

Instead of the split strings, each new column just contains the name of the original column:

   attr         typeA            typeB typeA1 typeA2 typeB1 typeB2
1:    1   foo_and_bar      cat_and_dog  typeA  typeA  typeB  typeB
2:   30 foo_and_bar_2 orange_and_apple  typeA  typeA  typeB  typeB
3:    4   foo_and_bar      cat_and_dog  typeA  typeA  typeB  typeB
4:    6 foo_and_bar_2 orange_and_apple  typeA  typeA  typeB  typeB
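A minimal check of what tstrsplit() actually receives in that loop, run outside any data.table. It assumes, as documented for data.table, that .() inside [...] is an alias for list(), so .(indN) is simulated here with list(indN):

```r
library(data.table)

indN <- "typeA"

# Inside dtToSplit[...], .(indN) means list(indN): a one-element list holding
# the string "typeA" (the column's NAME), not the values stored in that column.
res <- tstrsplit(list(indN), "_and_")
print(res)

# "typeA" contains no "_and_", so tstrsplit() returns a single piece.
# `:=` then recycles that one value into both typeA1 and typeA2,
# which matches the output shown above.
```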

Maybe a loop is not the best idea?



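Use get(indN) instead of .(indN). Inside dtToSplit[...], .() is an alias for list(), so .(indN) hands tstrsplit() a list containing the string "typeA" rather than the column's values; that string has no "_and_" to split on, and the single piece is recycled into both new columns. get(indN) instead looks up the column whose name is stored in indN: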

for (indN in namesSpl) {
  dtToSplit[, paste0(indN, 1:2) := tstrsplit(get(indN), "_and_")]
}
dtToSplit
#     attr         typeA            typeB typeA1 typeA2 typeB1 typeB2
#    <num>        <char>           <char> <char> <char> <char> <char>
# 1:     1   foo_and_bar      cat_and_dog    foo    bar    cat    dog
# 2:    30 foo_and_bar_2 orange_and_apple    foo  bar_2 orange  apple
# 3:     4   foo_and_bar      cat_and_dog    foo    bar    cat    dog
# 4:     6 foo_and_bar_2 orange_and_apple    foo  bar_2 orange  apple
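If you would rather avoid the explicit loop and pass in any set of columns, one possible sketch uses .SDcols with lapply(). The helper name split_cols() and its arguments are hypothetical, and it assumes every value in each column splits into exactly n pieces (as in the example data). The sample table is built with rep() so that every column has four rows:

```r
library(data.table)

# Hypothetical helper: split each column in `cols` on `sep` into `n` new
# columns named <col>1 ... <col>n, adding them to `dt` by reference.
# Assumes each value yields exactly `n` pieces after splitting.
split_cols <- function(dt, cols, sep, n = 2L) {
  new_names <- paste0(rep(cols, each = n), seq_len(n))
  # lapply() gives one list of n vectors per column; unlist(recursive = FALSE)
  # flattens that into a single list of length(cols) * n vectors,
  # in the same order as new_names.
  dt[, (new_names) := unlist(lapply(.SD, tstrsplit, split = sep),
                             recursive = FALSE),
     .SDcols = cols]
  invisible(dt)
}

dtToSplit <- data.table(attr  = c(1, 30, 4, 6),
                        typeA = rep(c("foo_and_bar", "foo_and_bar_2"), 2),
                        typeB = rep(c("cat_and_dog", "orange_and_apple"), 2))

split_cols(dtToSplit, c("typeA", "typeB"), "_and_")
print(dtToSplit)
```

Because := modifies the table by reference, the function does not need to return anything for dtToSplit to gain the new columns; invisible(dt) only makes it usable in a pipeline.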
r2evans