2

Given a data.table with an arbitrary number of columns

dt = data.table( a = letters[1:5], b = rep('-', 5), c = LETTERS[1:5] )
#    a b c
# 1: a - A
# 2: b - B
# 3: c - C
# 4: d - D
# 5: e - E

and an arbitrary format string having a number of placeholders corresponding to the number of columns

format = '%s0%s1%s'

How to apply sprintf without explicitly calling it with all column names? Just provide the data.table won't work because sprintf expects 3 arguments in this case. Calling sprintf(format, dt$a, dt$b, dt$b) is no option because I don't know neither the format nor the data.table beforehand. Also sapply on row indices won't work because the operation has to be done on a subset of rows preserving their order.

idx = seq( 1, by = 2, to = 5 )

So the goal is by issueing a hypothetical command

dt[ idx, sprintf( format, * )]

achieving this

# [1] "a0-1A" "c0-1C" "e0-1E"

It can be done by calling

cols = paste( names( dt ) ), collapse=',' )
# "a,b,c"
eval( parse( text = sprintf( 'dt[ idx, sprintf( format,%s )]', cols ) ) )
# [1] "a0-1A" "c0-1C" "e0-1E"

But this is rather arcane and definetly not clean code and it would be nice if there was a more semantic way in the data.table package itself. So the question is basically if there is one. Until now neither google nor the the data.table manual gave me an answer. Furthermore I do neither want to paste some/multiple columns but I want to use all columns instead nor I am looking for a way to achieve this by the use of data.frame. I am looking for smooth data.table syntax.

  • See also http://stackoverflow.com/questions/14568662/paste-multiple-columns-together – David Arenburg Jan 22 '17 at 11:15
  • @DavidArenburg edited question now to clearly distuingish between that thread and mine. The questions are quite different and my problem is not adressed in that thread. – Sebastian Gross Jan 22 '17 at 12:15
  • Well, the answers on that tread are using pretty much the same syntax as in the answers in the linked thread. I don't see much difference - if any. But I haven't closed your question, rather just recommended that you will also visit the other thread. – David Arenburg Jan 22 '17 at 12:19
  • Well, [it is](http://stackoverflow.com/a/33015233/3001626). And you just need to read some `data.table` tutorial. Doing `do.call(paste0, .SD)` is pretty much the same as doing `do.call(paste0, dt)`. Same goes for `Reduce` – David Arenburg Jan 22 '17 at 12:35
  • So one answer - yours - matches my needs, but it's kind of confusing to find it there. Especially if it's so low ranked. Therefore I find it useful to have a thread targeting that exact question. – Sebastian Gross Jan 22 '17 at 12:40
  • 3
    Possible duplicate of [Paste multiple columns together](http://stackoverflow.com/questions/14568662/paste-multiple-columns-together) – user20650 Jan 22 '17 at 15:04

2 Answers2

6

We can do this by passing sprintf as argument in do.call

dt[, do.call(sprintf, c(.SD, fmt = format))]
#[1] "a-A" "b-B" "c-C" "d-D" "e-E"

If we need to create this as a column in 'dt', assign (:=) it to create the new column

dt[, newCol := do.call(sprintf, c(.SD, fmt = format))]

We can also use the paste with do.call

dt[, newCol := do.call(paste0, .SD)]
akrun
  • 874,273
  • 37
  • 540
  • 662
1

Another option is to use Reduce and paste0:

dt[, abc := Reduce(paste0, .SD)]

which gives:

   a b c abc
1: a - A a-A
2: b - B b-B
3: c - C c-C
4: d - D d-D
5: e - E e-E

On a large dataset this is a tiny bit slower than the do.call/sprintf or do.call/paste0 methods:

akrun1 <- dt[sample.int(5, 1e6, TRUE)]
akrun2 <- copy(akrun)
jaap <- copy(akrun)

> system.time(akrun1[, newCol := do.call(sprintf, c(.SD, fmt = format))])
   user  system elapsed 
  0.280   0.002   0.282 
> system.time(akrun2[, newCol := do.call(paste0, .SD)])
   user  system elapsed 
  0.187   0.001   0.188 
> system.time(jaap[, abc := Reduce(paste0, .SD)])
   user  system elapsed 
  0.325   0.002   0.327
Jaap
  • 81,064
  • 34
  • 182
  • 193
  • 1
    TY, but that doesn't fit the needs of using a `format` string. Though the format string I chose - by means of simplicity - matches the `paste0` semantic it was not what I had actually in mind. But it might become handy anyway. – Sebastian Gross Jan 22 '17 at 11:12
  • @SebastianGross I agree that using `sprintf` is more convenint when using a less straightforward format-string. For your next question: It is always good to make the example in your question as close as possible to your real problem. – Jaap Jan 22 '17 at 11:17