
I am now using the data.table package; however, I can barely find a way to compute by rows. For example:

apply(x, 1, sum) # suppose x is a data.frame with many columns

Is there anyone who knows how to do this?

  • 2
    Welcome to SO! In order for someone to assist you please tag the question with the language and tools you are using. – codemonkeh Apr 21 '15 at 01:21
  • 2
    Furthermore: read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to produce a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). This will make it much easier for others to help you. – Jaap Apr 21 '15 at 10:52

2 Answers

2

Do your best to avoid by-row operations, but if you must:

dt[, your.by.row.operation, by = 1:nrow(dt)]
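
For concreteness, here is a minimal sketch of that pattern (the table and column names are invented purely for illustration; the sum-specific shortcuts are discussed in the comments below):

library(data.table)

# a small made-up table, just for illustration
dt <- data.table(m = 1:5, n = 6:10)

# general pattern: treat every row as its own group
dt[, sum(unlist(.SD)), by = 1:nrow(dt)]

# for sums specifically, vectorised alternatives avoid the per-row grouping
dt[, row.sum := rowSums(.SD), .SDcols = c("m", "n")]
dt[, row.sum := Reduce(`+`, .SD), .SDcols = c("m", "n")]  # equivalent result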
eddi
  • `x = data.table(m = 1:5, n = 1:5)`: both `x[, sum, by = 1:nrow(x)]` and `x[, sum(.SD), by = 1:nrow(x)]` give an error. So how did you do this? – 冷雨夜客 Apr 21 '15 at 06:07
  • `x[, sum(unlist(.SD)), by = 1:nrow(x)]`, though for `sum` in particular, there are much better ways – eddi Apr 21 '15 at 06:15
  • This is perfect. The reason I do it this way is that I think it will be faster with data.table. Is there any other, faster way? – 冷雨夜客 Apr 21 '15 at 06:20
  • 1
    for `sum` in particular use `x[, rowSums(.SD)]` or `x[, Reduce('+', .SD)]`; which one will be faster will depend on data – eddi Apr 21 '15 at 06:29
  • I just did a little test. `x[, Reduce('+', .SD)]` is the fastest. – 冷雨夜客 Apr 21 '15 at 06:32
  • `seq_len(nrow(dt))` will be safer to use than `1:nrow(dt)` – jangorecki Apr 21 '15 at 10:39
  • @JanGorecki it actually looks like, for the only case where there is a difference, `data.table` has a bug. The following has an output: `dt = data.table(a = integer(0)); dt[, print('boo'), by = seq_len(nrow(dt))]`, but I don't think it should. At least `1:nrow(dt)` just throws an error, which is more acceptable imo. – eddi Apr 21 '15 at 14:01
  • @eddi printing a constant value - not taken from dt - is not a good example, as it is a side effect, so its result cannot *reasonably* be used per group. As a counter-example I can provide two queries where using `seq_len` returns exactly what we expect, without throwing an error as in the `1:nrow` case: `dt[, print(a), by = seq_len(nrow(dt))]` or `dt[, sum(a), by = seq_len(nrow(dt))]`. To me `seq_len` looks safer. – jangorecki Apr 21 '15 at 15:42
  • 1
    @JanGorecki I'm simply pointing out that the `j-value` gets computed at all, which I don't think it should – eddi Apr 22 '15 at 02:06
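
To make the `seq_len` vs `1:nrow` difference concrete, a small sketch on a zero-row table, following the behaviour described in the comments above:

library(data.table)
dt = data.table(a = integer(0))       # zero-row table
dt[, sum(a), by = seq_len(nrow(dt))]  # empty result, as expected
# dt[, sum(a), by = 1:nrow(dt)]       # errors, because 1:0 is c(1, 0), length 2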
0

You could try using transform(). For example, using a dummy dataset that I have:

> head(data)
   sample     time.min       abs time.sec
1:  pur n 0.0008333334 0.4678054     0.05
2:  pur n 0.2508333325 0.4661632    15.05
3:  pur n 0.5008333325 0.4663149    30.05
4:  pur n 0.7508333325 0.4658490    45.05
5:  pur n 1.0008333920 0.4671631    60.05
6:  pur n 1.2508333920 0.4657932    75.05

Let's say I want to sum the two "time" columns together, and fill a new column with that value. I could use transform() to do that:

> transform(data, time.sum = time.min + time.sec)
        sample     time.min       abs time.sec     time.sum
  1:     pur n 0.0008333334 0.4678054     0.05   0.05083333
  2:     pur n 0.2508333325 0.4661632    15.05  15.30083328
  3:     pur n 0.5008333325 0.4663149    30.05  30.55083328
  4:     pur n 0.7508333325 0.4658490    45.05  45.80083328
  5:     pur n 1.0008333920 0.4671631    60.05  61.05083691
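
Since the question is about data.table, the same column can also be added by reference with `:=` rather than by copying via transform(); a minimal sketch, assuming the column names shown above:

library(data.table)
# add the new column in place instead of returning a modified copy
data[, time.sum := time.min + time.sec]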
dwong2107