
I am now using the data.table package; however, I can barely find a way to compute by rows. For example:

apply(x, 1, sum) # suppose x is a data.frame with many columns

Is there anyone who knows how to do this?

  • 2
    Welcome to SO! In order for someone to assist you please tag the question with the language and tools you are using. – codemonkeh Apr 21 '15 at 01:21
  • 2
    Furthermore: read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to produce a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). This will make it much easier for others to help you. – Jaap Apr 21 '15 at 10:52

2 Answers

2

Do your best to avoid by-row operations, but if you must:

dt[, your.by.row.operation, by = 1:nrow(dt)]
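
For concreteness, here is a minimal sketch of that pattern (the table and column names are invented purely for illustration; the sum-specific shortcuts are discussed in the comments below):

library(data.table)

# a small made-up table, just for illustration
dt <- data.table(m = 1:5, n = 6:10)

# general pattern: treat every row as its own group
dt[, sum(unlist(.SD)), by = 1:nrow(dt)]

# for sums specifically, vectorised alternatives avoid the per-row grouping
dt[, row.sum := rowSums(.SD), .SDcols = c("m", "n")]
dt[, row.sum := Reduce(`+`, .SD), .SDcols = c("m", "n")]  # equivalent result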
eddi
  • `x = data.table(m = 1:5, n = 1:5)`: both `x[, sum, by = 1:nrow(x)]` and `x[, sum(.SD), by = 1:nrow(x)]` give an error. So how did you do this? – 冷雨夜客 Apr 21 '15 at 06:07
  • `x[, sum(unlist(.SD)), by = 1:nrow(x)]`, though for `sum` in particular, there are much better ways – eddi Apr 21 '15 at 06:15
  • This is perfect. The reason I do it this way is that I think it will be faster with data.table. Is there any other, faster way? – 冷雨夜客 Apr 21 '15 at 06:20
  • 1
    for `sum` in particular use `x[, rowSums(.SD)]` or `x[, Reduce('+', .SD)]`; which one will be faster will depend on data – eddi Apr 21 '15 at 06:29
  • I just did a little test. `x[, Reduce('+', .SD)]` is the fastest. – 冷雨夜客 Apr 21 '15 at 06:32
  • `seq_len(nrow(dt))` will be safer to use than `1:nrow(dt)` – jangorecki Apr 21 '15 at 10:39
  • @JanGorecki it actually looks like, for the only case where there is a difference, `data.table` has a bug. The following has an output: `dt = data.table(a = integer(0)); dt[, print('boo'), by = seq_len(nrow(dt))]`, but I don't think it should. At least `1:nrow(dt)` just throws an error, which is more acceptable imo. – eddi Apr 21 '15 at 14:01
  • @eddi printing a constant value - not taken from dt - is not a good example, as it is a side effect, so its result cannot *reasonably* be used per group. As a counter-example I can provide two queries where using `seq_len` returns exactly what we expect, without throwing an error as in the `1:nrow` case: `dt[, print(a), by = seq_len(nrow(dt))]` or `dt[, sum(a), by = seq_len(nrow(dt))]`. To me `seq_len` looks safer. – jangorecki Apr 21 '15 at 15:42
  • 1
    @JanGorecki I'm simply pointing out that the `j-value` gets computed at all, which I don't think it should – eddi Apr 22 '15 at 02:06
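
To make the `seq_len` vs `1:nrow` difference concrete, a small sketch on a zero-row table, following the behaviour described in the comments above:

library(data.table)
dt = data.table(a = integer(0))       # zero-row table
dt[, sum(a), by = seq_len(nrow(dt))]  # empty result, as expected
# dt[, sum(a), by = 1:nrow(dt)]       # errors, because 1:0 is c(1, 0), length 2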
0

You could try using transform(). For example, using a dummy dataset that I have:

> head(data)
   sample     time.min       abs time.sec
1:  pur n 0.0008333334 0.4678054     0.05
2:  pur n 0.2508333325 0.4661632    15.05
3:  pur n 0.5008333325 0.4663149    30.05
4:  pur n 0.7508333325 0.4658490    45.05
5:  pur n 1.0008333920 0.4671631    60.05
6:  pur n 1.2508333920 0.4657932    75.05

Let's say I want to sum the two "time" columns together, and fill a new column with that value. I could use transform() to do that:

> transform(data, time.sum = time.min + time.sec)
        sample     time.min       abs time.sec     time.sum
  1:     pur n 0.0008333334 0.4678054     0.05   0.05083333
  2:     pur n 0.2508333325 0.4661632    15.05  15.30083328
  3:     pur n 0.5008333325 0.4663149    30.05  30.55083328
  4:     pur n 0.7508333325 0.4658490    45.05  45.80083328
  5:     pur n 1.0008333920 0.4671631    60.05  61.05083691
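
Since the question is about data.table, the same column can also be added by reference with `:=` rather than by copying via transform(); a minimal sketch, assuming the column names shown above:

library(data.table)
# add the new column in place instead of returning a modified copy
data[, time.sum := time.min + time.sec]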
dwong2107