Row wise operation on data.table

Question

Let's say I'd like to calculate the magnitude of the range over a few columns, on a row-by-row basis.

set.seed(1)
dat <- data.frame(x=sample(1:1000,1000),
                  y=sample(1:1000,1000),
                  z=sample(1:1000,1000))

Using data.frame(), I would do something like this:

dat$diff_range <- apply(dat,1,function(x) diff(range(x)))

To put it more simply, I'm looking for this operation, over each row:

diff(range(dat[1,])) # for i in 1:nrow(dat)

If I were doing this for the entire table, it would be something like:

setDT(dat)[,diff_range := apply(dat,1,function(x) diff(range(x)))]

But how would I do it for only named (or numbered) rows?

The question sounds like all you want to do is subset the data frame or data table, but based on your profile you know how to do that already. What are you actually trying to achieve here? — JeremyS, Jan 22 '14 at 09:26
I think I was under the impression that I could use notation in the `apply()` expression akin to how columns are referenced with data.table. This does what I expect: `dt[,diff_range := apply(dt[,1:2,with=FALSE]...` but I thought there was some magic that would let me do something like: `apply(1:2, ...)`. I suppose I answered my own question here. — Brandon Bertelsen, Jan 22 '14 at 16:03
Oh yes, you can, but not with data.table that way, since it changes dt instead of making a copy. I added an answer with the way I use most often, `%in%`. — JeremyS, Jan 24 '14 at 00:49

score 5 · Answer 1 · answered Jan 22 '14 at 07:19

How about this:

D[,list(I=.I,x,y,z)][,diff(range(x,y,z)),by=I][c(1:4,15:18)]
#    I  V1
#1:  1 971
#2:  2 877
#3:  3 988
#4:  4 241
#5: 15 622
#6: 16 684
#7: 17 971
#8: 18 835

# actually this will be faster, since it filters the rows before grouping
D[c(1:4,15:18),list(I=.I,x,y,z)][,diff(range(x,y,z)),by=I]

Use .I to create a row index and pass it to the by= parameter, so that the function runs once per row. The second call pre-filters by any list of row numbers, or you can add a key and filter on that if your real table looks different.
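If you don't want to type out every column name, here is a minimal sketch of the same by-row idea using .SD and .SDcols. The names `rows`, `cols`, `sub` and `res` are made up for illustration, and the table is rebuilt from the question's code rather than taken from D above. It stores the row numbers in an explicit column instead of relying on .I, and it assumes the row numbers are unique so that each group is a single row.

```r
library(data.table)

set.seed(1)
dat <- data.table(x = sample(1:1000, 1000),
                  y = sample(1:1000, 1000),
                  z = sample(1:1000, 1000))

rows <- c(1:4, 15:18)     # rows of interest (illustrative; must be unique)
cols <- c("x", "y", "z")  # columns to include (illustrative; could be built programmatically)

sub <- dat[rows]          # pre-filter: a new table holding only the selected rows
sub[, I := rows]          # keep the original row numbers as an explicit index column

# One group per row: unlist(.SD) flattens that row's selected columns into a vector
res <- sub[, list(diff_range = diff(range(unlist(.SD)))), by = I, .SDcols = cols]
res
```

This still runs the function once per row, so it is likely to be slow on large tables; it is only meant to show that the column names need not be hard-coded.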

But this solution only works if you explicitly specify the name of every column; it won't work if there are too many columns or you don't know their names. — skan, Feb 25 '16 at 20:25

score 5 · Accepted Answer · answered Mar 14 '16 at 15:33

pmax and pmin find the max and min across columns in a vectorized way, which is much better than splitting and working with each row separately. It's also pretty concise:

dat[, r := do.call(pmax,.SD) - do.call(pmin,.SD)]


        x   y   z   r
   1: 266 531 872 606
   2: 372 685 967 595
   3: 572 383 866 483
   4: 906 953 437 516
   5: 201 118 192  83
  ---                
 996: 768 945 292 653
 997:  61 231 965 904
 998: 771 145  18 753
 999: 841 148 839 693
1000: 857 252 218 639
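To restrict this to chosen rows and named columns, something along these lines should work. It is a sketch, assuming that is what the question is after: `rows` and `cols` are illustrative names, and the table is rebuilt from the question's code.

```r
library(data.table)

set.seed(1)
dat <- data.table(x = sample(1:1000, 1000),
                  y = sample(1:1000, 1000),
                  z = sample(1:1000, 1000))

rows <- c(1:4, 15:18)  # rows to compute on (illustrative)
cols <- c("x", "y")    # columns that take part in the range (illustrative)

# With i supplied, := fills only the selected rows by reference;
# every other row gets NA in the new column r.
dat[rows, r := do.call(pmax, .SD) - do.call(pmin, .SD), .SDcols = cols]

dat[rows]
```

.SDcols limits .SD to the listed columns, so z is ignored here, and nothing is computed for the rows outside `rows`.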

score 0 · Answer 3 · JeremyS · 2014-01-22T06:13:50.403

You can do it by subsetting before/during the function. If you only want every second row, for example:

dat_Diffs <- apply(dat[seq(2,1000,by=2),],1,function(x) diff(range(x)))

Or for rownames 1:10 (since their names weren't specified, they are just numbers counting up):

dat_Diffs <- apply(dat[rownames(dat) %in% 1:10,],1,function(x) diff(range(x)))

But why not just calculate per row then subset later?
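For comparison, a small sketch of the "calculate, then subset" route on a plain data.frame, checked against subsetting first. The names `all_diffs`, `even_rows`, `first` and `later` are made up for this example.

```r
set.seed(1)
dat <- data.frame(x = sample(1:1000, 1000),
                  y = sample(1:1000, 1000),
                  z = sample(1:1000, 1000))

even_rows <- seq(2, 1000, by = 2)

# Subset first, then compute the row-wise range on the smaller data frame
first <- apply(dat[even_rows, ], 1, function(x) diff(range(x)))

# Compute for every row once, then pick out the rows you need
all_diffs <- apply(dat, 1, function(x) diff(range(x)))
later <- all_diffs[even_rows]

# The values agree; unname() drops the row names that apply() carries over
all.equal(unname(first), unname(later))
```

Computing everything once is convenient if you will need several different subsets; subsetting first does less work if you only ever need a few rows.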

edited Jan 22 '14 at 06:13

answered Jan 22 '14 at 06:07
