r matrix individual shift operations of elements

Question

I am trying to optimize some code that I have written as it is very slow for large datasets. I am not sure if the following can be done with matrix operations and I would appreciate if someone had any suggestions to make it faster.

I have a matrix with zeros and integers and I would like to shift down the entries of the individual columns by the absolute number of the integer in the the entry.

   [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0   -4    0
[3,]    4    0    0
[4,]   -3   -2    0
[5,]    0    2   -1
[6,]    2   -2    0
[7,]    0    0    0
[8,]   -3   -3    0

The code I am using is the following:

#data
A<-matrix(data=c(0,0,4,-3,0,2,0,-3,0,-4,0,-2,2,-2,0,-3,0,0,0,0,-1,0,0,0),nrow=8,ncol=3)

#shift function
shift<-function(x)
{
  #create the output matrix
  out<-matrix(data=0,nrow=8,ncol=1)

  #for loop to create the shift matrix
  for(i in seq(1,8,by=1))
  {
    if(i+abs(x[i])<=8)
    {
      #find the non zero
      if(x[i]!=0)
      {
        #if there is already a number put zero  
        if(out[i+abs(x[i]),1]!=0)
        {
          out[i+abs(x[i]),1]=0
        } else {
          #shift
          out[i+abs(x[i]),1]=x[i]
        }
      }
    }
  }

  #return object
  return(out)
}

#run the logic
shift_mat<-sapply(1:ncol(A),FUN=function(k) shift(A[,k]))

and the result is:

   [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0
[4,]    0    0    0
[5,]    0    0    0
[6,]    0    0   -1
[7,]    0    2    0
[8,]    2   -2    0

The rules are the following for every column:

starting from the top find first entry that is different than zero
shift down by the absolute numbers of that entry
if there is another entry at the targeted point put zero
repeat for the next column

Thanks,

Nikos

I do not understand step 3 (if there is another entry at the targeted point put zero)... — Ferdinand.kraft, Aug 16 '13 at 20:58
for step 3: A[3,1] and A[4,1] they are both pointing to A[7,1], when this happens cancel both. — user2493820, Aug 16 '13 at 21:01
Lets say you do this. What's the *next* step? I am just wondering if there is an easier/better way to achieve your eventual desired output given your input data? — Simon O'Hanlon, Aug 16 '13 at 21:15
"shift down by the absolute numbers of that entry".. what does this mean? — Arun, Aug 16 '13 at 21:41

flodel · Accepted Answer · 2013-08-16T21:47:47.027

2

This is a bit cleaner and about 40% faster using your example on my machine. Maybe the speed improvement will be greater using your larger data?

You should use a matrix of integers. It uses less memory and some operations are faster:

A <- matrix(as.integer(c(0,0,4,-3,0,2,0,-3,0,-4,0,-2,2,
                        -2,0,-3,0,0,0,0,-1,0,0,0)), nrow = 8, ncol = 3)

Each column is a vector, so should be your output. I replaced matrices with vectors. Also made your code more robust without the hardcoded number of rows:

shift <- function(x) {
  n <- length(x)
  y <- rep(0L, n)
  for(i in seq_len(n)) {
    if (x[i] == 0L) next
    j <- i + abs(x[i])
    if (j > n) next
    y[j] <- if (y[j] != 0L) 0L else x[i]
  }
  return(y)
}

You can run it using apply:

shift_mat <- apply(A, 2, shift)

edited Aug 16 '13 at 21:47

answered Aug 16 '13 at 21:26

flodel

87,577
21
185
223

@Peyton. Thx. I'm not sure there is a choice though, as the order of the operations is important here. This is the kind of algorithm that would greatly benefit from a Rcpp rewrite. – flodel Aug 16 '13 at 21:33
@flodel thanks for the suggestions. I guess I would have to look at the Rcpp package. this function is part of a bigger program that has a 4000x200 matrix and takes a lot of time to compute – user2493820 Aug 16 '13 at 21:39
@user2493820: I just tested with `A <- matrix(sample(-2:2, 4000*200, replace = TRUE), 4000, 200)` and it took 5 seconds. Is this too long for your need? – flodel Aug 16 '13 at 21:40
the program runs a portfolio and not sure how easy would be to isolate this function... I would try to check against the total time. but the Rcpp suggestion is good. the function has a reduced form from the one I am actually using – user2493820 Aug 16 '13 at 21:44
@flodel there was a huge improvement on the time of the calculation. Mainly due to the change you made from sapply to apply. Thanks again – user2493820 Aug 17 '13 at 00:19

eddi · Answer 2 · 2013-08-16T22:38:37.203

2

The shift operation can be vectorized. Let's just take the first column of your data to see how:

v = c(0,0,4,-3,0,2,0,-3)

# index of the elements that could be non-zero in the final result
index = ifelse (v != 0 & abs(v) + seq_along(v) <= length(v),
                abs(v) + seq_along(v), 0)
# [1] 0 0 7 7 0 8 0 0


# now just need to filter out the duplicated entries
index = ave(index, index, FUN = function(x) {if (length(x) > 1) 0 else x})
# [1] 0 0 0 0 0 8 0 0

# home at last
res = integer(length(v))
res[index] = v[which(index != 0)]
res
# [1] 0 0 0 0 0 0 0 2

You can then put then above into a function and then lapply over your data.frame or apply on the columns of your matrix.

Unsurprisingly the biggest bottleneck above is the ave function, and you can replace that line with the following data.table construct (don't forget to require(data.table) somewhere) to speed it up considerably:

index = data.table(index)[, index := if(.N > 1) 0 else index, by = index][, index]

edited Aug 16 '13 at 22:38

answered Aug 16 '13 at 22:31

eddi

49,088
6
104
155

Consider `x <- c(3, 2, 1, 0)` and maybe you'll agree that `FUN` should rather be `function(x) {if (length(x) %% 2L) tail(x, 1) else 0L}`. Otherwise, great answer. Took me a while to agree it can be vectorized. – flodel Aug 17 '13 at 00:19
@flodel sorry I don't get it - can you point out what the problem with that `x` is? (also is that `x` supposed to be `index` or `v` in the above) I don't get why I'd want to compute length mod 2 here? – eddi Aug 17 '13 at 06:23
@flodel oh I think I get what you mean - I think this comes down to how I understand what OP wrote, but I do see your point - my understanding is - if there is any conflict at that point, put a zero, and yours seems to be - if there is sequential conflict only, put a zero – eddi Aug 17 '13 at 06:26

r matrix individual shift operations of elements

2 Answers2

Linked