0

I built a function for outliers detection and it worked quite well, but given the huge amount of data I'm working on I needed to remove the "for loop", so here we have the vectorized version (or at least what I think is a vectorized version of my code). Calling the function the following parameters are intialized by the user, I am working with the following:

alpha=3
gamma=0.5
k=5

The series "price" exist in the workspace, is linked when calling the function. I think I almost did it but I am stuck with a problem.
Here is a piece of the code:

[n] = size(price,1);
x = price;
[j1]=find(x); %output is a column vector with size (n,1) of the following form j1=[1:1:n]
matrix_left=zeros(n, k,'double');
matrix_right=zeros(n, k,'double');
toc
matrix_left(j1(k+1:end),:)=x(j1-k:j1-1);

%Here it returns the following error: Subscript indices must either be real positive integers or logicals.

matrix_right(j1(1:end-k),:)=x(j1+1:j1+k);

%Here, it says the following: Subscripted assignment dimension mismatch.

matrix_group=[matrix_left matrix_right];
trimmed_mean=trimmean(matrix_group,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(matrix_group,2);
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1; 

What I'd like to have is something like:
if k= 5

left_matrix (3443,5):  
[100.25  103.5   102.25    102.75   103]  <---5 left neighbouring observations of the 15th row of **x**
[103.5   102.25  102.75    103    103.5]  <---5 left neighbouring observations of the 16th row of **x**

right_matrix(3443,5):  
[103.75  104.25  104   104.75  104.25]  <---5 right neighbouring observations of the 15th row of **x**
[104.25  104   104.75  104.25   104.5]  <---5 right neighbouring observations of the 16th row of **x**

Here is a small sample of data:

x = Price; price size = (3443, 1)
[...]  
100.25       %// '*suppose here we are at the 10th row*' 
103.5
102.25
102.75  
103  
103.5        %// '*here we are at the 15th row*'
103.75   
104.25  
104  
104.75  
104.25  
104.5  
[...]

Time (3443,1) %// the same as price, it reports the time of the transaction (HH:MM:SS).  
j1 (3443,1)
1  
2  
[...]  
3442  
3443  

Thanks everyone in advance,
Giorgio

Dan
  • 45,079
  • 17
  • 88
  • 157
CAPSLOCK
  • 6,243
  • 3
  • 33
  • 56
  • You will get that error whenever `j1` is less than or equal to `k`, or less than of equal to `1` – Dan Feb 10 '14 at 09:43
  • @Dan do you refer to matrix_left? But in the way I wrote the code shouldn't it be bounded to j1(k+1:end). Particularly I chose k=5. If I understood correctly I should have that error if k=1. – CAPSLOCK Feb 10 '14 at 09:50
  • I'm referring to `...=x(j1-k:j1-1)`. If `k` is `5`and `j1` turns out to be `5` or less then you will get exactly that error – Dan Feb 10 '14 at 09:51
  • @Dan I defined j1 like this: [j1]=find(x); Thus, as output, I have a simple column vector of integer from (1:n). I get your point but I thought I was correctly bounding the index of matrix_left to start filling it up only from j1>k. – CAPSLOCK Feb 10 '14 at 09:58
  • It's impossible to say exactly what's happening without seeing `j1` (or else `price`) but you're problem is how you are indexing `x` in that line, not `matrix_left`. The error is coming from the right hand side of the equation – Dan Feb 10 '14 at 10:11
  • @Dan any Idea of how could I fix this? How can I be more clear? Can I attach a sample workfile I did with everything built up to the matrix_left creation? – CAPSLOCK Feb 10 '14 at 10:18
  • State your goals better - and please provide a small example sample input and output and describe in words how you get from the input to the output – Dan Feb 10 '14 at 10:44
  • @Dan hope it is clear now. I showed how I would like to compute the two matrices. – CAPSLOCK Feb 10 '14 at 12:10
  • Also see [this question](http://stackoverflow.com/questions/20054047/subscript-indices-must-either-be-real-positive-integers-or-logicals-generic-sol) for a [generic approach](http://stackoverflow.com/a/20054048/983722) to deal with the subscript indices error. – Dennis Jaheruddin Mar 10 '14 at 10:34

1 Answers1

0

Here is the answer, the way to go was (once again) bsxfun:

[n] = size(price,1);
x = price;

idxArray_left=bsxfun(@plus,(k+1:n)',-k:-1);
idxArray_fill_left=bsxfun(@plus,(1:k)',1:k);
matrix_left=[idxArray_fill_left; idxArray_left];
idxArray_right=bsxfun(@plus,(1:n-k)',1:k);
idxArray_fill_right=bsxfun(@plus,(n-k+1:n)',-k:-1);
matrix_right=[idxArray_right; idxArray_fill_right];      
idx_matrix=[matrix_left matrix_right];
neigh_matrix=x(idx_matrix);
trimmed_mean=trimmean(neigh_matrix,10,'round',2);
score=bsxfun(@minus,x,trimmed_mean);
sd=std(neigh_matrix,0,2);
temp = abs(score) > (alpha .* sd + gamma);
outmat = temp*1;  

Thanks to Jonas that gave me a great hint to solve the problem

CAPSLOCK
  • 6,243
  • 3
  • 33
  • 56