
Matrix A is my starting matrix. It holds the data logged from my MPU6050 and GPS on an SD card (Latitude, Longitude, Time, Ax, Ay, Az, Gx, Gy, Gz).

I calculated the standard deviation of Az for a window size of 5 and identified all the elements that satisfy a condition (> threshold).

Then, in a matrix "large_windows", I stored the indexes of all the Az values in the windows that satisfy the condition.

From matrix "large_windows" I calculated a new matrix B containing all the rows of matrix A that correspond to the elements of "large_windows".

I think my code works, but it is very ugly and chaotic. Plus, I am still not very practiced with indexing, but I want to learn it.


1. Does a better solution exist?


2. Is it possible to use logical indexing? How? Is it efficient?

Here is my code. It is a simplified example with a generic condition, meant to understand the whole concept better and not only my specific situation, starting from the suggestions on a previous question (how to create a sliding window).

%random matrix n x m
a=rand(100,6); 

%window dimension
window_size=4; 

%overlap between two windows
overlap=1;

%increment needed
step=window_size - overlap; 

%std threshold
threshold=0.3; 
std_vals= NaN(size(a,1),1); 

%The sliding window will analyze only the 5th column 
for i=1: step: (size(a,1)-window_size)
  std_vals(i)=std(a(i:(i+window_size-1),5));
end

% finding the rows with standard deviation larger than threshold
large_indexes = find(std_vals>threshold);

%Storing all the elements that are inside the window with std>threshold 

large_windows = zeros(numel(large_indexes), window_size);
for i=1:window_size
    large_windows(:,i) = large_indexes + i - 1;
end

% Start extracting all the rows whose 5th-column elements are outliers
n=numel(large_windows);

%Since I can't know in advance how long my dataset will be,
%I need to know the "index distance" between two adjacent elements
% in the same row [e.g. a(1,1) and a(1,2)]


diff1=sub2ind(size(a),1,1);
diff2=sub2ind(size(a),1,2);

l_2_a_r_e = diff2-diff1 %index distance between two adjacent elements of a row
large_windows=large_windows'
%calculating all the indexes of the elements of the ith row containing an anomaly
for i=1:n

   B{i}=[a(large_windows(i))-l_2_a_r_e*4 a(large_windows(i))-l_2_a_r_e*3 a(large_windows(i))-l_2_a_r_e*2 a(large_windows(i))-l_2_a_r_e*1 a(large_windows(i))-l_2_a_r_e*0 a(large_windows(i))+l_2_a_r_e];
end 

C= cell2mat(B');

I also read some questions before posting, but this one was too specific.

B is not included in A, so this question is not helpful: Find complement of a data frame (anti - join)

I don't know how to use ismember in this specific case

I hope my drawing could better explain my problem :)

Thanks for your time

Max
Andrea Ciufo
  • I'm not sure if I understand correctly what you are looking for. For example instead of your for-loop you could use `large_windows=repmat(large_indexes.',window_size,1)+(0:3).'` or `large_windows=bsxfun(@plus,large_indexes,0:3).'` to create your `large_windows`-array. It would probably be a little more efficient. Are you looking for this kind of stuff? Is your goal to make your code faster? Are you dealing with big amounts of data? Or are you just trying to prettify the code and understand some fancy matlab-indexing stuff? – Max May 30 '17 at 15:29
  • @uomodellamansarda If the final result is matrix B, you don't really need to calculate the matrix "large_windows". You can get "B" directly from "large_indexes". What do you think about that? – Amritbir Singh Gill May 30 '17 at 16:02
  • @Max my goal is to make my code faster, because I have more than 4k rows, but I also want to understand some fancy matlab-indexing stuff (isn't it useful? I am a noob with no computer science background, and everyone discourages me from using for-loops in matlab) :) Thanks for your suggestion, I will study it and then try it :) <3 – Andrea Ciufo May 30 '17 at 16:34
  • @AmritbirSinghGill I didn't think about this possible solution, I will try! (**Do or do not, there is no try**) – Andrea Ciufo May 30 '17 at 16:37
  • In the line where you are calculating `B` you are using the row numbers corresponding to column 5 of your `a`-array as linear indexes. Are you sure that this is what you wanted to do? – Max May 30 '17 at 17:36
  • @Max My final goal is to get a matrix (in my example the **B** matrix) **in which all the rows of the original matrix A whose elements in the 5th column exceed a threshold are stored**. In this case the elements are analyzed in groups of five. Maybe after more tests a window size of 10 will be better; that's the reason why I struggle with this problem. Let me know if it is clearer now what I want to do, unfortunately my English is terrible. I will study your code; it will take a little bit of time for a beginner like me, **thanks for everything :)** – Andrea Ciufo May 31 '17 at 10:47
  • @Max I understand now that my goal is probably not what my code actually does :/ – Andrea Ciufo May 31 '17 at 10:48
  • Ok, I got what you want now, and I will fix that for you and give you some more insights on what you did wrong. I'll do it during the next couple of hours, no time right now... – Max May 31 '17 at 10:56
  • I just edited my answer and I think it should be what you wanted. If you need any further explanations or help, feel free to ask. – Max May 31 '17 at 15:59
  • @Max thanks a lot! I am going to study your solution and LuisMendo's post too :) – Andrea Ciufo Jun 01 '17 at 07:01
  • you're welcome! If it solves your problem please consider accepting my answer – Max Jun 01 '17 at 17:56
  • @Max done! :D I will try and study your solution in the following days. Thank you for your time, in particular for identifying my mistakes :) – Andrea Ciufo Jun 01 '17 at 18:00
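For reference, the loop that builds large_windows in the question and the bsxfun form suggested in the first comment produce the same matrix (the comment's version adds a transpose `.'` so that each window ends up in a column instead of a row). A minimal check, assuming made-up values for large_indexes:

```matlab
% Hypothetical window starts, shaped like the column vector returned by find
large_indexes = [2; 5; 9];
window_size = 4;

% Loop version from the question: one column per offset inside the window
large_windows_loop = zeros(numel(large_indexes), window_size);
for k = 1:window_size
    large_windows_loop(:, k) = large_indexes + k - 1;
end

% Vectorized version: column of starts plus row of offsets -> one window per row
large_windows_vec = bsxfun(@plus, large_indexes, 0:window_size-1);

same = isequal(large_windows_loop, large_windows_vec);   % true
```

On R2016b or newer, `large_indexes + (0:window_size-1)` gives the same result through implicit expansion.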

1 Answer


Here's a new approach that achieves the result you actually wanted. I corrected 2 mistakes that you made and replaced all the for-loops with bsxfun, which is a very efficient function for this kind of task. For Matlab R2016b or newer you can also use implicit expansion instead of bsxfun.
My solution starts at your implementation of the sliding window. Instead of your for-loop, you can use

stdInds=bsxfun(@plus,1:step:(size(a,1)-window_size+1),(0:window_size-1).');
std_vals=std(a(sub2ind(size(a),stdInds,repmat(5,size(stdInds)))));

here. The bsxfun call creates an array that holds the rows of your windows, one window in each column. These row numbers need to be transformed into linear indexes of the a-array in order to get an array of values that can be passed to the std-function, which then returns one standard deviation per column, i.e. per window. In your implementation you made a small mistake here: your for-loop ends at size(a,1)-window_size, but the last complete window starts at row size(a,1)-window_size+1, so you are missing the last window.
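A small check of the window indexing on a hypothetical 10-element column (col stands in for a(:,5); the values are made up so the windows are easy to read). Here the column is pulled out first, which lets the window matrix index it directly; this is a swapped-in simplification of the sub2ind expression above, not a different algorithm:

```matlab
% Deterministic stand-in for a(:,5)
col = (1:10).';
window_size = 4;
overlap = 1;
step = window_size - overlap;   % 3

% Start rows of all complete windows; the last one is numel(col)-window_size+1
starts = 1:step:(numel(col) - window_size + 1);   % [1 4 7]

% One window per column: rows starts(j) .. starts(j)+window_size-1
stdInds = bsxfun(@plus, starts, (0:window_size-1).');

% Indexing a vector with a matrix keeps the matrix shape,
% so std works column-wise and returns one value per window
std_vals = std(col(stdInds));
```

The last column of stdInds is [7 8 9 10], so the final window reaches the last row of the data. Because every window here holds four consecutive integers, all three std values are equal (sqrt(5/3)).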
Now that we have the std-values of the windows, we can check which ones are greater than your predefined threshold and then transform them back into the corresponding rows:

highStdWindows=find(std_vals>threshold);
highStdRows=bsxfun(@plus,highStdWindows*step-step+1,(0:window_size-1).');

highStdWindows contains the indexes of the windows that have high std-values. In the next line, we calculate the starting rows of these windows using highStdWindows*step-step+1 (window k starts at row (k-1)*step+1), and then we calculate the other rows belonging to each window using bsxfun again.
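To make the start-row formula concrete, here is a sketch assuming windows 1, 3 and 4 were flagged (a made-up result of find), with window_size 4 and overlap 1:

```matlab
window_size = 4;
overlap = 1;
step = window_size - overlap;      % 3

% Hypothetical result of find(std_vals > threshold)
highStdWindows = [1 3 4];

% Window k starts at row (k-1)*step + 1, which equals k*step - step + 1
startRows = (highStdWindows - 1) * step + 1;    % [1 7 10]

% Add the offsets 0..window_size-1 as a column: one window per column
highStdRows = bsxfun(@plus, startRows, (0:window_size-1).');
% highStdRows is [1 7 10; 2 8 11; 3 9 12; 4 10 13]
```

Note that row 10 appears in both the third and the fourth window, because consecutive windows share `overlap` rows.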
Now we get to the actual mistake in your code. This line right here

B{i}=[a(large_windows(i))-l_2_a_r_e*4 a(large_windows(i))-l_2_a_r_e*3 a(large_windows(i))-l_2_a_r_e*2 a(large_windows(i))-l_2_a_r_e*1 a(large_windows(i))-l_2_a_r_e*0 a(large_windows(i))+l_2_a_r_e];

does not do what you wanted it to do. Unfortunately, you misplaced a couple of brackets here. This way you take the large_windows(i)-th element of matrix a and subtract 4*l_2_a_r_e from it. What you wanted to write was

B{i}=[a(large_windows(i)-l_2_a_r_e*4)  % and so on

This way you would subtract 4*l_2_a_r_e from the index that you pass to a. This would still be wrong, though, because in large_windows you stored row numbers and not linear indexes into matrix a.
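The difference between row numbers and linear indexes can be seen on a small made-up 4x3 matrix whose entries equal their own linear index. MATLAB stores arrays column by column, so one step to the right within a row adds size(a,1) to the linear index, which is what diff2-diff1 (l_2_a_r_e) measures in the question:

```matlab
% Entries equal their own linear index (column-major order)
a = reshape(1:12, 4, 3);

% Linear index of row 2, column 3: (3-1)*size(a,1) + 2 = 10
idx = sub2ind(size(a), 2, 3);

% Distance between adjacent elements of the same row = number of rows
step_between_columns = sub2ind(size(a), 1, 2) - sub2ind(size(a), 1, 1);   % 4

% Linear and subscripted indexing reach the same element
same = a(idx) == a(2, 3);

% A whole row is simplest to take with subscripts: row 2, all columns
row2 = a(2, :);    % [2 6 10]
```

So a row number r only equals a linear index for elements of the first column; for any other column it has to be shifted by multiples of size(a,1).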
Nevertheless, this can be achieved a lot more easily using subscripted indexing instead of linear indexing:

rowList=reshape(highStdRows,1,[]);
C=a(rowList,:); % the rows listed in rowList, all columns (:)

These two simple lines tell Matlab to take all rows that are stored in highStdRows, with all columns (expressed by the :). With this, if there are two adjacent windows with high std-values, you will get the overlapping rows twice. If you don't want that, you can use this code instead:

rowList=unique(reshape(highStdRows,1,[]));
C=a(rowList,:);
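Regarding question 2 (logical indexing): the de-duplicated selection can also be written with a logical row mask instead of unique. Below is a minimal end-to-end sketch on a small made-up matrix with two spikes in column 5; n, the spike rows and rowMask are illustrative names, not from the question:

```matlab
% Hypothetical data: 13 rows, 6 columns
n = 13;
a = zeros(n, 6);
a(:, 1) = (1:n).';      % row numbers, so the selected rows are easy to read
a([8 11], 5) = 5;       % two spikes in column 5

window_size = 4;
overlap = 1;
step = window_size - overlap;
threshold = 0.3;

% Sliding-window std of column 5, one window per column of stdInds
col = a(:, 5);
starts = 1:step:(size(a, 1) - window_size + 1);          % [1 4 7 10]
stdInds = bsxfun(@plus, starts, (0:window_size-1).');
std_vals = std(col(stdInds));                            % [0 0 2.5 2.5]

% Rows covered by the high-std windows (rows 7..10 and 10..13)
highStdWindows = find(std_vals > threshold);
highStdRows = bsxfun(@plus, (highStdWindows - 1) * step + 1, (0:window_size-1).');

% Logical mask: true for every row that belongs to at least one flagged window.
% Setting a row to true twice is harmless, so the shared row 10 appears once.
rowMask = false(size(a, 1), 1);
rowMask(highStdRows(:)) = true;

C = a(rowMask, :);   % rows in ascending order, each at most once
```

Here C contains rows 7 to 13, the same rows as a(unique(highStdRows(:)),:). The mask avoids the sort that unique performs, although for a few thousand rows both should be fast.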

If you want further insights into how indexing in Matlab works, take a look at LuisMendo's post about this topic.

Max
  • thanks, I understood all my mistakes! Now I am trying to understand this: `highStdWindows*step-step+1`. It is clear how `bsxfun` works, but it is not clear why you multiply by `step` and then add `-step+1` :) I tried the code in matlab to see the result, but I am stuck on this point. Thanks in advance – Andrea Ciufo Jun 02 '17 at 17:36
  • In `highStdWindows` we store the numbers of the windows that have high std-values. So now we need to calculate the corresponding row numbers. Imagine we have just four windows of size 4 with overlap one, and the first, third and fourth windows have high std-values. Now `highStdWindows` would contain the array `[1, 3, 4]`, but we need to calculate the rows that these windows start with. Multiplying this vector by `step` and adding `-step+1` does exactly this, e.g. window 3 consists of rows `[7 8 9 10]`, and we calculate the starting row with `3*step-step+1=7` and the others with `bsxfun` – Max Jun 02 '17 at 18:45