0

I have 2 matrices with the SAME IDs. I need to extract those rows of IDs from mat1 which have their dates within say ±5 days of the dates in the mat2. Same operation for mat2 as well. Please see the data here: UNIQCols = [1 2] ; dateCol = [3] ; valueCol = [4] ; dayRange = +- 15days.

      % UniqCol  Date    Value
mat1 = [2001 2   733427  1001 ;
        2001 2   733793  2002 ;
        2001 2   734582  2003 ;
        3001 1   734220  30   ;
        3001 1   734588  20   ;];
mat2 = [2001 2   733790  7777 ;
        2001 2   734221  2222 ; 
        3001 1   734220  10   ; 
        3001 1   734588  40   ;] ;

ans1 = [2001 2 733793 2002 ; 3001 1 734220 30 ; 3001 1 734588 20 ] ;
ans2 = [2001 2 733790 7777 ; 3001 1 734220 10 ; 3001 1 734588 40 ] ;

This needs to be a vectorized operation! The IDs are ordered in increasing order of dates. Dates are either separated on Q or Annual basis. So the range will be always << (date2-date1) Please help and thanks!

Maddy
  • 2,520
  • 14
  • 44
  • 64
  • Are you restricted to matlab for this task? Matlab is meant for matrix manipulation; this is working with dates that happen to be in matrix form. You may be better served working in other scripting languages. – eykanal Mar 22 '11 at 16:45
  • So, you need to compare each row in `mat1` to every row in `mat2` to see whether they're within ±5 days of each other? – eykanal Mar 22 '11 at 16:49
  • sorry, one last request... can you please edit the question to give labels to the four columns? I'm not sure what's what. – eykanal Mar 22 '11 at 17:12
  • The problem is very similar to this my question: http://stackoverflow.com/questions/2142826/mapping-2-vectors-help-to-vectorize Please have a look. It has been shown that the solution with loops works faster and consumes less memory than vectorized solution. Note that the data need to be sorted. – yuk Mar 22 '11 at 20:21
  • Check the code in accepted answer. You just need to change the condition to break the loop. I believe it will work even faster in your case. – yuk Mar 22 '11 at 20:28
  • Actually your matrices are not ordered by date. Check rows 3 and 4 in mat1, and rows 2 and 3 in mat2. – yuk Mar 22 '11 at 21:17

1 Answers1

0

Here is a function based on similar question I mentioned in my comments. Remember your matrices has to be sorted by date.

function match_for_xn = match_by_distance(xn, xm, maxdist)
%#Generates index for elements in vector xn that close to any of elements in
%#vector xm at least by distance maxdist

match_for_xn = false(length(xn), 1);
last_M = 1;
for N = 1:length(xn)
  %# search through M until we find a match.
  for M = last_M:length(xm)
    dist_to_curr = xm(M) - xn(N);
    if abs(dist_to_curr) < maxdist
        match_for_xn(N) = 1;
        last_M = M;
        break
    elseif dist_to_curr > 0
        last_M = M;
        break
    else
      continue
    end

  end %# M
end %# N

And the test script:

mat1 = sortrows([
        2001 2   733427  1001 ;
        2001 2   733793  2002 ;
        2001 2   734582  2003 ;
        3001 1   734220  30   ;
        3001 1   734588  20   ;
       ],3);
mat2 = sortrows([
        2001 2   733790  7777 ;
        2001 2   734221  2222 ; 
        3001 1   734220  10   ; 
        3001 1   734588  40   ;
       ],3);

mat1_index = match_by_distance(mat1(:,3),mat2(:,3),5);
ans1 = mat1(mat1_index,:);
mat2_index = match_by_distance(mat2(:,3),mat1(:,3),5);
ans2 = mat2(mat2_index,:);

I haven't tried any vectorized solution for your problem. If you get any try it against this solution and check the timing and memory consumption (include sorting step).

Community
  • 1
  • 1
yuk
  • 19,098
  • 13
  • 68
  • 99
  • Hi Yuk..thanks for the response. This code does matches the dates within the range, but these dates need to be checked within a sub-group of IDs (col 1 and 2). Simply put, say you take rows of AAPL from mat1 and search the dates in rows of AAPL in mat2. The row count can be different. Can you please suggest using accumarray to do this? – Maddy Mar 29 '11 at 21:00