Speed up code to compare fields in a struct

Question

I have the struct Trajectories with field uniqueDate, dateAll, label: I want to compare the fields uniqueDate and dateAll and, if there is a correspondence, I will save in label a value from an other struct.

I have written this code:

for k=1:nCols
    for j=1:size(Trajectories(1,k).dateAll,1)
        for i=1:size(Trajectories(1,k).uniqueDate,1)
            if (~isempty(s(1,k).places))&&(Trajectories(1,k).dateAll(j,1)==Trajectories(1,k).uniqueDate(i,1))&&(Trajectories(1,k).dateAll(j,2)==Trajectories(1,k).uniqueDate(i,2))&&(Trajectories(1,k).dateAll(j,3)==Trajectories(1,k).uniqueDate(i,3))
                for z=1:24
                    if(Trajectories(1,k).dateAll(j,4)==z)&&(size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1))
                        Trajectories(1,k).label(j)=s(1,k).places.all(z,i);
                    else if(Trajectories(1,k).dateAll(j,4)==z)&&(size(s(1,k).places.all,2)<size(Trajectories(1,k).uniqueDate,1))
                            for l=1:size(s(1,k).places.all,2)
                                Trajectories(1,k).label(l)=s(1,k).places.all(z,l);
                            end
                        end
                    end
                end
            end
        end
    end
end

E.g

Trajectories(1,4).dateAll=[1 2004 8 1 14 1 15 0 0 0 1 42 13 2;596 2004 8 1 16 20 14 0 0 0 1 29 12 NaN;674 2004 8 1 18 26 11 0 0 0 1 20 38 1;674 2004 8 2 10 7 40 0 0 0 14 26 5 3;674 2004 8 2 11 3 29 0 0 0 1 54 3 3;631 2004 8 2 11 57 56 0 0 0 0 30 8 2;1 2004 8 2 12 4 35 0 0 0 1 53 21 2;631 2004 8 2 12 52 58 0 0 0 0 20 36 2;631 2004 8 2 13 5 3 0 0 0 1 49 40 2;631 2004 8 2 14 0 20 0 0 0 1 56 12 2;631 2004 8 2 15 2 0 0 0 0 1 57 39 2;631 2004 8 2 16 1 4 0 0 0 1 55 53 2;1 2004 8 2 17 9 15 0 0 0 1 48 41 2];

Trajectories(1,4).uniqueDate= [2004 8 1;2004 8 2;2004 8 3;2004 8 4];

it runs but it's very very slow. How can I modify it to speed up?

Puuh, maybe give us a small set of input data (if possible by generating it from MATLAB so we could easily try ourselves). My first idea would be to rethink the whole data-layout. Maybe it'd be easier to use several single matrices instead of one big struct. — tim, May 05 '16 at 08:34

score 1 · Accepted Answer · answered May 05 '16 at 12:09

Let's work from the inside out and see where it gets us.

Step 1: Simplify your comparison condition:

if (~isempty(s(1,k).places))&&(Trajectories(1,k).dateAll(j,1)==Trajectories(1,k).uniqueDate(i,1))&&(Trajectories(1,k).dateAll(j,2)==Trajectories(1,k).uniqueDate(i,2))&&(Trajectories(1,k).dateAll(j,3)==Trajectories(1,k).uniqueDate(i,3))

becomes

if (~isempty(s(1,k).places)) && all( Trajectories(1,k).dateAll(j,1:3)==Trajectories(1,k).uniqueDate(i,1:3) )

Then we want to remove this from a for-loop. The "intersect" function is useful here:

[ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');

We now have a vector i1 of all rows in dateAll that intersect with uniqueDate.

Now we can remove the loop comparing z using a similar approach:

[iz iz1 iz2] = intersect(Trajectories(1,k).dateAll(i1,4),1:24);

We have to be careful about our indices here, using a subset of a subset.

This simplifies the code to:

for k=1:nCols
    if isempty(s(1,k).places)
        continue; % skip to the next value of k, no need to do the rest of the comparison
    end
    [ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');
    [iz iz1 iz2] = intersect(Trajectories(1,k).dateAll(i1,4),1:24);

    usescalarlabel = (size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1);
    if (usescalarlabel)
        Trajectories(1,k).label(i1(iz1)) = s(1,k).places.all(iz,i2(iz1));
    else
        % you will need to check this: I think here you were needlessly repeating this step for every match
        Trajectories(1,k).label(i1(iz1)) = s(1,k).places.all(iz,:); 
    end
end

But wait! That z loop is exactly the same as using indexing. So we don't need that second intersect after all:

for k=1:nCols
    if isempty(s(1,k).places)
        continue; % skip to the next value of k, no need to do the rest of the comparison
    end
    [ia i1 i2]=intersect(Trajectories(1,k).dateAll(:,1:3),Trajectories(1,k).uniqueDate(:,1:3),'rows');

    usescalarlabel = (size(s(1,k).places.all,2)>=size(Trajectories(1,k).uniqueDate,1);
    label_indices = Trajectories(1,k).dateAll(i1,4);
    if (usescalarlabel)
        Trajectories(1,k).label(label_indices) = s(1,k).places.all(label_indices,i2);
    else
        % you will need to check this: I think here you were needlessly repeating this step for every match
        Trajectories(1,k).label(label_indices) = s(1,k).places.all(label_indices,:);
    end
end

You'll need to check the indexing in this - I'm sure I've made a mistake somewhere without having data to test against, but that should give you an idea on how to proceed removing the loops and using vector expressions instead. Without seeing the data that's as far as I can optimise. You may be able to go further if you can reformat your data into a set of 3d matrices / cells instead of using structs.

I am suspicious of your condition which I have called "usescalarlabel" - it seems like you are mixing two data types. Also I would strongly recommend separating the dateAll matrices into separate "date" and "data" matrices as the row indices 4 onwards don't seem to be dates. Also the example you copy/pasted in seems to have an extra value at row index 1? In that case you'll need to compare Trajectories(1,k).dateAll(:,2:4) instead of Trajectories(1,k).dateAll(:,1:3).

Good luck.

Speed up code to compare fields in a struct

1 Answers1