
Below is my code for neural network forward propagation. I want to speed it up; since for loops take time, can anybody help me correct the code to make it faster, e.g. by vectorizing it as MATLAB suggests? In this code I take a 4x4 receptive field at a time from an input of size 19x19, then multiply each pixel by the corresponding weight from `net.w{layer_no}` (of size 19x19). You could also call it a dot product of the two small matrices. I didn't take the dot product of the two small matrices directly because there is a boundary check. It produces a 6x6 output, saved in `output` at the end. I am not an experienced coder, so I did as much as I could. Can anybody guide me on how to speed it up? It takes a lot of time compared to OpenCV. Will be thankful. Regards

    receptiveSize = 4;
    overlap = 1;
    inhibatory = 0;
    gap = receptiveSize - overlap;          % stride between receptive fields

    UpperLayerSize = size(net.b{layer_no}); % 6x6
    Curr_layerSize = size(net.w{layer_no}); % 19x19

    for u = 1:UpperLayerSize(1)-1
        for v = 1:UpperLayerSize(2)-1

            summed_value = 0;
            % bounds of the receptive field for output unit (u,v)
            min_u = (u - 1) * gap + 1;
            max_u = (u - 1) * gap + receptiveSize;
            min_v = (v - 1) * gap + 1;
            max_v = (v - 1) * gap + receptiveSize;
            for i = min_u : max_u
                for j = min_v : max_v
                    % skip indices that fall outside the input
                    if (i > Curr_layerSize(1) || j > Curr_layerSize(2))
                        continue;
                    end
                    if (i < 1 || j < 1)
                        continue;
                    end
                    summed_value = summed_value + input{layer_no}.images(i,j,sample_ind) * net.w{layer_no}(i,j);
                end
            end
            summed_value = summed_value + net.b{layer_no}(u,v);
            input{layer_no+1}.images(u,v,sample_ind) = summed_value;
        end
    end
    % apply the layer's activation function element-wise
    temp = activate_Mat(input{layer_no+1}.images(:,:,sample_ind), net.AF{layer_no});
    output{layer_no}.images(:,:,sample_ind) = temp(:,:);
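For readers outside MATLAB, the loop structure above can be sketched in self-contained pure Python (0-based indexing; sizes mirror the question, and `forward` is an illustrative name, not part of the original code):

```python
def forward(inp, w, b, receptive=4, overlap=1):
    """Windowed weighted sum + bias over all upper-layer units,
    mirroring the question's nested loops (pure-Python sketch)."""
    gap = receptive - overlap                  # stride between windows (3 here)
    n = len(inp)                               # input is n x n (19 in the question)
    out_size = len(b)                          # upper layer is out_size x out_size (6)
    out = [[0.0] * out_size for _ in range(out_size)]
    for u in range(out_size):
        for v in range(out_size):
            s = 0.0
            for i in range(u * gap, u * gap + receptive):
                for j in range(v * gap, v * gap + receptive):
                    if i >= n or j >= n:       # boundary guard, as in the MATLAB code
                        continue
                    s += inp[i][j] * w[i][j]
            out[u][v] = s + b[u][v]
    return out
```

Note that with a 19x19 input, stride 3 and window 4, the last window starts at row 15 (0-based) and ends at row 18, so all 6x6 windows fit inside the input and the guard never fires for these sizes.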
khan
1 Answer

How about replacing the inner loops (the loop over `i` and the loop over `j`) with something like:

ii = max( 1, min_u ) : min( max_u, Curr_layerSize(1) );
jj = max( 1, min_v ) : min( max_v, Curr_layerSize(2) );
input{layer_no+1}.images(u,v,sample_ind) = ...
    reshape( input{layer_no}.images(ii,jj,sample_ind), 1, [] ) * ...
    reshape( net.w{layer_no}(ii,jj), [], 1 ) + ...
    net.b{layer_no}(u,v); %// should this term be added rather than multiplied?
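The key idea in this answer is that clamping the index ranges once replaces the per-element boundary guards. A pure-Python sketch (illustrative names, 0-based indices) of why the two are equivalent:

```python
def window_sum_guarded(inp, w, min_u, max_u, min_v, max_v):
    """Per-element boundary check, as in the question's inner loops."""
    n = len(inp)
    s = 0.0
    for i in range(min_u, max_u):
        for j in range(min_v, max_v):
            if 0 <= i < n and 0 <= j < n:     # guard every (i, j)
                s += inp[i][j] * w[i][j]
    return s

def window_sum_clamped(inp, w, min_u, max_u, min_v, max_v):
    """Clamp the ranges once, then sum over the valid window only."""
    n = len(inp)
    ii = range(max(0, min_u), min(max_u, n))  # clamp rows to [0, n)
    jj = range(max(0, min_v), min(max_v, n))  # clamp columns to [0, n)
    return sum(inp[i][j] * w[i][j] for i in ii for j in jj)
```

Any index the guard would skip is exactly an index the clamped range never visits, so both functions return the same sum, including for windows that extend past either edge of the matrix.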
Shai
  • Additional improvements would be to take `min` and `max` outside the loops: `min_u = [0:ULS(1)-2]*gap`; `max_u = min_u + receptiveSize`; `min_u = min_u + 1`; `min_v = min_u` and `max_v = max_u` (if one can guarantee ULS(1) = ULS(2) as mentioned in the question; generalizing would be straightforward, though). Also, `max( 1, min_u ) = min_u` always, and `min( max_u, Curr_layerSize(1) )` can be determined prior to the loops. The same holds for `v`. – PetrH Aug 14 '14 at 13:44
  • Yes, the last term should be added. Let me try it and see whether the result is the same or different. – khan Aug 14 '14 at 13:45
  • Instead of that, I did it like this, since that multiplication was having some problem: I needed the sum of all the element-wise products at the end, so I did something like the following, which also works for this situation in my code, and uses your idea: `input.images{layer_no+1}(u,v,sample_ind) = sum(sum(input.images{layer_no}(min_u:max_u,min_v:max_v,sample_ind) .* net.w{layer_no}(min_u:max_u,min_v:max_v))) + net.b{layer_no}(u,v);` When I check the time, there is about a 4-second reduction using this. – khan Aug 14 '14 at 14:29
  • @khan does `sum( sum( ... ) )` work faster than my proposed vector dot product? – Shai Aug 14 '14 at 14:30
  • That's what my second question is; I am worried about the `sum(sum())` and how to avoid it. When I used your idea, the result was not the same. Maybe I made a mistake, but the results of the old and new code did not match. – khan Aug 14 '14 at 14:33
  • No, your code is correct; the result is now the same. Let me just recheck the time taken. Wait. – khan Aug 14 '14 at 14:36
  • @Shai the time taken before these changes was 38.939 seconds; when I made the changes with `sum(sum())` it became 34.102, but when I removed `sum(sum())` it went up again to 49.048 seconds. Pretty strange, it shouldn't be like this, but that is what the timer is currently showing me. – khan Aug 14 '14 at 14:58
  • @PetrH: I didn't understand what ULS(1) means in your comment. How did you change the whole formula? – khan Aug 14 '14 at 15:02
  • @Shai yes, your vectorization without `sum(sum())` is taking more time. I ran it a few times: first it gave me 49.048, then 51.167, and now finally 52.247. I am also running a simulation in the same MATLAB instance, so some of the variation may be due to that, but compared to the `sum()` code the time increased. – khan Aug 14 '14 at 15:12
  • @Shai Now there is another scenario, in which min_u, max_u, min_v and max_v fall outside the minimum and maximum index range of the matrix, i.e. if the image size is 19x19, I can get min_u = -1, max_u = 20, something like this. How can I deal with that? (Where and how can I post the code here? When I put it in the comments section it gives me a "more characters" message. Should I add it to the answer?) – khan Aug 14 '14 at 16:52
  • @khan *"I didn't understand what ULS(1) means"*: `ULS` meant `UpperLayerSize`; I wanted to shorten the already long comment. – PetrH Aug 14 '14 at 18:21
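PetrH's suggestion in the first comment is that the window bounds depend only on the output index (and the fixed stride), so they can be precomputed once before the double loop instead of recomputed per unit. A pure-Python sketch of that hoisting (0-based indices; `window_bounds` is an illustrative name):

```python
def window_bounds(out_size, gap=3, receptive=4, limit=19):
    """Precompute the start and (clamped) stop index of every window
    along one axis, so the main loop only looks them up."""
    starts = [k * gap for k in range(out_size)]            # min_u for each u
    stops = [min(s + receptive, limit) for s in starts]    # max_u, clamped to the input
    return starts, stops
```

With square inputs and a square upper layer (as in the question), the same two lists serve for both the `u` and `v` axes, which is exactly the `min_v = min_u`, `max_v = max_u` shortcut the comment mentions.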