
I have a function that takes one vector as its input, uses another function to derive a second vector from it, and then compares the two vectors to produce its output vector. I currently have it working with a for loop as follows:

function [parentIndexVec] = computeParentIndex(nameVec)

    parentNameVec = computeParentName(nameVec);
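    % Preallocate the output instead of growing it inside the loop
    parentIndexVec = zeros(1, numel(parentNameVec));
    for i = 1:numel(parentNameVec)
        % find returns the position in nameVec of the element equal to parentNameVec{i}
        parentIndexVec(i) = find(strcmp(nameVec, parentNameVec{i}));
    end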

end

The computeParentName function essentially returns a copy of nameVec with the last letter of each element removed. Just before the loop, the two cell arrays look like this:

nameVec       = ''    'a'    'b'    'aa'    'ab'    'ba'    'aba'    'abb'
parentNameVec = ''    ''     ''     'a'     'a'     'b'     'ab'     'ab'
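The question doesn't include computeParentName itself. Going only by the description above and the later comment saying it uses cellfun to remove the last letter of each element, a minimal hypothetical version might look like this (the body is an assumption, not the asker's actual code):

```matlab
function parentNameVec = computeParentName(nameVec)
% Hypothetical sketch: return nameVec with the last character of each
% element removed. For an empty name, s(1:end-1) is also empty, so ''
% gets an empty parent.
parentNameVec = cellfun(@(s) s(1:end-1), nameVec, 'UniformOutput', false);
end
```

'UniformOutput', false is required because each call returns a char vector rather than a scalar. The regexprep(nameVec, '.$', '') call suggested in the comments does the same job without an anonymous function.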

The goal of computeParentIndex is to find the index at which each element of parentNameVec appears in nameVec, so its output is:

parentIndexVec = 1     1     1     2     2     3     5     5

I tried to do this with cellfun, but couldn't get it to work, because each element of parentNameVec has to be compared against all of nameVec.
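For reference, cellfun can do this comparison if the anonymous function captures nameVec, so that each call only receives one element of parentNameVec. A sketch using the sample data above; note that it still compares every parent against all of nameVec, so it does no less work than the loop:

```matlab
nameVec       = {''  'a'  'b'  'aa'  'ab'  'ba'  'aba'  'abb'};
parentNameVec = {''  ''   ''   'a'   'a'   'b'   'ab'   'ab'};

% nameVec is captured by the anonymous function when it is created.
% cellfun errors if a name matches zero or several times, because find
% would then not return a scalar.
parentIndexVec = cellfun(@(p) find(strcmp(nameVec, p)), parentNameVec)
%   1   1   1   2   2   3   5   5
```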

My questions are as follows:

  1. Is there a way to do this without the loop?
  2. Is it truly faster to use matrix operations rather than loops in most cases?
  3. If so, how does cellfun compare in speed to pure matrix operations, or is it as slow as a loop?

Thanks for any assistance!

teepee

1 Answer

  1. You can use ismember to find where each string in parentNameVec occurs within nameVec:

    nameVec = {''    'a'    'b'    'aa'    'ab'    'ba'    'aba'    'abb'};
    parentNameVec = {''    ''     ''     'a'     'a'     'b'     'ab'     'ab'};
    
    [~, parentIndexVec] = ismember(parentNameVec, nameVec)
    %   1   1   1   2   2   3   5   5
    
  2. When an operation can be written as a matrix (vectorized) operation, it will almost certainly be faster than the equivalent for loop. The gap between the two has narrowed over time, but it still exists. Unfortunately, your example uses cell arrays, which don't support matrix operations.

  3. cellfun is almost always slower than a for loop, because MATLAB's JIT compiler is better able to optimize the contents of a for loop. This is particularly true in newer versions of MATLAB (R2015b+), where the execution engine was reworked and provides much better acceleration.
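Points 2 and 3 are easy to check on your own release with timeit (available since R2013b). The sketch below uses a plain numeric array, since cell arrays have no matrix operations; squaring each element and the default size are arbitrary choices for illustration, and the actual timings will depend on the MATLAB version and hardware:

```matlab
function t = compareLoopTiming(n)
% Sketch: time three ways of squaring every element of a vector.
if nargin < 1
    n = 1e5;
end
x = rand(1, n);
c = num2cell(x);   % cellfun needs a cell array input

t.vectorized = timeit(@() x.^2);                    % matrix operation
t.forLoop    = timeit(@() squareLoop(x));           % preallocated for loop
t.cellfun    = timeit(@() cellfun(@(v) v^2, c));    % cellfun + anonymous function
disp(t)
end

function y = squareLoop(x)
% Same work as x.^2, done element by element.
y = zeros(size(x));
for k = 1:numel(x)
    y(k) = x(k)^2;
end
end
```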

All of that being said, a built-in function is almost always going to beat your own implementation of an algorithm (for loop or otherwise), because MathWorks has optimized it for decent performance and robust error checking, and it is sometimes implemented at a lower level than ordinary MATLAB code.
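To try this on the question's own problem, the sketch below checks that ismember reproduces the find/strcmp loop and then times both. The generated names ('name1', 'name2', ...), the default size n, and the error identifier are made-up assumptions for illustration. Because ismember returns 0 for a missing name instead of erroring the way the loop does, the ismember version checks the first output explicitly:

```matlab
function r = compareParentIndex(n)
% Sketch: verify that ismember matches the loop, then time both.
if nargin < 1
    n = 2000;
end
nameVec = arrayfun(@(k) sprintf('name%d', k), 1:n, 'UniformOutput', false);
parentNameVec = nameVec(randi(n, 1, n));   % each parent occurs once in nameVec

idxLoop = parentIndexLoop(nameVec, parentNameVec);
idxFast = parentIndexIsmember(nameVec, parentNameVec);
assert(isequal(idxLoop, idxFast), 'The two methods disagree.');

r.loop     = timeit(@() parentIndexLoop(nameVec, parentNameVec));
r.ismember = timeit(@() parentIndexIsmember(nameVec, parentNameVec));
disp(r)
end

function idx = parentIndexLoop(nameVec, parentNameVec)
% The loop from the question, with the output preallocated.
idx = zeros(1, numel(parentNameVec));
for i = 1:numel(parentNameVec)
    idx(i) = find(strcmp(nameVec, parentNameVec{i}));
end
end

function idx = parentIndexIsmember(nameVec, parentNameVec)
% Loop-free version. idx is 0 where a name is missing, so fail loudly
% rather than silently returning a 0 index.
[tf, idx] = ismember(parentNameVec, nameVec);
if ~all(tf)
    error('compareParentIndex:missingParent', ...
          'Some parent names do not occur in nameVec.');
end
end
```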

Suever
  • I really appreciate the response. Exactly the function I was looking for! So, I actually already went through my other sub functions to replace `for` loops with `cellfun`, so I may have unwittingly decreased its efficiency. Can most operations completed with `cellfun` be done in a faster, matrix operation way? For example, the sub function mentioned above (`computeParentName`) uses cellfun to remove the last letter of each element from the `nameVec` cell array. Is there a simpler way to do such a thing? – teepee Mar 08 '17 at 21:21
  • @teepee It really depends upon the specifics of what operation you're trying to perform. There are a lot of operations for operating on cell arrays of strings built into MATLAB. To remove the last character I would do something like: `regexprep(nameVec, '.$', '')` which simply replaces the last character with an empty string. – Suever Mar 08 '17 at 21:52
  • Thanks, I hadn't thought about using regex in MATLAB. It's very nice that regexprep handles both char arrays and cellstrs; that removes the need for a lot of code I wrote to handle the distinction. While I'm at it, would you happen to know a cellstr function that returns the length of each string in the cell array? e.g. `lengthArray = foo({'aaa' 'bb' 'a'})`, which returns `[3 2 1]`? – teepee Mar 08 '17 at 23:40
  • @teepee Your best bet there is going to be `cellfun`: `cellfun(@numel, array)` – Suever Mar 08 '17 at 23:41
  • OK thanks muchly, that's what I had in there. I was hoping for something a bit better. Although I hear that cellfun performs reasonably with built-in functions, as opposed to user-defined functions (is that correct?). My biggest issue with cellfun is that it doesn't handle a single string (char) input gracefully, only a cellstr. Do you have tips for handling that sort of thing? – teepee Mar 09 '17 at 00:17
  • @teepee Yes that's generally true. [Here's](http://stackoverflow.com/questions/18284027/cellfun-versus-simple-matlab-loop-performance) a post that details that. – Suever Mar 09 '17 at 00:19
  • Wow, that's actually quite startling, some of those findings. Apparently `cellfun(@isempty, array)` is significantly slower than `cellfun('isempty', array)` for some reason. That's an issue indeed. – teepee Mar 09 '17 at 01:00
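For the char-versus-cellstr issue raised in these comments, one option is to wrap a lone char vector in a cell at the top of the function, so everything after it only has to handle cellstrs. A minimal sketch; the function name stringLengths and its error identifier are hypothetical. It uses the string form cellfun('length', ...), the same style as cellfun('isempty', ...) in the last comment. cellstr(str) would also do the wrapping, but it strips trailing whitespace from char input, which would change the lengths:

```matlab
function lens = stringLengths(str)
% Sketch: return the length of each string, accepting either a single
% char row vector or a cell array of char vectors.
if ischar(str)
    str = {str};   % wrap a lone char vector; unlike cellstr, keeps trailing spaces
end
if ~iscellstr(str)
    error('stringLengths:badInput', ...
          'Input must be a char vector or a cell array of char vectors.');
end
lens = cellfun('length', str);   % string form of cellfun, no function handle
end
```

For example, `lengthArray = stringLengths({'aaa' 'bb' 'a'})` returns `[3 2 1]`, and `stringLengths('abc')` returns `3`.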