Is there built-in functionality for this?
4 Answers
GNU Octave search a cell array of strings in linear time O(n)
:
(The 15 year old code in this answer was tested and correct on GNU Octave 3.8.2
, 5.2.0
and 7.1.0
)
The other answer has cellidx
which was depreciated by octave, it still runs but they say to use ismember
instead, like this:
%linear time string index search.
a = ["hello"; "unsorted"; "world"; "moobar"]
b = cellstr(a)
%b =
%{
% [1,1] = hello
% [2,1] = unsorted
% [3,1] = world
% [4,1] = moobar
%}
find(ismember(b, 'world')) %returns 3
ismember finds 'world' in index slot 3. This is a expensive linear time O(n) operation because it has to iterate through all elements whether or not it is found.
To achieve a logarathmic time O(log n) solution, then your list needs to come pre-sorted and then you can use binary search:
If your cell array is already sorted, you can do O(log-n)
worst case:
function i = binsearch(array, val, low, high)
%binary search algorithm for numerics, Usage:
%myarray = [ 30, 40, 50.15 ]; %already sorted list
%binsearch(myarray, 30, 1, 3) %item 30 is in slot 1
if ( high < low )
i = 0;
else
mid = floor((low + high) / 2);
if ( array(mid) > val )
i = binsearch(array, val, low, mid-1);
elseif ( array(mid) < val )
i = binsearch(array, val, mid+1, high);
else
i = mid;
endif
endif
endfunction
function i = binsearch_str(array, val, low, high)
% binary search for strings, usage:
%myarray2 = [ "abc"; "def"; "ghi"]; #already sorted list
%binsearch_str(myarray2, "abc", 1, 3) #item abc is in slot 1
if ( high < low )
i = 0;
else
mid = floor((low + high) / 2);
if ( mystrcmp(array(mid, [1:end]), val) == 1 )
i = binsearch(array, val, low, mid-1);
elseif ( mystrcmp(array(mid, [1:end]), val) == -1 )
i = binsearch_str(array, val, mid+1, high);
else
i = mid;
endif
endif
endfunction
function ret = mystrcmp(a, b)
%this function is just an octave string compare, its behavior follows the
%strcmp(str1,str2)'s in C and java.lang.String.compareTo(...)'s in Java,
%that is:
% -returns 1 if string a > b
% -returns 0 if string a == b
% -return -1 if string a < b
% The gt() operator does not support cell array. If the single word
% is passed as an one-element cell array, converts it to a string.
a_as_string = a;
if iscellstr( a )
a_as_string = a{1}; %a was passed as a single-element cell array.
endif
% The gt() operator does not support cell array. If the single word
% is passed as an one-element cell array, converts it to a string.
b_as_string = b;
if iscellstr( b )
b_as_string = b{1}; %b was passed as a single-element cell array.
endif
% Space-pad the shortest word so as they can be used with gt() and lt() operators.
if length(a_as_string) > length( b_as_string )
b_as_string( length( b_as_string ) + 1 : length( a_as_string ) ) = " ";
elseif length(a_as_string) < length( b_as_string )
a_as_string( length( a_as_string ) + 1 : length( b_as_string ) ) = " ";
endif
letters_gt = gt(a_as_string, b_as_string); %list of boolean a > b
letters_lt = lt(a_as_string, b_as_string); %list of boolean a < b
ret = 0;
%octave makes us roll our own string compare because
%strings are arrays of numerics
len = length(letters_gt);
for i = 1:len
if letters_gt(i) > letters_lt(i)
ret = 1;
return
elseif letters_gt(i) < letters_lt(i)
ret = -1;
return
endif
end;
endfunction
%Assuming that myarray is already sorted, (it must be for binary
%search to finish in logarithmic time `O(log-n))` worst case, then do
myarray = [ 30, 40, 50.15 ]; %already sorted list
binsearch(myarray, 30, 1, 3) %item 30 is in slot 1
binsearch(myarray, 40, 1, 3) %item 40 is in slot 2
binsearch(myarray, 50, 1, 3) %50 does not exist so return 0
binsearch(myarray, 50.15, 1, 3) %50.15 is in slot 3
%same but for strings:
myarray2 = [ "abc"; "def"; "ghi"]; %already sorted list
binsearch_str(myarray2, "abc", 1, 3) %item abc is in slot 1
binsearch_str(myarray2, "def", 1, 3) %item def is in slot 2
binsearch_str(myarray2, "zzz", 1, 3) %zzz does not exist so return 0
binsearch_str(myarray2, "ghi", 1, 3) %item ghi is in slot 3
To sort your array if it isn't already:
Complexity of sorting depends on the kind of data you have and whatever sorting algorithm GNU octave language writers selected, it's somewhere between O(n*log(n))
and O(n*n)
.
myarray = [ 9, 40, -3, 3.14, 20 ]; %not sorted list
myarray = sort(myarray)
myarray2 = [ "the"; "cat"; "sat"; "on"; "the"; "mat"]; %not sorted list
myarray2 = sortrows(myarray2)
Code buffs to make this backward compatible with GNU Octave 3. 5. and 7. goes to @Paulo Carvalho in the other answer here.

- 146,994
- 96
- 417
- 335
-
1I posted a working version of `mystrcmp()` as an answer since there are major changes to the original code. – Paulo Carvalho May 31 '22 at 17:34
-
1Above code was improved to be correct and compatible with octave version 3, 5, and 7. You can remove your note that it's broken on octave 7.1 – Eric Leschinski Jun 02 '22 at 15:36
Yes check this: http://www.obihiro.ac.jp/~suzukim/masuda/octave/html3/octave_36.html#SEC75
a = ["hello"; "world"];
c = cellstr (a)
⇒ c =
{
[1,1] = hello
[2,1] = world
}
>>> cellidx(c, 'hello')
ans = 1
>>> cellidx(c, 'world')
ans = 2

- 126
- 1
- 4
-
1Works great. No need to sort. Returns `0` if not found. I wonder about efficiency...Is it a brute force search? – B Seven Nov 25 '11 at 18:20
-
1It appears that it is. If you use the 'which' function and pass a function name, it will point you to the source file. In this case, 'which cellidx'. If you look at the source, you'll see that it does do a brute-force search. – Michael Bray Oct 03 '12 at 02:04
-
See my answer (added long after the question was asked) for non brute-force options. – mc0e Mar 15 '15 at 16:44
-
The cellidx solution does not meet the OP's efficiency requirement, and is deprecated (as noted by help cellidx
).
Håvard Geithus in a comment suggested using the lookup() function on a sorted cell array of strings, which is significantly more efficient than cellidx. It's still a binary search though, whereas most modern languages (and even many 20 year old ones) give us easy access to associative arrays, which would be a much better approach.
While Octave doesn't obviously have associated arrays, that's effectively what the interpreter is using for ocatve's variables, including structs, so you can make us of that, as described here: http://math-blog.com/2011/05/09/associative-arrays-and-cellular-automata-in-octave/
Built-in Function: struct ("field", value, "field", value,...)
Built-in Function: isstruct (expr)
Built-in Function: rmfield (s, f)
Function File: [k1,..., v1] = setfield (s, k1, v1,...)
Function File: [t, p] = orderfields (s1, s2)
Built-in Function: fieldnames (struct)
Built-in Function: isfield (expr, name)
Function File: [v1,...] = getfield (s, key,...)
Function File: substruct (type, subs,...)
Converting Matlab to Octave is there a containers.Map equivalent? suggests using javaObject("java.util.Hashtable"). That would come with some setup overhead, but would be a performance win if you're using it a lot. It may even be viable to link in some library written in C or C++? Do think about whether this is a maintainable option though.
Caveat: I'm relatively new to Octave, and writing this up as I research it myself (which is how I wound up here). I haven't yet run tests on the efficiency of these techniques, and while I've got a fair knowledge of the underlying algorithms, I may be making unreasonable assumptions about what's actually efficient in Octave.
This is a version of mystrcmp()
that works in Octave of recent version (7.1.0):
function ret = mystrcmp(a, b)
%this function is just an octave string compare, its behavior follows the
%strcmp(str1,str2)'s in C and java.lang.String.compareTo(...)'s in Java,
%that is:
% -returns 1 if string a > b
% -returns 0 if string a == b
% -return -1 if string a < b
% The gt() operator does not support cell array. If the single word
% is passed as an one-element cell array, converts it to a string.
a_as_string = a;
if iscellstr( a )
a_as_string = a{1}; %a was passed as a single-element cell array.
endif
% The gt() operator does not support cell array. If the single word
% is passed as an one-element cell array, converts it to a string.
b_as_string = b;
if iscellstr( b )
b_as_string = b{1}; %b was passed as a single-element cell array.
endif
% Space-pad the shortest word so as they can be used with gt() and lt() operators.
if length(a_as_string) > length( b_as_string )
b_as_string( length( b_as_string ) + 1 : length( a_as_string ) ) = " ";
elseif length(a_as_string) < length( b_as_string )
a_as_string( length( a_as_string ) + 1 : length( b_as_string ) ) = " ";
endif
letters_gt = gt(a_as_string, b_as_string); %list of boolean a > b
letters_lt = lt(a_as_string, b_as_string); %list of boolean a < b
ret = 0;
%octave makes us roll our own string compare because
%strings are arrays of numerics
len = length(letters_gt);
for i = 1:len
if letters_gt(i) > letters_lt(i)
ret = 1;
return
elseif letters_gt(i) < letters_lt(i)
ret = -1;
return
endif
end;
endfunction

- 554
- 5
- 10
-
I replaced the trimming of the longest word with the space-padding of the shortest word. This is a safer way to perform the comparisons. – Paulo Carvalho May 31 '22 at 18:27
-
1Nice, I borrowed your code here to make my answer above compatible with GNU Octave 3, 5, and 7. Have an upboat. – Eric Leschinski Jun 01 '22 at 15:43