Benchmark
Starting off with Dev-iL's benchmark, I added OP's find
method and rahnema1's ismember
method, and look at how the methods scale with data size (number of keys). I also compare two different use cases: finding 5000 keys at once, and finding 5000 keys one at the time.
These are the results:
One key at the time Many keys at once
------------------------------------- -------------------------------------
size sparse c.Map find ismember sparse c.Map find ismember
---- ------- ------- ------- ------- ------- ------- ------- -------
50 5.1681 54.3091 3.7766 28.8590 0.0956 1.2973 0.5578 0.0537
500 5.0864 54.7872 6.9310 32.5554 0.0977 1.6847 3.6726 0.0499
5000 5.2052 56.4472 35.1449 60.6480 0.1140 2.0886 38.7444 0.0789
[Timings on MATLAB R2017a, on a 3 year old iMac. Your mileage will vary.]
Contrary to my expectations, containers.Map
has a lot of overhead and is not really suitable for this purpose. Even with an array of 5000 keys, the O(n) find
method is actually faster than the O(log n) hash map. containers.Map
is a custom class, and the MATLAB JIT is still not as good with optimizing that type of code. However, one can clearly see the effect of the scaling there, as the find
method is the only one whose run time increases significantly with increasing data sizes.
Interestingly, the "sparse" method is ~50x faster when vectorized. This is usually not the case any more with vectorization. For example the find
method is only about 1x-2x faster when vectorized (for larger data sizes, the vectorization requires too much memory, and eventually will prove to be very slow).
The largest difference here between vectorized and loop code is for the ismember
function. This one sorts the input data, and so here we see the difference between doing that once, and doing it 5000 times. This method is really only suitable when called few times. But in that case, ismember
also is easily the fastest method.
When fetching one key at the time, the sparse method is fastest, unless the data size is very small, in which case the find
method wins. However, the sparse method is the only one that requires keys to be positive integers (it will not work with 0, negative values, or non-integer values). The other methods all work with values of arbitrary types (including strings).
Benchmark code:
function t = so(N)
% Define the mapping(s):
keys = (1:N).*5; % Keys must be positive integers!
values = 1:N;
% Sparse lookup table
sparseMap = sparse(ones(numel(keys),1), keys, values);
% containers.Map lookup table
hashMap = containers.Map(keys, values);
% Try this out:
queryKeys = keys(randi(numel(keys),5000,1));
queryKeysCell = num2cell(queryKeys); % trick to read many values from the hashMap at once
t = [timeit(@f1,1), timeit(@f2,1), timeit(@f3,1), timeit(@f4,1), ...
timeit(@f1q,1), timeit(@f2q,1), timeit(@f3q,1), timeit(@f4q,1)] * 1000;
% Functions that do the lookup one at the time:
function queryVals = f1
queryVals = zeros(size(queryKeys));
for ii=1:numel(queryKeys)
queryVals(ii) = full(sparseMap(queryKeys(ii)));
end
end
function queryVals = f2
queryVals = zeros(size(queryKeys));
for ii=1:numel(queryKeys)
queryVals(ii) = hashMap(queryKeys(ii));
end
end
function queryVals = f3
queryVals = zeros(size(queryKeys));
for ii=1:numel(queryKeys)
queryVals(ii) = find(keys==queryKeys(ii));
end
end
function queryVals = f4
queryVals = zeros(size(queryKeys));
for ii=1:numel(queryKeys)
[~, queryVals(ii)] = ismember(queryKeys(ii), keys);
end
end
% Functions that do the lookup all at once:
function queryVals = f1q
queryVals = reshape(full(sparseMap(queryKeys)), size(queryKeys));
end
function queryVals = f2q
queryVals = hashMap.values(queryKeysCell);
end
function queryVals = f3q
[queryVals,~] = find(keys.'==queryKeys);
end
function queryVals = f4q
[~,queryVals] = ismember(queryKeys, keys);
end
end