Efficient one-to-one mapping between value and index in an array

Question

I need to get the index (i.e. position) of a value in an array, and I am wondering whether there is some faster way than using the find command, by constructing some sort of map or look-up table which contains the mapping between the array values and the indexes.

Take, for example, this array:

th = [0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90];

Now, let's say I have a variable with the value of

angle = 55

and I want to know where this value is positioned in the array (correct answer is idx = 12). Now, I could of course just use find:

idx = find(th==angle)

But my problem is that in my code I need to do this look-up, to get the index in th for the value in angle, several (million) times, and it seems like a bit of a waste of resources to constantly be calling the find function, which I guess is looping through th and doing some sort of comparison.

Instead, I was hoping that there was some way to set up a one-to-one map or a look-up table, where I can just immediately fetch the index corresponding to the value I have in angle. (Note: I know that the value I have in angle will always correspond exactly to one of the values in th.) So just have some function

idx = angle2i(angle)

which performs this mapping:

0 -> 1
5 -> 2
10 -> 3
15 -> 4
20 -> 5
25 -> 6

etc.

But I'm not seeing how I should implement such a look-up (well, I have a couple of very non-elegant ideas, I'm hoping and guessing that there must be some smart approach for this). Or am I wasting my time here and should I just use the find command?

well you already have a mapping function, which according to your example `th` I would write as `angle2i = @(angle) (angle/5)+1;`. I don't know if that's faster than `find` (you'd have to profile your code to estimate). Where you could potentially save a lot of time is if it could be vectorised, meaning if you already have all the angle values to look up in an array, then with the mapping function calculate an array of all the relevant indices in one operation, then in your loop just pick the index from this array (no need repeated calls to your mapping function or to `find`. — Hoki, Apr 17 '19 at 10:56
Hey @Hoki, thanks for the reply. You're right for this particular case, but I was hoping for a general procedure, when there is no apparent relationship between the values and the indexes. — Finnur Pind, Apr 17 '19 at 12:22
if there is no calculable relationship between the values and the indexes then your best option is to use a `map` as shown in [Chris Luengo's answer](https://stackoverflow.com/questions/55725607/matlab-efficient-one-to-one-mapping-between-value-and-index-in-an-array/55728535#55728535). — Hoki, Apr 17 '19 at 13:56

Cris Luengo · Answer 1 · 2019-04-17T13:39:10.907

You are looking for containers.Map.

You could do:

th = [0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90];
angle2i = containers.Map(th,1:numel(th));
index = angle2i(55)

This is a general solution that only requires that th contain unique elements. They don't need to be sorted, and they don't need to be integer (though care must be taken when comparing floating-point values!). This solution should be much faster than find for large arrays, as this solution is O(log n) and the find solution is O(n). But for very small arrays, the overhead of using containers.Map will show.

If th is guaranteed sorted, then solutions to this other question might be useful as well.

Of course, if there is a simple mathematical relationship (as in the case of the example th, then the O(1) solution by @mattesyo of computing the index cannot be beaten.

@FinnurPind: Check [the benchmark](https://stackoverflow.com/a/55734681/7328782)! This method looks nice and sounds like it is great, but it actually sucks in terms of speed. Your `find` is significantly faster under certain conditions. You might want to reconsider your choice for which of these answers you accept. :) — Cris Luengo, Apr 17 '19 at 19:27

score 4 · Accepted Answer · answered Apr 17 '19 at 14:37

If your keys are integers, perhaps you could use a sparse array! Consider the following situation:

function t = q55725607
%% Define the mapping(s):
keys = (1:60).*5; % Keys must be integers!
values = 1:numel(keys); 
% Construct an array such that `map(key) == value`
map = sparse(ones(numel(keys),1), keys, values);

% Compare to containers.Map:
hashmap = containers.Map(keys, values);

%% Try this out:
queryKeys = randi(60,5000000,1).*5;
queryKeysCell = num2cell(queryKeys);

t = [timeit(@f1,1), timeit(@f2,1)];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function queryVals = f1()
  queryVals = reshape(full(map(queryKeys)), size(queryKeys));
end

function queryVals = f2
  queryVals = hashmap.values(queryKeysCell);
end

end

I don't know if the comparison I made is fair, but if it is, the sparse approach is an order of magnitude faster on my system (0.1549 vs 1.5685).

BTW, in case it isn't clear, the reason for using a sparse array is because it only takes up space for the non-zero values (so even if you have indices like 1 and 10E5, you would only store 2 values, roughly speaking).

This is pretty clever! I'll make a comparison for my case and see what turns out to be faster. Thank you for your input! — Finnur Pind, Apr 17 '19 at 18:30

score 3 · Answer 3 · answered Apr 17 '19 at 11:01

3

If there is a numerical context between the index and the value like it is in your example, you can just use this as a function instead of a look-up table:

function idx=angle2i(angle)
idx=angle/5+1;
end

But I don't know if this solves your problem, because I don't know your specific problem.

answered Apr 17 '19 at 11:01

mattesyo

171
8

Thanks for the reply. Well, you're right, for this particular case I could solve this with some numerical relationship. But I also have some cases where there is no apparent relationship between the index and the value, so I would need some sort of table which relates the two... Still not sure how I would go about implementing that. – Finnur Pind Apr 17 '19 at 12:20
1

In this case i don't think there is a general solution in case of run time optimization to your problem while not knowing the specific array size and value structure. The 'find' function is exactly doing what you're trying to reach which actually does not mean it is the most efficient method for every application. By not knowing the specific problem the only chance i see is, you have to try your "very non-elegant ideas" and use the MATLAB [Profiler](https://de.mathworks.com/help/matlab/matlab_prog/profiling-for-improving-performance.html) to probably find the fastest one (if there is one). – mattesyo Apr 17 '19 at 13:05
2

mattesyo, feel free to leave a comment to my answer if you want to warn about overhead. But don't delete half of it and change it to say something I wouldn't write. There is a reject reason for edit suggestions "conflicts with author's intent". Fixing typos and fixing formatting are OK. Changing the contents of an answer is not OK. [Here](https://stackoverflow.com/users/11289616/mattesyo?tab=activity&sort=suggestions) you can see your edit suggestions and what decisions were taken on them. – Cris Luengo Apr 17 '19 at 14:00
Actually, [this comment by the OP](https://stackoverflow.com/questions/55725607/matlab-efficient-one-to-one-mapping-between-value-and-index-in-an-array/55728535#comment98133638_55725607) suggests that they are indeed looking for what @CrisLuengo wrote, a general solution. Classic case of an oversimplified example in the question IMO – Adriaan Apr 17 '19 at 14:12

score 3 · Answer 4 · answered Apr 17 '19 at 14:52

3

If you have all (several million) values of angle at hand you can use ismember :

[~, idx] = ismember(angle, th);

answered Apr 17 '19 at 14:52

rahnema1

15,264
3
15
27

score 3 · Answer 5 · edited Jun 20 '20 at 09:12

Benchmark

Starting off with Dev-iL's benchmark, I added OP's find method and rahnema1's ismember method, and look at how the methods scale with data size (number of keys). I also compare two different use cases: finding 5000 keys at once, and finding 5000 keys one at the time.

These are the results:

                One key at the time                      Many keys at once
       -------------------------------------   -------------------------------------
size    sparse     c.Map      find  ismember    sparse     c.Map      find  ismember  
----   -------   -------   -------   -------   -------   -------   -------   -------
  50    5.1681   54.3091    3.7766   28.8590    0.0956    1.2973    0.5578    0.0537
 500    5.0864   54.7872    6.9310   32.5554    0.0977    1.6847    3.6726    0.0499
5000    5.2052   56.4472   35.1449   60.6480    0.1140    2.0886   38.7444    0.0789

[Timings on MATLAB R2017a, on a 3 year old iMac. Your mileage will vary.]

Contrary to my expectations, containers.Map has a lot of overhead and is not really suitable for this purpose. Even with an array of 5000 keys, the O(n) find method is actually faster than the O(log n) hash map. containers.Map is a custom class, and the MATLAB JIT is still not as good with optimizing that type of code. However, one can clearly see the effect of the scaling there, as the find method is the only one whose run time increases significantly with increasing data sizes.

Interestingly, the "sparse" method is ~50x faster when vectorized. This is usually not the case any more with vectorization. For example the find method is only about 1x-2x faster when vectorized (for larger data sizes, the vectorization requires too much memory, and eventually will prove to be very slow).

The largest difference here between vectorized and loop code is for the ismember function. This one sorts the input data, and so here we see the difference between doing that once, and doing it 5000 times. This method is really only suitable when called few times. But in that case, ismember also is easily the fastest method.

When fetching one key at the time, the sparse method is fastest, unless the data size is very small, in which case the find method wins. However, the sparse method is the only one that requires keys to be positive integers (it will not work with 0, negative values, or non-integer values). The other methods all work with values of arbitrary types (including strings).

Benchmark code:

function t = so(N)
% Define the mapping(s):
keys = (1:N).*5; % Keys must be positive integers!
values = 1:N; 

% Sparse lookup table
sparseMap = sparse(ones(numel(keys),1), keys, values);

% containers.Map lookup table
hashMap = containers.Map(keys, values);

% Try this out:
queryKeys = keys(randi(numel(keys),5000,1));
queryKeysCell = num2cell(queryKeys); % trick to read many values from the hashMap at once

t = [timeit(@f1,1), timeit(@f2,1), timeit(@f3,1), timeit(@f4,1), ...
     timeit(@f1q,1), timeit(@f2q,1), timeit(@f3q,1), timeit(@f4q,1)] * 1000;

% Functions that do the lookup one at the time:

   function queryVals = f1
      queryVals = zeros(size(queryKeys));
      for ii=1:numel(queryKeys)
         queryVals(ii) = full(sparseMap(queryKeys(ii)));
      end
   end

   function queryVals = f2
      queryVals = zeros(size(queryKeys));
      for ii=1:numel(queryKeys)
         queryVals(ii) = hashMap(queryKeys(ii));
      end
   end

   function queryVals = f3
      queryVals = zeros(size(queryKeys));
      for ii=1:numel(queryKeys)
         queryVals(ii) = find(keys==queryKeys(ii));
      end
   end

   function queryVals = f4
      queryVals = zeros(size(queryKeys));
      for ii=1:numel(queryKeys)
         [~, queryVals(ii)] = ismember(queryKeys(ii), keys);
      end
   end

% Functions that do the lookup all at once:

   function queryVals = f1q
      queryVals = reshape(full(sparseMap(queryKeys)), size(queryKeys));
   end

   function queryVals = f2q
      queryVals = hashMap.values(queryKeysCell);
   end

   function queryVals = f3q
      [queryVals,~] = find(keys.'==queryKeys);
   end

   function queryVals = f4q
      [~,queryVals] = ismember(queryKeys, keys);
   end

end

@CrisLuengo I have two other solution in mind. Could you please test efficiency of [griddedInterpolant](https://www.mathworks.com/help/matlab/examples/grid-based-interpolation.html) and [scatteredInterpolant](https://www.mathworks.com/help/matlab/ref/scatteredinterpolant.html) ? — rahnema1, Apr 17 '19 at 19:52
@rahnema1: I don't expect those functions to be efficient, because they are designed for interpolation, not finding identical elements. A solution using those would need to test the input to make sure no interpolation would happen, otherwise they'd do something fundamentally different (return a value not in the list, as opposed to error out). But feel free to update this answer with whatever methods you come up with. I made it a community wiki for that reason! — Cris Luengo, Apr 17 '19 at 19:58

Efficient one-to-one mapping between value and index in an array

5 Answers5

Benchmark

Benchmark code: