16

Possible Duplicate:
Hash tables in MATLAB

General Question

Is there any way to get a hashset or hashmap structure in Matlab?

I often find myself in situations where I need to find unique entries or check membership in vectors and using commands like unique() or logical indexing seems to search through the vectors and is really slow for large sets of values. What is the best way to do this in Matlab?

Example

Say, for example, that I have a list of primes and want to check if 3 is prime:

primes = [2,3,5,7,11,13];

if primes(primes==3)
    disp('yes!')
else
    disp('no!')
end

if I do this with long vectors and many times things get really slow.

In other languages

So basically, is there any equivalents to python's set() and dict(), or similarly Java's java.util.HashSet and java.util.HashMap, in Matlab? And if not, is there any good way of doing lookups in large vectors?

Edit: reflection on the answers

This is the running time i got on the suggestions in the answers.

>> b = 1:1000000;
>> tic; for i=1:100000, any(b==i);; end; toc
Elapsed time is 125.925922 seconds.

s = java.util.HashSet();
>> for i=1:1000000, s.add(i); end    
>> tic; for i=1:100000, s.contains(i); end; toc
Elapsed time is 25.618276 seconds.

>> m = containers.Map(1:1000000,ones(1,1000000));
>> tic; for i=1:100000, m(i); end; toc
Elapsed time is 2.715635 seconds

The construction of the java set was quite slow as well though so depending on the problem this might be quite slow as well. Really glad about the containers.Map tip. It really destroys the other examples, and it was instant in set up too.

Community
  • 1
  • 1
while
  • 3,602
  • 4
  • 33
  • 42

2 Answers2

20

Like this?

>> m = java.util.HashMap;
>> m.put(1,'hello,world');
>> m.get(1)
ans =
hello, world

Alternatively, if you want a Matlab-native implementation, try

>> m = containers.Map;
>> m('one') = 1;
>> m('one')
ans =
     1

This is actually typed - the only keys it will accept are those of type char. You can specify the key and value type when you create the map:

>> m =  containers.Map('KeyType','int32','ValueType','double');
>> m(1) = 3.14;
>> m(1)
ans =
  3.14

You will now get errors if you try to put any key other than an int32 and any value other than a double.

You also have Sets available to you:

>> s = java.util.HashSet;
>> s.put(1);
>> s.contains(1)
ans = 
     1
>> s.contains(2)
ans = 
     0
Chris Taylor
  • 46,912
  • 15
  • 110
  • 154
  • Thanks! I had no Idea you could do this. That is awesome! – while Sep 27 '12 at 16:25
  • Is there any native Set equivalent too? – while Sep 27 '12 at 16:27
  • I edited to show you how to use Sets. Note that I make no guarantees as to how fast these are. They are Matlab wrappers around the Java implementation, so there's obviously going to be some overhead as compared to the native Java versions. – Chris Taylor Sep 27 '12 at 16:29
  • Thanks a bunch!, It makes sense with the overhead. The containers.Map object seems to be the fastest. Just for reference it takes other types of keys as well. Found it in the help. – while Sep 27 '12 at 16:44
  • Hmm! Maybe that's changed in the version of Matlab I have (or since...) Glad it helped. – Chris Taylor Sep 27 '12 at 16:57
  • How do you iterate through a hashset in matlab? – andyjslee Jul 21 '16 at 21:32
  • @andy7586 Same way you would iterator through a HashSet in Java. Create an iterator by calling the `iterator()` method of the HashSet instance, then call the `next()` method of the iterator for each iteration. – beeftendon Dec 20 '16 at 00:16
  • Another way to iterate is `cell(s.toArray)` – Eric May 14 '17 at 01:05
1

Depending on how literal your example is, the disp will be a massive overhead (I/O is very slow).

That aside, I believe the quickest way to do a check like this is:

if find(primes==3,1,'first')
    disp('yes');
else
    disp('no');
end

Edit, you could also use any(primes==3) - a quick speed test shows they're approximately equivalent:

>> biglist = 1:100000;
>> tic;for i=1:10000
find(biglist==i,1,'first');
end
toc
Elapsed time is 1.055928 seconds.

>> tic;for i=1:10000
any(biglist==i);
end
toc
Elapsed time is 1.054392 seconds.
n00dle
  • 5,949
  • 2
  • 35
  • 48
  • Ok, thanks. It makes sense that `any()` would be faster but isn't there any way where it doesn't have to check all values? I mean just the logical part must check all values in the vector and return `0` or `1` for all values. Or is it really smart if used with `any`? Btw, I just put `disp` there for the sake of putting something inside the statement. Thanks though. – while Sep 27 '12 at 16:22
  • Ok, but I was interested in O(1) lookup which should be a lot faster then that. – while Sep 27 '12 at 16:24
  • A quick Google search turned up: http://www.mathworks.com/matlabcentral/fileexchange/15831-hashtable-class – n00dle Sep 27 '12 at 16:25