30

I'm working with a large ArrayList<HashMap<A,B>>, and I would repeatedly need to select a random key from a random HashMap (and do some stuff with it). Selecting the random HashMap is trivial, but how should I select a random key from within this HashMap?

Speed is important (as I need to do this 10000 times and the hashmaps are large), so just selecting a random number k in [0,9999] and then doing .next() on the iterator k times, is really not an option. Similarly, converting the HashMap to an array or ArrayList on every random pick is really not an option. Please, read this before replying.

Technically I feel that this should be possible, since the HashMap stores its keys in an Entry[] internally, and selecting at random from an array is easy, but I can't figure out how to access this Entry[]. So any ideas to access the internal Entry[] are more than welcome. Other solutions (as long as they don't consume linear time in the hashmap size) are also welcome of course.

Note: heuristics are fine, so if there's a method that excludes 1% of the elements (e.g. because of multi-filled buckets) that's no problem at all.

user1111929
  • 6,050
  • 9
  • 43
  • 73
  • Entries are chained when you have more than one at the same index. So that wouldn't be so simple. – Denys Séguret Sep 12 '12 at 09:40
  • If converting the entrySet to a List isn'f fast enough (did you profile ?), then you need another data structure. – Denys Séguret Sep 12 '12 at 09:41
  • @dystroy Pseudorandom is fine, if there are 1% of the entries that are never picked, this is no big deal. Does that give extra options? So, no worries if an element is chained, then just pick another element. – user1111929 Sep 12 '12 at 09:42

10 Answers10

28

from top of my head

List<A> keysAsArray = new ArrayList<A>(map.keySet())
Random r = new Random()

then just

map.get(keysAsArray.get(r.nextInt(keysAsArray.size()))
user902383
  • 8,420
  • 8
  • 43
  • 63
  • 1
    That has still linear complexity in the size of the keySet, no? :/ – user1111929 Sep 12 '12 at 09:46
  • 3
    @user1111929 that depends on whether your keyset changes often. You can get away with just updating the list only when something is added to the map, or removed. Then the fetch itself will be constant time. – Joeri Hendrickx Sep 12 '12 at 09:48
  • alas, every random pick modifies one of the hashmaps. If I had access to `Entry[]` I could of course just do a simple modification in the array, but it seems this `Entry[]` is not accessible (unless I copy-paste the entire source code). – user1111929 Sep 12 '12 at 10:07
  • It is better in the sense that it doesn't need reflection and it will work efficiently even when there is very few entries left. – Peter Lawrey Sep 13 '12 at 06:27
19

I managed to find a solution without performance loss. I will post it here since it may help other people -- and potentially answer several open questions on this topic (I'll search for these later).

What you need is a second custom Set-like data structure to store the keys -- not a list as some suggested here. Lists-like data structures are to expensive to remove items from. The operations needed are adding/removing elements in constant time (to keep it up-to-date with the HashMap) and a procedure to select the random element. The following class MySet does exactly this

class MySet<A> {
     ArrayList<A> contents = new ArrayList();
     HashMap<A,Integer> indices = new HashMap<A,Integer>();
     Random R = new Random();

     //selects random element in constant time
     A randomKey() {
         return contents.get(R.nextInt(contents.size()));
     }

     //adds new element in constant time
     void add(A a) {
         indices.put(a,contents.size());
         contents.add(a);
     }

     //removes element in constant time
     void remove(A a) {
        int index = indices.get(a);
        contents.set(index,contents.get(contents.size()-1));
        indices.put(contents.get(index),index);
        contents.remove((int)(contents.size()-1));
        indices.remove(a);
     }
}
user1111929
  • 6,050
  • 9
  • 43
  • 73
  • The add operation is O(n) because you are using ArrayList. – Jakub Zaverka Sep 12 '12 at 17:20
  • Why would the add operation be O(n)? I'm appending a to the end of the ArrayList, that's O(1). – user1111929 Sep 12 '12 at 23:09
  • Yes, but the ArrayList is not unlimited in size, is it? So what happens if you insert another element in already full ArrayList? It declares a new, bigger array, and copies the array to the bigger one, requiring O(n). But this happens rearely, so technically speaking, you can rely on O(1). Strictly speaking, it's O(n). – Jakub Zaverka Sep 13 '12 at 07:36
  • And the remove operation is also O(n) (this time even technically). ArrayList says: "The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking)." – Jakub Zaverka Sep 13 '12 at 07:38
  • And I don't get why you are using another map to store integer indices. Why don't you just use the `indexOf` method? – Jakub Zaverka Sep 13 '12 at 07:43
  • Because indexOf runs in linear time, not constant time. Removing index k works in O(size-k) time, so when you only remove the last element every time, it works in O(1) time. This is how the ArrayList works. – user1111929 Sep 13 '12 at 08:52
  • 1
    Nice solution but a little buggy: in the remove method, you are removing the last item from contents and then update the indices mapping with the (new last element, index in which you put the previous last element). You should store the last element removed from contents and use it as indices key. – Alberto Di Gioacchino Mar 11 '20 at 10:13
7

You need access to the underlying entry table.

// defined staticly
Field table = HashMap.class.getDeclaredField("table");
table.setAccessible(true);
Random rand = new Random();

public Entry randomEntry(HashMap map) {
    Entry[] entries = (Entry[]) table.get(map);
    int start = rand.nextInt(entries.length);
    for(int i=0;i<entries.length;i++) {
       int idx = (start + i) % entries.length;
       Entry entry = entries[idx];
       if (entry != null) return entry;
    }
    return null;
}

This still has to traverse the entries to find one which is there so the worst case is O(n) but the typical behaviour is O(1).

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • `HashMap.class.getDeclaredField("table");`, brilliant, thanks! All I'm left to wonder with now is why they didn't put this in HashMap and HashSet by default. :-) – user1111929 Sep 12 '12 at 11:01
  • @user1111929 Using generics for this kind of purpose is fishy - if the implementation ever changes, the program gets broken. One should program against interface, not implementation. – Jakub Zaverka Sep 12 '12 at 17:14
  • 1
    @JakubZaverka I think you mean using reflections is kind of fishy and is brittle. I don't see a problem with using generics. ;) – Peter Lawrey Sep 12 '12 at 17:25
  • How about this solution http://stackoverflow.com/questions/12385284/how-to-select-a-random-key-from-a-hashmap-in-java/12386664#12386664 do you find it better? – user1111929 Sep 12 '12 at 23:18
4

Sounds like you should consider either an ancillary List of keys or a real object, not a Map, to store in your list.

duffymo
  • 305,152
  • 44
  • 369
  • 561
  • Unfortunately a HashMap does not offer the ability to store the keys in a simple list, and no other structure Maps arbitrary objects to arbitrary objects with a constant-time get(). – user1111929 Sep 12 '12 at 09:47
  • Hence the word "ancillary". It's separate data structure that you'll maintain along with the list of Maps. You're guilty of thinking at too low a level. – duffymo Sep 12 '12 at 09:49
  • That's true, but given the size of my HashMaps, any auxiliary structure will increase memory usage significantly (as the objects are small but there are a lot of them). I am still hoping to somehow get access to the `Entry[]`. I can copy-paste the entire source in a new file and use it there, but that's not really good coding style. :/ – user1111929 Sep 12 '12 at 10:01
2

As @Alberto Di Gioacchino pointed out, there is a bug in the accepted solution with the removal operation. This is how I fixed it.

class MySet<A> {
     ArrayList<A> contents = new ArrayList();
     HashMap<A,Integer> indices = new HashMap<A,Integer>();
     Random R = new Random();

     //selects random element in constant time
     A randomKey() {
         return contents.get(R.nextInt(contents.size()));
     }

     //adds new element in constant time
     void add(A item) {
         indices.put(item,contents.size());
         contents.add(item);
     }

     //removes element in constant time
     void remove(A item) {
        int index = indices.get(item);
        contents.set(index,contents.get(contents.size()-1));
        indices.put(contents.get(index),index);
        contents.remove(contents.size()-1);
        indices.remove(item);
     }
}
DWD 3
  • 21
  • 4
  • 1
    Ah, I would've sworn I fixed that before, seems I didn't. But correct code indeed! Fixed it now in my own as well. (Btw, you have `item` instead of `a`, this won't compile.) – user1111929 May 02 '21 at 23:27
  • ah yes! thanks for pointing it out, just edited it. Also thanks for your implementation, it worked out perfectly for me. – DWD 3 May 03 '21 at 00:40
1

I'm assuming you are using HashMap as you need to look something up at a later date?

If not the case, then just change your HashMap to an Array/ArrayList.

If this is the case, why not store your objects in a Map AND an ArrayList so you can look up randomly or by key.

Alternatively, could you use a TreeMap instead of HashMap? I don't know what type your key is but you use TreeMap.floorKey() in conjunction with some key randomizer.

TedTrippin
  • 3,525
  • 5
  • 28
  • 46
1

After spending some time, I came to the conclusion that you need to create a model which can be backed by a List<Map<A, B>> and a List<A> to maintain your keys. You need to keep the access of your List<Map<A, B>> and List<A>, just provide the operations/methods to the caller. In this way, you will have the full control over implementation, and the actual objects will be safer from external changes.

Btw, your questions lead me to,

This example, IndexedSet, may give you an idea about how-to.

[edited]

This class, SetUniqueList, might help you if you decide to create your own model. It explicitly states that it wraps the list, not copies. So, I think, we can do something like,

List<A> list = new ArrayList(map.keySet());
SetUniqueList unikList = new SetUniqueList(list, map.keySet);
// Now unikList should reflect all the changes to the map keys
...
// Then you can do
unikList.get(i);

Note: I didn't try this myself. Will do that later (rushing to home).

Community
  • 1
  • 1
Adeel Ansari
  • 39,541
  • 12
  • 93
  • 133
1

Since Java 8, there is an O(log(N)) approach with O(log(N)) additional memory: create a Spliterator via map.entrySet().spliterator(), make log(map.size()) trySplit() calls and choose either the first or the second half randomly. When there are say less than 10 elements left in a Spliterator, dump them into a list and make a random pick.

leventov
  • 14,760
  • 11
  • 69
  • 98
  • Will this yield a (more or less) uniformely random element of the hashmap? Does `trySplit()` split them always in half or what? I'm confused regarding the inner workings of this new routine. – user1111929 Feb 28 '19 at 13:42
  • The larger the cutoff size when you dump the remaining elements to a list and do a random choice, the more uniform will be the choice of random key overall. 10 elements is the starting point for tests (of uniformity). I think that the actual value may be between 8 and 32 elements when the choice becomes very uniform. – leventov Mar 01 '19 at 12:34
  • 1
    It probably depends on the quality of the distribution of hash codes of the keys, however. If the quality is decent this shouldn't be a concern, but if there are a lot of exact hash code collisions or all keys are concentrated in some parts of the table because some bits are 0s or 1s in the majority or all hash codes, the uniformity of the random key choice might be affected. – leventov Mar 01 '19 at 12:48
0

If you absolutely need to access the Entry array in HashMap, you can use reflection. But then your program will be dependent on that concrete implementation of HashMap.

As proposed, you can keep a separate list of keys for each map. You would not keep deep copies of the keys, so the actual memory denormalisation wouldn't be that big.

Third approach is to implement your own Map implementation, the one that keeps keys in a list instead of a set.

Jakub Zaverka
  • 8,816
  • 3
  • 32
  • 48
0

How about wrapping HashMap in another implementation of Map? The other map maintains a List, and on put() it does:

if (inner.put(key, value) == null) listOfKeys.add(key);

(I assume that nulls for values aren't permitted, if they are use containsKey, but that's slower)