
I'm writing a program, currently using vectors, that needs to store data (atoms that move with particular probabilities) in such a way that an atom can be picked efficiently with a biased random probability.

At the moment vectors seem great for this, as I can just assign multiple sites to one probability and then pick a random site. So if atom a has a relative probability of 0.6 of being picked and atom b has 0.4, I can assign 3 elements to a and 2 to b, then randomly pick any element.

However, I then need to remove all sites relating to the picked atom, which in a vector means finding them (or storing their locations separately), swapping them to the end and using pop_back. Depending on how I do this it either seems to be very slow or very memory intensive.

It seems like hashtables might be a better structure to use, as they would let me look up the sites relating to a particular atom more easily, but could I still use the same weighted picking mechanism with a hashtable?

I've never used hashtables before; are there any other complications? Or is there another structure I should look into that would be even better?

zephyr

1 Answer


Have you considered using a std::list? Then you could use std::list::remove. E.g. to locate the item at randomIndex:

auto it = myList.begin();
std::advance(it, randomIndex - 1); // randomIndex is 1-based here
std::cout << "Randomly picked atom is " << *it;

Now remove all elements from the list that match that atom (no pun intended):

int atom = *it; // copy the value first: list::remove takes it by reference
myList.remove(atom);

std::list is a doubly linked list, so once you have an iterator to an element, removing it is O(1); a vector would have to shift everything after it. Reaching a given index, however, still means traversing the list.

eoinmullan
  • Am I right in thinking this requires iterating through the whole list until we find the right atom? Unfortunately that can't work, as I may be dealing with billions of atoms – zephyr Oct 02 '13 at 14:09
  • Yeah, you would need to iterate through once to get to your atom, then again for the remove. – eoinmullan Oct 02 '13 at 18:08
  • I'm not sure about a hashtable though, as I think there will be lots of hash collisions where one atom is at many sites. Maybe have a read at [this answer](http://stackoverflow.com/a/1761646/1976720). You'll still need to search through the list but you can use a binary search and you'll only need to remove one element each time. – eoinmullan Oct 02 '13 at 18:37
  • That's one of the previous methods for solving this problem I've worked on, and I'm afraid it's incredibly slow for large numbers of rates. Even with the binary search optimisation, it requires adjusting on average half of the cumulative totals each step (slightly less if you order your data more cleverly) and is still way too sluggish – zephyr Oct 02 '13 at 23:15