
Possible Duplicate:
How would you pick a uniform random element in linked list with unknown length?

Suppose we want to randomly choose an element from a linked list, but we don't know the length of the list.

Design an algorithm that chooses the element uniformly at random in as little running time as possible.

babysnow
  • Is this something you do repeatedly so that it can learn from previous runs? – user650654 Oct 03 '12 at 23:39
  • Store the address of each node in some array or random accessible container and use it. – Jagannath Oct 03 '12 at 23:42
  • choose only one element. – babysnow Oct 03 '12 at 23:48
  • See [Given an unknown length list, return a random item in it ...](http://stackoverflow.com/questions/8695022) or [How would you pick a uniform random element in linked list with unknown length?](http://stackoverflow.com/questions/9401375) or [Get a random element in single direction linked list...](http://stackoverflow.com/questions/5829122) or [Efficiently selecting a set of random elements from a linked list](http://stackoverflow.com/questions/54059), or [Fastest way to pick a random element from a list that...](http://stackoverflow.com/questions/1758315) – James Waldby - jwpat7 Oct 04 '12 at 00:56

2 Answers


There's a simple O(N) algorithm: traverse the list once to get its length, pick a uniform random index, then traverse again to that element.
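
A minimal C++ sketch of that two-pass approach (the `Node` type here is a hypothetical stand-in, not something from the question):

```cpp
#include <cstddef>
#include <random>

struct Node {
    int value;
    Node* next;
};

// First pass: count the nodes. Second pass: walk to a uniformly
// chosen index in [0, length).
Node* pickTwoPass(Node* head, std::mt19937& rng) {
    std::size_t length = 0;
    for (Node* p = head; p != nullptr; p = p->next) ++length;
    if (length == 0) return nullptr;

    std::uniform_int_distribution<std::size_t> dist(0, length - 1);
    std::size_t index = dist(rng);

    Node* p = head;
    while (index-- > 0) p = p->next;
    return p;
}
```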

But you can do better with an online algorithm that accesses each element only once:

You select the first element as the answer; then, on the kth element, you replace your answer with that element with probability 1/k. By induction, after N elements each one is the current answer with probability exactly 1/N, so the method is unbiased.

The generalized version (pick k elements) is known as reservoir sampling.
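
A single-pass sketch of the one-element case, reusing the hypothetical `Node` type from the sketch above; drawing a uniform integer in [0, k-1] and testing for zero plays the role of "replace with probability 1/k":

```cpp
#include <cstddef>
#include <random>

// Reservoir sampling with a reservoir of size 1: the kth node
// (1-based) overwrites the current pick with probability 1/k.
Node* pickReservoir(Node* head, std::mt19937& rng) {
    Node* chosen = nullptr;
    std::size_t seen = 0;
    for (Node* p = head; p != nullptr; p = p->next) {
        ++seen;
        std::uniform_int_distribution<std::size_t> dist(0, seen - 1);
        if (dist(rng) == 0) chosen = p;  // happens with probability 1/seen
    }
    return chosen;
}
```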

Karoly Horvath
  • What do you mean "then on the kth element replace the pick with a probability of 1/k"? Could you explain it with more details? – babysnow Oct 03 '12 at 23:50
  • you select the first element, then you drop that and choose the second element with 50% probability, then the next with 1/3 and so on... – Karoly Horvath Oct 03 '12 at 23:51
  • 1
    From an implementation point of view, can the second algorithm actually end up slower because of having to generate N random numbers? I.e. is generating a random number faster than dereferencing a pointer? – Ivan Vergiliev Oct 04 '12 at 00:19
  • @IvanVergiliev, for a linked list, it's almost certainly faster to scan twice, since a random number generator of any quality is going to be slower than a pointer dereference. But consider the very similar problem, "Choose a random line in a file." Now the reservoir algorithm is starting to look a lot more attractive. – rici Oct 04 '12 at 02:43
  • 1
    @Ivan If your random number generator is e.g. a linear congruential generator (which may be good enough for this experiment, otherwise combine two of them, see L'Ecuyer), I'm pretty sure for large lists, the dereferencing of pointers will be much slower due to cache misses and page faults. – user515430 Oct 04 '12 at 05:10

Consider this: I'm assuming your linked list keeps a variable that tracks how many items it contains (standard programming practice). If you pass that count into a function, all you have to do is set a pointer to the node at index rand() % Amount_Of_Items (walking that many links from the head) and have the function return that pointer.
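
A hedged sketch of this idea; the `Node`/`List` types are hypothetical, and it swaps `rand() % Amount_Of_Items` for `<random>` to avoid modulo bias, but the approach is the same:

```cpp
#include <cstddef>
#include <random>

struct Node {
    int value;
    Node* next;
};

struct List {
    Node* head;
    std::size_t size;  // maintained on every insert/remove
};

// Pick a uniform random index in [0, size) and walk to that node.
Node* pickByIndex(const List& list, std::mt19937& rng) {
    if (list.size == 0) return nullptr;
    std::uniform_int_distribution<std::size_t> dist(0, list.size - 1);
    std::size_t index = dist(rng);
    Node* p = list.head;
    while (index-- > 0) p = p->next;
    return p;
}
```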