I have a huge linked list of integers (let's say it has size N, but N is unknown to me) and want to get k random values from it in minimum possible time/space.
I think it must be possible to write a variation of inside-out Fisher-Yates shuffle that will solve this problem in O(N) time and O(k) additional space.
Can anyone help me to get statistically correct solution with specified time/space bounds?
I think my current code is close to the correct solution:
public class Node
{
public int Data;
public Node Next;
// O(N) time, O(k) additional space
public int[] GetRandomData(int k)
{
var a = new int[k];
var rand = new Random();
int n = 0;
Node cur = this;
while (cur != null)
{
int r = rand.Next(0, n);
if (r < k)
{
a[r] = cur.Data;
}
cur = cur.Next;
}
if (n < k) throw new ArgumentException("k is bigger than N");
return a;
}
}