Let's assume the file is so large that you cannot afford to fit it into RAM. In that case, you want reservoir sampling, an algorithm designed for picking a random item from a sequence of unknown, arbitrary length that might not fit into memory:
// Needs: using System; using System.IO;
Random r = new Random();
int currentLine = 1;
string pick = null;

foreach (string line in File.ReadLines(filename))
{
    // Line N replaces the current pick with probability 1/N.
    if (r.Next(currentLine) == 0)
    {
        pick = line;
    }
    ++currentLine;
}

return pick;
This algorithm is slightly unintuitive. At a high level, it follows one rule: line N has a 1/N chance of replacing the currently selected line. Thus, line 1 is always selected at first, but has a 1/2 chance of later being replaced by line 2, a 1/3 chance of being replaced by line 3, and so on.
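For example, in a three-line file, line 1 survives only if it is not replaced by line 2 (probability 1/2) and not replaced by line 3 (probability 2/3), so it ends up selected with probability 1 * 1/2 * 2/3 = 1/3; line 2 is picked at step 2 with probability 1/2 and survives step 3 with probability 2/3, also 1/3; and line 3 is picked at the last step with probability 1/3. All three lines are equally likely.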
I've found this algorithm easiest to understand via a proof of correctness. So, a simple proof by induction:
1) Base case: By inspection, the algorithm works if there is 1 line.
2) If the algorithm works for N-1 lines, processing N lines works because:
3) After processing N-1 iterations of an N-line file, all N-1 lines seen so far are equally likely to be the current pick (probability 1/(N-1)).
4) The next iteration ensures that line N has a probability of 1/N (because r.Next(N) returns 0 with probability exactly 1/N, and it is the final iteration), reducing the probability of each previous line to:
1/(N-1) * (1 - (1/N))
= 1/(N-1) * (N/N - (1/N))
= 1/(N-1) * (N-1)/N
= (1*(N-1)) / (N*(N-1))
= 1/N
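If the induction feels abstract, a quick simulation makes the 1/N claim concrete. Here is a small, self-contained C# sketch (the class name, item count, and trial count are just illustrative, not part of the code above) that runs the same replacement rule over five items many times and prints how often each one ends up selected; every item should land near 20%.

using System;

class ReservoirCheck
{
    static void Main()
    {
        Random r = new Random();
        int items = 5, trials = 100000;
        int[] wins = new int[items];

        for (int t = 0; t < trials; t++)
        {
            int pick = -1;
            for (int i = 0; i < items; i++)
            {
                // Item i+1 replaces the current pick with probability 1/(i+1).
                if (r.Next(i + 1) == 0)
                {
                    pick = i;
                }
            }
            wins[pick]++;
        }

        for (int i = 0; i < items; i++)
        {
            Console.WriteLine($"item {i}: {100.0 * wins[i] / trials:F1}%");
        }
    }
}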
If you know how many lines are in the file in advance, this algorithm is more expensive than necessary, as it always reads the entire file.
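In that case, a minimal sketch of the cheaper approach (the method name and lineCount parameter are mine, assuming the count you have is accurate) is to pick a random index up front and read only that far; File.ReadLines is lazy, so Skip/First stops reading once the chosen line is reached.

// Needs: using System; using System.IO; using System.Linq;
static string PickRandomLineKnownCount(string filename, int lineCount)
{
    int target = new Random().Next(lineCount);            // uniform index in [0, lineCount)
    return File.ReadLines(filename).Skip(target).First(); // reads only target + 1 lines
}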