7

I have a text file with few hundred lines, the structure is pretty simple.

firstname lastname

I need to pick out a random firstname & listname from the file.

Jeff LaFay
  • 12,882
  • 13
  • 71
  • 101
crap
  • 73
  • 1
  • 3

2 Answers2

15
string[] lines = File.ReadAllLines(...); //i hope that the file is not too big
Random rand = new Random();
return lines[rand.Next(lines.Length)];

Another (and maybe better) option is to have the first line of the file contain the number of records in it and then you don't have to read all the file.

Itay Karo
  • 17,924
  • 4
  • 40
  • 58
  • 4
    +1: I hate it when people answer the questions I can actually answer before I can, especially when they use exactly the same code I would :) – Callum Rogers Sep 19 '10 at 13:58
  • Thanks, i just figured it out. But your way seems better. – crap Sep 19 '10 at 14:01
  • That works as long as the file is relatively small. I've provided an alternative that allow you to not have to keep the entire file in memory. – tvanfosson Sep 19 '10 at 14:02
  • @tvanfosson: The second version will work also if file is very large (by adding # of records in first line) and the advantage of it over you solution (which i really like from math perspective) is that on average only half of the file will be read (and only one line in memory each time). – Itay Karo Sep 19 '10 at 14:08
  • +1 for the "first line of the file contain the number of records" part. – Josh M. May 30 '11 at 07:14
  • Don't allocate a `new Random()` every call, the result will not be random if the calls occur too close together in time. A thread-static `Random` should be used instead. See [Random number generator only generating one random number](https://stackoverflow.com/q/767999). – dbc Dec 31 '18 at 03:08
13

Read each line keeping a count, N, of the lines you have seen so far. Select each line with probability 1/N, i.e., the first line will always be chosen, the second line will be chosen 1/2 times to replace the first, the third 1/3 times, ... This way each line has a 1/N probability of being the selected line, you only have to read the file once, and you don't need to store all of the file in memory at any given time.

Here's an implementation that can be adapted for your needs.

public string RandomLine( StreamReader reader )
{
    string chosen = null;
    int numberSeen = 0;
    var rng = new Random();
    while ((string line = reader.ReadLine()) != null)
    {
        if (rng.NextInt(++numberSeen) == 0)
        {
            chosen = line;
        }
    }
    return chosen;
}

Based on a C implementation for selecting a node from an arbitrarily long linked list.

tvanfosson
  • 524,688
  • 99
  • 697
  • 795
  • the math looks wrong to me. Isn't the probability of selecting the 1st line 1/n! by the time you get to the end? – President James K. Polk Sep 19 '10 at 14:09
  • The math here is fine and nice :) – Itay Karo Sep 19 '10 at 14:11
  • @Itay: only if you believe the comments: I don't. But, I've been fooled by probability before. – President James K. Polk Sep 19 '10 at 14:14
  • 1
    @Itay: Ahh, I think I am getting a clue now. In step n you *keep* the previous selection with probability (n-1)/n. And by induction, the probability of the previous selection was 1/(n-1), so multiplying through gives 1/n. Clever. – President James K. Polk Sep 19 '10 at 14:20
  • 3
    @GregS - use inductive reasoning. Assume that at the Nth step, each element has 1/n probability of being chosen. For N = 1, this is clear. For N = 2, the first was chosen at step 1 and has a 1/2 probability of being replaced at step 2 so our assumption holds. Now at step, n+1 each of the preceding elements, by our hypothesis, had 1/n chance of being selected randomly. With probability 1/n+1 we choose to replace it. Clearly the current element has a 1/n+1 probability of being selected, but that element was selected randomly so all preceding elements have a 1/n probability of being selected – tvanfosson Sep 19 '10 at 14:24
  • actually the probability of line n to be chosen is p(n) = (1/n)*(1 - sum where i > n of p(i)) – Itay Karo Sep 19 '10 at 14:25
  • Don't allocate a `new Random()` every call, the result will not be random if the calls occur too close together in time. A thread-static `Random` should be used instead. See [Random number generator only generating one random number](https://stackoverflow.com/q/767999). – dbc Dec 31 '18 at 03:16