8

I have a text file which has more than 3000 lines. I am finding the number of lines using

string[] lines = File.ReadAllLines(myPath);
var lineCount = lines.Length; 

Then I am generating a random number

Random rand = new Random();
// Next's upper bound is exclusive, so this yields a valid zero-based index.
var lineToRead = rand.Next(0, lineCount);

Now I need to read the specific line selected by that random number. I can do this using

string requiredLine = lines[lineToRead];

Because my file is big, I don't think creating such a big array is efficient. Is there a more efficient or easier way to do this?

skeletank
asdfkjasdfjk
  • It is the usual trade-off between speed and memory usage. Your approach costs memory; reading line by line costs speed. Of course, nowadays I would prefer to read 3000 lines into memory. – Steve Apr 03 '13 at 11:56
  • You have to at least scan your file for end-of-line characters. So you can call ReadLine rand times, but you cannot jump straight to the right line. – Alex Apr 03 '13 at 12:00
  • @Steve: The implementation of File.ReadAllLines() just uses repeated calls to StreamReader.ReadLine(), so it won't be any faster than doing it yourself explicitly. – Matthew Watson Apr 03 '13 at 12:10
  • I don't understand why two answers were deleted. They seemed to be working but I don't see those answers anymore. – asdfkjasdfjk Apr 03 '13 at 12:14
  • @MatthewWatson you are right. I should have known better. So in this case it is better to go line by line just to the line required. Of course this is not the case if the OP repeats the operation with a different index – Steve Apr 03 '13 at 12:23
  • Have you looked at [this answer](http://stackoverflow.com/a/3745973/142637)? – sloth Apr 03 '13 at 12:25

6 Answers

10

Here is a solution which iterates over the file twice (the first time to count the lines, the second time to select a line). The benefit is that you don't need to create an array of 3000 strings in memory. But, as mentioned above, it will possibly be slower. Why only possibly? Because File.ReadAllLines builds a list of strings internally, and that list is resized many times while it is filled with 3000 items. (The initial capacity is 4. Whenever the inner array is full, a new array of double the size is created and all the string references are copied into it.)

So the solution uses the File.ReadLines method, which returns an IEnumerable<string> of lines, and skips the lines you don't need:

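var rand = new Random();
IEnumerable<string> lines = File.ReadLines(myPath);
// Next's upper bound is exclusive, so this yields a zero-based index in [0, Count - 1].
var lineToRead = rand.Next(0, lines.Count());
var line = lines.Skip(lineToRead).First();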

BTW, internally File.ReadLines uses StreamReader, which reads the file line by line.
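
The list resizing mentioned above can be observed directly. The following is only a minimal sketch: the class and method names (ListGrowthDemo, CountResizes) are made up for illustration, and the exact growth policy of List<T> is an implementation detail of the runtime, so the number you see may differ between versions.

```csharp
using System;
using System.Collections.Generic;

public static class ListGrowthDemo
{
    // Hypothetical helper: counts how many times a default-constructed
    // List<string> changes its Capacity (i.e. reallocates its backing
    // array) while itemCount items are appended one at a time.
    public static int CountResizes(int itemCount)
    {
        var list = new List<string>();
        int resizes = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < itemCount; i++)
        {
            list.Add("line " + i);

            // A changed Capacity means a new backing array was allocated
            // and the existing references were copied into it.
            if (list.Capacity != lastCapacity)
            {
                resizes++;
                lastCapacity = list.Capacity;
            }
        }

        return resizes;
    }
}
```

Calling ListGrowthDemo.CountResizes(3000) reports how many reallocations happen for a file of about 3000 lines. With a doubling strategy this grows only logarithmically with the line count, which is why the cost is real but modest.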

skeletank
Sergey Berezovskiy
1

What you can do is scan the file to find the offset of each line; later you can jump back to a given line by setting Stream.Position and read its content. With this method you don't need to keep the file contents in memory (only the list of line offsets), and it is reasonably fast. I tested this on a file with 20K lines, 1 MB in size. It took 7 ms to index the file and 0.3 ms to get the line.

    // Parse the file
    var indexes = new List<long>();
    using (var fs = File.OpenRead("text.txt"))
    {
        indexes.Add(fs.Position);
        int chr;
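        while ((chr = fs.ReadByte()) != -1)
        {
            if (chr == '\n')
            {
                indexes.Add(fs.Position);
            }
        }
        // A trailing newline leaves an offset at end-of-file; drop it so it is not counted as a line.
        if (indexes.Count > 1 && indexes[indexes.Count - 1] == fs.Length)
        {
            indexes.RemoveAt(indexes.Count - 1);
        }
    }

    int lineCount = indexes.Count;
    // Random.Next excludes its upper bound, so pass lineCount to make every line selectable.
    int randLineNum = new Random().Next(0, lineCount);
    string lineContent = "";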


    // Read the random line
    using (var fs = File.OpenRead("text.txt"))
    {
        fs.Position = indexes[randLineNum];
        using (var sr = new StreamReader(fs))
        {
            lineContent = sr.ReadLine();
        }
    }
Du D.
0

You can wrap your stream into StreamReader and call ReadLine as many times as needed to go to your target line. That way you don't need to hold the whole file contents in memory.

However, this is only worthwhile if you do it rarely and the file is quite big.
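
A minimal sketch of this idea might look as follows. The names LineReaderSketch, ReadLineAt and CountLines are hypothetical and chosen only for illustration; the line index is assumed to be zero-based, and the file is assumed to be readable with StreamReader's default encoding detection.

```csharp
using System;
using System.IO;

public static class LineReaderSketch
{
    // Returns the line at the given zero-based index, or null if the file
    // has fewer lines. Only one line is held in memory at a time.
    public static string ReadLineAt(string path, int lineIndex)
    {
        if (lineIndex < 0)
            throw new ArgumentOutOfRangeException("lineIndex");

        using (var reader = new StreamReader(path))
        {
            string line;
            int current = 0;

            while ((line = reader.ReadLine()) != null)
            {
                if (current == lineIndex)
                    return line;

                current++;
            }
        }

        return null;
    }

    // Counts the lines by reading each one once and discarding it.
    public static int CountLines(string path)
    {
        int count = 0;

        using (var reader = new StreamReader(path))
        {
            while (reader.ReadLine() != null)
                count++;
        }

        return count;
    }
}
```

To pick a random line, call CountLines(path) first and then ReadLineAt(path, new Random().Next(0, count)). This reads the file up to twice, so it trades speed for a small memory footprint.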

alex
  • So how do you know how many times you have to call `ReadLine`? – sloth Apr 03 '13 at 12:08
  • To find the total line count, you can read all the characters one by one from the file using the same StreamReader and count number of new line chars. – alex Apr 03 '13 at 12:17
0

Use Reservoir Sampling to solve this in a single pass

If you want to randomly choose one or more items from a list of items where the length of that list is not known in advance, you can use Reservoir Sampling.

We can take advantage of that, along with the File.ReadLines() method (which avoids buffering all the lines in memory) to write a single-pass algorithm that will read each line just once, without buffering.

The sample code below shows a generalised solution that lets you randomly select any number of lines. For your case, N = 1.

The sample code also includes a test program to prove that the lines are chosen randomly with a uniform distribution.

(To see how this code works, see the Wiki article I linked above.)

using System;
using System.IO;
using System.Collections.Generic;

namespace Demo
{
    internal class Program
    {
        public static List<string> RandomlyChooseLinesFromFile(string filename, int n, Random rng)
        {
            var result = new List<string>(n);
            int index = 0;

            foreach (var line in File.ReadLines(filename))
            {
                if (index < n)
                {
                    result.Add(line);
                }
                else
                {
                    int r = rng.Next(0, index + 1);

                    if (r < n)
                        result[r] = line;
                }

                ++index;
            }

            return result;
        }

        // Test RandomlyChooseLinesFromFile()

        private static void Main(string[] args)
        {
            Directory.CreateDirectory("C:\\TEST");
            string testfile = "C:\\TEST\\TESTFILE.TXT";
            File.WriteAllText(testfile, "0\n1\n2\n3\n4\n5\n6\n7\n8\n9");
            var rng = new Random();
            int trials = 100000;
            var counts = new int[10];

            for (int i = 0; i < trials; ++i)
            {
                string line = RandomlyChooseLinesFromFile(testfile, 1, rng)[0];
                int index = int.Parse(line);
                ++counts[index];
            }

            // If this algorithm is correct, each line should be chosen
            // approximately 10% of the times.

            Console.WriteLine("% times each line was chosen:\n");

            for (int i = 0; i < 10; ++i)
            {
                Console.WriteLine("{0} = {1}%", i, 100*counts[i]/(double)trials);
            }
        }
    }
}
Matthew Watson
-1

The following will help you read a specific line in a file:

http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/4dbd68f6-61f5-4d36-bfa0-5c909101874b

A code snippet:

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace ReadLine
{
class Program
{
    static void Main(string[] args)
    {
        //Load our text file
        TextReader tr = new StreamReader("\\test.txt");

        //How many lines should be loaded?
        int NumberOfLines = 15;

        //Make our array for each line (index 0 is unused so that index N holds line N)
        string[] ListLines = new string[NumberOfLines + 1];

        //Read the number of lines and put them in the array
        for (int i = 1; i <= NumberOfLines; i++)
        {
            ListLines[i] = tr.ReadLine();
        }

        //This will write the 5th line into the console
        Console.WriteLine(ListLines[5]);
        //This will write the 1st line into the console
        Console.WriteLine(ListLines[1]);

        Console.ReadLine();

        // close the stream
        tr.Close();
    }
}
}

These may also be helpful:

http://www.tek-tips.com/viewthread.cfm?qid=1460456

How do I read a specified line in a text file?

And the following covers editing:

Edit a specific Line of a Text File in C#

Hope it helps...

Community
Hiren Pandya
  • Actually this is what he's doing, more or less, he doesn't want to use huge arrays – Robert W. Hunter Apr 03 '13 at 12:01
  • Yes. I looked into the question more closely but could not find a better solution. I think there are other solutions that avoid the array, but they increase the complexity. Correct me if I am wrong somewhere... – Hiren Pandya Apr 03 '13 at 12:03
  • So how does this help getting a random line from the file? Instead of reading all lines, the code you posted reads only _x_ lines... – sloth Apr 03 '13 at 12:06
  • The posted code is a snippet, so that he can get a better idea. He can add the random generation code himself. I don't think pointers are supported in C#; otherwise I would have posted my own code with pointers, as I would in C or C++. – Hiren Pandya Apr 03 '13 at 12:09
-1

You can try the code below. It does not keep a big array around, but still gets a particular line:

string path = "D:\\Software.txt";
int lines = File.ReadAllLines(path).Length;
Random rand = new Random();
// Next's upper bound is exclusive; lines + 1 keeps the last line (1-based) selectable.
var lineToRead = rand.Next(1, lines + 1);
var requiredLine = System.IO.File.ReadLines(path).Skip(lineToRead - 1).First();
Console.WriteLine(requiredLine.ToString());
Pandian
  • This actually loads the entire file to memory in a big array. That's what `File.ReadAllLines` does. – sloth Apr 03 '13 at 12:27
  • @DominicKexel: I only use File.ReadAllLines to get the length, and I don't store the lines anywhere. Does it still occupy the memory? – Pandian Apr 03 '13 at 12:30
  • Yes, the method still loads the entire file, regardless of storing the result in a variable or not. You're accessing the `Length` property of the array you created. To do so, the array has to be there, doesn't it? You could use `File.ReadLines(myPath).Count()` instead, which won't store the entire file content in an array. – sloth Apr 03 '13 at 12:43