3

I have the method below, which uses `yield return` to read a large number of lines (more than 1 million) from a file.

    private static IEnumerable<string> ReadLineFromFile(TextReader fileReader)
    {
        using (fileReader)
        {
            string currentLine;
            while ((currentLine = fileReader.ReadLine()) != null)
            {
                yield return currentLine;
            }
        }
    }

I need to be able to write every 10 lines returned from this method to different files.

How do I consume this method without enumerating all the lines?

Any answer is very much appreciated.

MaYaN
  • @Kevin Does what not work?! :-) – MaYaN Oct 23 '12 at 00:52
  • this code.. I don't have a compiler handy, but it looks like it would work fine to iterate 10 lines at a time? – Rym Oct 23 '12 at 00:55
  • @Kevin The code works; my question is how the caller should consume this method. How will the caller collect the first 10 lines? As soon as I materialize the `IEnumerable` into a `List`, the whole file is read to the end. I need to access the rows in batches of, let's say, 10 lines. Hope that makes sense. – MaYaN Oct 23 '12 at 00:58
  • Have a look at the accepted answer here on how to chunk an `IEnumerable`: http://stackoverflow.com/questions/12186376/chunk-ienumerable-icollection-class-c-sharp-2-0 – Mike Zboray Oct 23 '12 at 01:13
  • The linked answer is pretty great; it means you just do `foreach (var batch in Chunk(ReadLineFromFile(filename), BATCH_SIZE)) { /* process List */ }` – Rym Oct 23 '12 at 01:27
  • @mikez - actually I looked at that method again and yes it is a very elegant way to solve it. Thanks again. – MaYaN Oct 23 '12 at 01:31
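
The chunking approach discussed in these comments can be sketched roughly as follows. This is not the code from the linked answer; `Batching`, `Chunk`, and `ChunkIterator` are hypothetical names used for illustration. The idea is that batches are produced lazily, so only one batch of lines is held in memory at a time.

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Hypothetical helper: lazily groups a sequence into lists of at most `size` items.
    public static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> source, int size)
    {
        // Validate eagerly; the iterator below only runs once enumeration starts.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return ChunkIterator(source, size);
    }

    private static IEnumerable<List<T>> ChunkIterator<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                // Start a fresh list so the caller may keep the previous batch.
                batch = new List<T>(size);
            }
        }

        // Emit the final, possibly shorter, batch.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

A caller would then write something like `foreach (var batch in Batching.Chunk(ReadLineFromFile(reader), 10))` and handle each `List<string>` as it arrives. On .NET 6 or later, the built-in `Enumerable.Chunk` covers the same need.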

3 Answers

2

I think I finally got it working :-)

        var listOfBufferedLines = ReadLineFromFile(ReadFilePath);

        var listOfLinesInBatch = new List<string>();
        foreach (var line in listOfBufferedLines)
        {
            listOfLinesInBatch.Add(line);

            if (listOfLinesInBatch.Count % 1000 == 0)
            {
                Console.WriteLine("Writing Batch.");
                WriteLinesToFile(listOfLinesInBatch, LoadFilePath);
                listOfLinesInBatch.Clear();
            }
        }

        // Write the remaining lines, if any (the last batch may be smaller).
        if (listOfLinesInBatch.Count > 0)
        {
            WriteLinesToFile(listOfLinesInBatch, LoadFilePath);
        }
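
The loop above appears to pass the same `LoadFilePath` for every batch. If each batch should land in its own file, as the question asks, one option is to number the output files. A minimal sketch under that assumption; `BatchFileWriter`, `SplitIntoFiles`, `WriteBatch`, and the `batch_0001.txt` naming scheme are hypothetical and not part of the original code:

```csharp
using System.Collections.Generic;
using System.IO;

public static class BatchFileWriter
{
    // Reads `lines` lazily and writes every `batchSize` lines to a new file.
    // Assumes batchSize > 0. Returns the number of files written.
    public static int SplitIntoFiles(IEnumerable<string> lines, string outputDirectory, int batchSize)
    {
        Directory.CreateDirectory(outputDirectory); // no-op if it already exists

        var batch = new List<string>(batchSize);
        int fileCount = 0;

        foreach (var line in lines)
        {
            batch.Add(line);
            if (batch.Count == batchSize)
            {
                WriteBatch(batch, outputDirectory, ++fileCount);
                batch.Clear(); // safe: the batch has already been written out
            }
        }

        // Write the final, possibly shorter, batch.
        if (batch.Count > 0)
        {
            WriteBatch(batch, outputDirectory, ++fileCount);
        }

        return fileCount;
    }

    // Hypothetical naming scheme: batch_0001.txt, batch_0002.txt, ...
    private static void WriteBatch(List<string> batch, string outputDirectory, int batchNumber)
    {
        string path = Path.Combine(outputDirectory, $"batch_{batchNumber:D4}.txt");
        File.WriteAllLines(path, batch);
    }
}
```

Because the source sequence is consumed inside a single `foreach`, the file is still read one line at a time and only the current batch is held in memory.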
MaYaN
0

If you run the code below, you can see that all you need to do is call your method in a `foreach` loop. It yields one item at a time, so you just need to buffer the items somewhere until you reach a batch size of your choice.

static void Main (string [] args)
{
    int batch_size = 5;
    string buffer = "";
    foreach (var c in EnumerateString("THISISALONGSTRING")) 
    {               
        // Check if it's time to split the batch
        if (buffer.Length >= batch_size) 
        {
            // Process the batch
            buffer = ProcessBuffer(buffer);
        }

        // Add to the buffer
        buffer += c;
    }

    // Process the remaining items
    ProcessBuffer(buffer);

    Console.ReadLine();
}

public static string ProcessBuffer(string buffer)
{
    Console.WriteLine(buffer);  
    return "";
}

public static IEnumerable<char> EnumerateString(string huh)
{
    for (int i = 0; i < huh.Length; i++) {
        Console.WriteLine("yielded: " + huh[i]);
        yield return huh[i];
    }
}
Rym
  • Concatenating strings in a loop like that is usually not a good idea. – svick Oct 23 '12 at 11:38
  • I felt highlighting when to use StringBuilder was out of scope for the answer :) – Rym Oct 23 '12 at 15:06
  • Well, I think all answers should use best practices. You don't have to highlight it, but you should use it in your answer. – svick Oct 23 '12 at 17:57
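
Following the point raised in these comments, the same buffering idea can be sketched with a `StringBuilder` instead of repeated string concatenation. This is an illustrative variant, not the answer's original code; `BufferedBatches` and `Batch` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text;

public static class BufferedBatches
{
    // Groups characters into strings of at most batchSize characters.
    // Assumes batchSize > 0.
    public static IEnumerable<string> Batch(IEnumerable<char> source, int batchSize)
    {
        var buffer = new StringBuilder(batchSize);

        foreach (char c in source)
        {
            buffer.Append(c);
            if (buffer.Length == batchSize)
            {
                yield return buffer.ToString();
                buffer.Clear(); // reuse the same builder for the next batch
            }
        }

        // Emit whatever is left over as a final, shorter batch.
        if (buffer.Length > 0)
        {
            yield return buffer.ToString();
        }
    }
}
```

With the input `"THISISALONGSTRING"` and a batch size of 5, this yields `THISI`, `SALON`, `GSTRI`, and `NG`, while avoiding a new string allocation for every appended character.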
0

This is definitely not an elegant way to solve it, but it will work:

static void Main(string[] args)
        {

            try
            {
                System.IO.TextReader readFile = new StreamReader(@"C:\Temp\test.txt");
                List<string> lines = new List<string>();
                foreach (string line in ReadLineFromFile(readFile))
                {
                    lines.Add(line);

                    // Flush as soon as the chunk holds 10 lines;
                    // ProcessChunk empties the list afterwards.
                    if (lines.Count == 10)
                    {
                        ProcessChunk(lines);
                    }
                }

                // Process the remaining lines (fewer than 10), if any.
                if (lines.Count > 0)
                {
                    ProcessChunk(lines);
                }

                Console.ReadKey();
            }
            catch (IOException ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }

        private static void ProcessChunk(List<string> lines)
        {
            Console.WriteLine("----------------");
            lines.ForEach(l => Console.WriteLine(l));
            lines.Clear();
        }

        private static IEnumerable<string> ReadLineFromFile(TextReader fileReader)
        {
            using (fileReader)
            {
                string currentLine;
                while ((currentLine = fileReader.ReadLine()) != null)
                {
                    yield return currentLine;
                }
            }
        }
Tariqulazam