
I've been using the following console application:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Configuration;

namespace ConsoleApp1
{
    class Program
    {
        static StringBuilder sBuilder = new StringBuilder();
        static StreamWriter file;
        static void Main(string[] args)
        {
            try
            {
                using (file = new StreamWriter(ConfigurationManager.AppSettings["outFile"], true))
                {
                    ProcessDirectory(ConfigurationManager.AppSettings["inDir"]);

                }

            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                File.WriteAllText(ConfigurationManager.AppSettings["logFile"], ex.Message);
                throw;
            }

        }
        public static void ProcessDirectory(string targetDirectory)
        {
            string[] fileEntries = Directory.GetFiles(targetDirectory);
            foreach (string fileName in fileEntries)
                ProcessFile(fileName);

            string[] subdirectoryEntries = Directory.GetDirectories(targetDirectory);
            foreach (string subdirectory in subdirectoryEntries)
                ProcessDirectory(subdirectory);
        }

        public static void ProcessFile(string path)
        {

            var lines = File.ReadAllLines(path);

            var filtered = lines
            .Where(x => x[0] != '#')
            .Select(line => line.Split(' '))
            .Where(fields =>
                fields[8] != "-" // and other filtering
                )
            .Select(f => string.Join(" ", new string[] {
                    f[0],
                    f[8].ToLower().Replace("some_value",""),
                    ((some_contextual_condition || another_contextual_condition)? "1" : "0")
            }
            ))
            .Distinct();

            var sBuilder = new StringBuilder();

            filtered
                .ToList()
                .ForEach(f =>
                {
                    sBuilder.AppendLine(f);
                });

            file.Write(sBuilder.ToString());
        }
    }
}

There are about 3,500 input files, totaling 340 GB. After processing about 400 files and about 200 write operations, nothing gets written to the output file anymore.

I've tried writing line by line, and I've tried using the StringBuilder both as a static class property and as a locally scoped variable in the ProcessFile method.

The attached image shows the running console application. You may notice that the output file size stopped increasing around the time file 380 was being processed. A try...catch wrapping the entire body of Main catches nothing.

(Screenshot: console output; the output file size stops growing around file 380.)

Alex Filipovici
  • @Evk, exactly, this is my n-th approach. I've also been writing the filtered values after each iteration (max 1.8 MB). – Alex Filipovici Nov 20 '17 at 22:03
  • And how do you know that nothing is written to the output after 400 files, if everything is written at the end in the current implementation? Or does that refer to another approach where you used StreamWriter? Did you run under the debugger to see where exactly it hangs? – Evk Nov 20 '17 at 22:05
  • I've been writing out the number of bytes and lines that get written to the output. At some point, after ~15 minutes, these counters stop increasing. But I can still see the files being picked up in the console output. – Alex Filipovici Nov 20 '17 at 22:07
  • Then maybe you should post more complete code, because there can be bugs there (in the counting) too. With the code you provided I don't see how it can hang without any exceptions. – Evk Nov 20 '17 at 22:10
  • 2
    Have a look at TPL DataFlow - this is just wasting of RAM resources – Sir Rufo Nov 20 '17 at 22:29
  • where is `file` defined and where do you ever use it? – Rufus L Nov 20 '17 at 23:01
  • @RufusL, updated the post. It was a mess; I just removed chunks of comments and probably removed some code lines as well. – Alex Filipovici Nov 20 '17 at 23:35
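For reference, a minimal sketch of the TPL Dataflow approach mentioned in the comments, assuming the System.Threading.Tasks.Dataflow package; the paths, degree of parallelism, and filtering below are illustrative stand-ins, not the original program's logic:

using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks.Dataflow;

class DataflowSketch
{
    static void Main()
    {
        using (var writer = new StreamWriter(@"C:\temp\out.txt", append: true)) // illustrative path
        {
            // Read and filter several files in parallel, but keep only a few
            // results buffered (BoundedCapacity) so memory use stays flat.
            var readAndFilter = new TransformBlock<string, string[]>(
                path => File.ReadLines(path)
                            .Where(l => l.Length > 0 && l[0] != '#')
                            .Distinct()
                            .ToArray(),
                new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4, BoundedCapacity = 8 });

            // A single writer block, so the StreamWriter is never used concurrently.
            var write = new ActionBlock<string[]>(
                lines => { foreach (var line in lines) writer.WriteLine(line); },
                new ExecutionDataflowBlockOptions { BoundedCapacity = 2 });

            readAndFilter.LinkTo(write, new DataflowLinkOptions { PropagateCompletion = true });

            foreach (var file in Directory.EnumerateFiles(@"C:\temp\in", "*", SearchOption.AllDirectories))
                readAndFilter.SendAsync(file).Wait(); // SendAsync honours BoundedCapacity

            readAndFilter.Complete();
            write.Completion.Wait();
        }
    }
}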

3 Answers


The first thing that jumps out at me is that you have no try-catch blocks. Your application has no way of handling or reporting exceptions.

Add a try-catch block around your reading and writing code, and send exceptions to a log so you can troubleshoot.
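A minimal sketch of what that could look like around the per-file work; the log path here is illustrative and the filtering is elided:

public static void ProcessFile(string path)
{
    try
    {
        var lines = File.ReadAllLines(path);
        // ... filtering and writing as in the question ...
    }
    catch (Exception ex)
    {
        // Record which file failed and keep going, so a single bad file
        // doesn't silently stop the whole run.
        File.AppendAllText(@"C:\temp\errors.log",
            $"{DateTime.Now:O} {path}: {ex}{Environment.NewLine}");
    }
}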

Dave Swersky

Have you tried writing to the output file after you process each one, rather than building a huge StringBuilder first? It may or may not help. I also switched to EnumerateFiles and ReadLines, which are better for reading large files:

using System.Configuration;
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
        var targetDir = ConfigurationManager.AppSettings["inDir"];
        var outputFile = ConfigurationManager.AppSettings["outFile"];

        foreach (var fileName in Directory.EnumerateFiles(targetDir, "*", 
            SearchOption.AllDirectories))
        {
            ProcessFile(fileName, outputFile);
        }
    }

    public static void ProcessFile(string inputFile, string outputFile)
    {
        var lines = File.ReadLines(inputFile)
            .Where(x => x[0] != '#')
            .Select(line => line.Split(' '))
            .Where(fields =>
                fields[8] != "-" // and other filtering
            )
            .Select(f => string.Join(
                " ", f[0], f[8].ToLower().Replace("some_value", ""),
                true || false ? "1" : "0"))
            .Distinct();

        File.AppendAllLines(outputFile, lines);
    }
}
Rufus L
  • That was my 1st try, really. Using File.ReadAllLines and File.AppendAllLines. – Alex Filipovici Nov 20 '17 at 23:26
  • Yeah, not sure how much of a difference that would make. But what about writing the lines as you go (in `ProcessFile()`) instead of using the `StringBuilder`? Does that make any difference? – Rufus L Nov 20 '17 at 23:28
  • Nope, no difference. I also went this way, although it would be slow due to disk usage. – Alex Filipovici Nov 20 '17 at 23:35

9 WFEs (web front ends). The 1st WFE had 15 columns in its IIS log structure. The next 8 WFEs were skipping column 3.

Nicely done, SP admins, nicely done!
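For reference, one way to guard against that kind of layout drift is to map field names to indexes from each file's #Fields: directive instead of hardcoding f[8]. A minimal sketch, assuming the usual using directives (System, System.Collections.Generic, System.IO); the field name cs-uri-stem is chosen purely for illustration:

public static IEnumerable<string> ReadField(string path, string fieldName)
{
    string[] fieldNames = null;

    foreach (var line in File.ReadLines(path))
    {
        if (line.StartsWith("#Fields:"))
        {
            // e.g. "#Fields: date time s-ip cs-method cs-uri-stem ..."
            fieldNames = line.Substring("#Fields:".Length)
                             .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            continue;
        }

        if (line.Length == 0 || line[0] == '#' || fieldNames == null)
            continue;

        var fields = line.Split(' ');
        var index = Array.IndexOf(fieldNames, fieldName);
        if (index >= 0 && index < fields.Length && fields[index] != "-")
            yield return fields[index];
    }
}

Each file then dictates its own column positions, e.g. ReadField(path, "cs-uri-stem"), regardless of which WFE produced it.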

Alex Filipovici