10

I started off doing something as follows:

using (TextWriter textWriter = new StreamWriter(filePath, append))
{
    foreach (MyClassA myClassA in myClassAs)
    {
        textWriter.WriteLine(myIO.GetCharArray(myClassA));

        if (myClassA.MyClassBs != null)
            myClassA.MyClassBs.ToList()
                .ForEach(myClassB =>
                    textWriter.WriteLine(myIO.GetCharArray((myClassB)));

        if (myClassA.MyClassCs != null)
            myClassA.MyClassCs.ToList()
                .ForEach(myClassC =>
                    textWriter.WriteLine(myIO.GetCharArray(myClassC)));
    }
}

This seemed pretty slow (~35 seconds for 35,000 lines).

Then I tried to follow the example here to create a buffer, with the following code, but it didn't gain me anything. I was still seeing times around 35 seconds. Is there an error in how I implemented the buffer?

using (TextWriter textWriter = new StreamWriter(filePath, append))
{
    char[] newLineChars = Environment.NewLine.ToCharArray();
    //Chunk through 10 lines at a time.
    int bufferSize = 500 * (RECORD_SIZE + newLineChars.Count());
    char[] buffer = new char[bufferSize];
    int recordLineSize = RECORD_SIZE + newLineChars.Count();
    int bufferIndex = 0;

    foreach (MyClassA myClassA in myClassAs)
    {
        IEnumerable<IMyClass> myClasses =
            new List<IMyClass> { myClassA }
                .Union(myClassA.MyClassBs)
                .Union(myClassA.MyClassCs);

        foreach (IMyClass myClass in myClasses)
        {
            Array.Copy(myIO.GetCharArray(myClass).Concat(newLineChars).ToArray(),
                0, buffer, bufferIndex, recordLineSize);

            bufferIndex += recordLineSize;

            if (bufferIndex >= bufferSize)
            {
                textWriter.Write(buffer);

                bufferIndex = 0;
            }
        }
    }

    if (bufferIndex > 0)
        textWriter.Write(buffer);
}

Is there a better way to accomplish this?

Austin Salonen
  • 49,173
  • 15
  • 109
  • 139
lintmouse
  • 5,079
  • 8
  • 38
  • 54
  • Have you tried using StringBuilder and build the string then just write once to the file. – Sorceri Jun 26 '13 at 16:20
  • any specific reason to use buffers? – JustCode Jun 26 '13 at 16:21
  • 8
    How certain are you that it's really the IO that's the problem here? What happens if you just write to a `StreamWriter` wrapped around a `MemoryStream`, for example? – Jon Skeet Jun 26 '13 at 16:21
  • Left justify the code for readability. What is MyClassBs? – paparazzo Jun 26 '13 at 16:35
  • @Blam - MyClassB (as well as MyClassA and MyClassC) are typical POCOs. MyClassA contains a list of MyClassBs and a list of MyClassCs. Does that answer your question? – lintmouse Jun 26 '13 at 16:42
  • @JustCode - it is based on the assumption that there is an expense attached to writing a line that is mitigated by writing more than one line at a time. Not sure if it's an accurate assumption, but I got the idea from the article I linked. – lintmouse Jun 26 '13 at 16:43
  • @Sorceri - I haven't tried that. I can give it a shot, though. – lintmouse Jun 26 '13 at 16:48
  • How do you know 35 seconds is "pretty slow"? Test the IO first to get some bounds? before tinkering with working c# code... – Vivek Jun 26 '13 at 16:52
  • @Jon Skeet - I tried using a MemoryStream like you suggested, and it took about the same amount of time, so I think the issue must be with the GetCharArray() method. I'm going to look into it some more. Thanks. – lintmouse Jun 26 '13 at 16:57
  • Did not ask what MyClassA is. Asked what MyClassBs is. List? OverservableCollection? Enumerable? Might be able to eliminate the ToList() call. – paparazzo Jun 26 '13 at 17:02
  • @Blam - MyClassBs is an IEnumerable. – lintmouse Jun 26 '13 at 17:04
  • @Sorceri The downside of building all the lines in memory is that you have to have all the lines in memory. E.g. i wrote a data export that writes out a few billion lines. It would be a waste to cache that massive string in memory. – Ian Boyd Jun 26 '13 at 20:03
  • @IanBoyd I agree if you are producing a billion lines it would be inefficient but if you are doing a few then it would be fine. Without know the details its hard to give the correct solution. – Sorceri Jun 26 '13 at 20:12

3 Answers3

12

I strongly suspect that the majority of your time is not spent in the I/O. There's no way that it should take 35 seconds to write 35,000 lines, unless those lines are really long.

Most likely, the majority of time is spent in the GetCharArray method, whatever that does.

A few suggestions:

If you really think I/O is the problem, increase the stream's buffer size. Call the StreamWriter constructor that lets you specify a buffer size. For example,

using (TextWriter textWriter = new StreamWriter(filePath, append, Encoding.Utf8, 65536))

That'll perform better than the default 4K buffer size. Going higher than 64K for the buffer size is not generally useful, and can actually decrease performance.

Don't pre-buffer lines or append to a StringBuilder. That might give you small performance increases, but at a huge cost in complexity. The small performance boost isn't worth the maintenance nightmare.

Take advantage of foreach. You have this code:

if (myClassA.MyClassBs != null)
    myClassA.MyClassBs.ToList()
        .ForEach(myClassB =>
            textWriter.WriteLine(myIO.GetCharArray((myClassB)));

That has to create a concrete list from whatever MyClassBs collection is, and then enumerate it. Why not just enumerate the thing directly:

if (myClassA.MyClassBs != null)
{
    foreach (var myClassB in myClassA.MyClassBs)
    {
        textWriter.WriteLine(myIO.GetCharArray((myClassB)));
    }
}

That will save you the memory required by the ToList, and the time it takes to enumerate the collection when creating the list.

All that said, it's almost certain that your GetCharArray method is the thing that's taking all the time. If you really want to speed up your program, look there. Trying to optimize writing to the StreamWriter is a waste of time. You're not going to get significant performance increases there.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
  • 3
    You should probably avoid using [.ForEach() anyway](http://blogs.msdn.com/b/ericlippert/archive/2009/05/18/foreach-vs-foreach.aspx). – Erik Philips Jun 27 '13 at 01:23
1

I threw together a simple snippet that I think is a bit cleaner; but, then again, I'm not quite sure what you are trying to accomplish. Also, I don't have any of your classes available, so I can't really do any kind of tests.

This sample does basically the same thing you have; except that it uses some generic methods, and it does all the writing in one spot.

string filePath = "MickeyMouse.txt";
bool append = false;
List<MyClassA> myClassAs = new List<MyClassA> { new MyClassA() };
    List<char[]> outputLines = new List<char[]>();

foreach (MyClassA myClassA in myClassAs)
{
    outputLines.Add(myIO.GetCharArray(myClassA));

    if (myClassA.MyClassBs != null)
        outputLines.AddRange(myClassA.MyClassBs.Select(myClassB => myIO.GetCharArray(myClassB)));

    if (myClassA.MyClassCs != null)
        outputLines.AddRange(myClassA.MyClassCs.Select(myClassC => myIO.GetCharArray(myClassC)));
}

var lines = outputLines.Select(line => string.Concat<char>(line));
if (append)
    File.AppendAllLines(filePath, lines);
else
    File.WriteAllLines(filePath, lines);

Here's the StringBuilder version:

string filePath = "MickeyMouse.txt";
bool append = false;
List<MyClassA> myClassAs = new List<MyClassA> { new MyClassA() };
StringBuilder outputLines = new StringBuilder();

foreach (MyClassA myClassA in myClassAs)
{
    outputLines.Append(myIO.GetCharArray(myClassA));

    if (myClassA.MyClassBs != null)
        myClassA.MyClassBs.ForEach(myClassB=>outputLines.Append(myClassB));

    if (myClassA.MyClassCs != null)
        myClassA.MyClassCs.ForEach(myClassC => outputLines.Append(myClassC));
}

if (append)
    File.AppendAllText(filePath, outputLines.ToString());
else
    File.WriteAllText(filePath, outputLines.ToString());
John Kraft
  • 6,811
  • 4
  • 37
  • 53
  • I tried that and saw an improvement of 2 - 3 seconds, so it's better than what I had. Thanks for sharing. – lintmouse Jun 26 '13 at 19:42
  • As some others have said, you might try using a StringBuilder and appending the test, as opposed to using the list, then writing the StringBuilder's ToString() as a single write. I don't know how much/if that would improve it at all. – John Kraft Jun 26 '13 at 19:44
0

Use buffered stream for writing

e.g. for buffered writing to Console use

TextWriter w = new StreamWriter(new BufferedStream(Console.OpenStandardOutput()));
    w.WriteLine("Your text here");

similarly for buffered writing to file use

TextWriter w = new StreamWriter(new BufferedStream(new FileStream("myFilePath.txt", FileMode.Create)));
w.WriteLine("Your text here");
Deep
  • 5,772
  • 2
  • 26
  • 36