0

I have the following code:

public static void PrependEntitiesToFile(string pathToFile)
{
    char[] buffer = new char[10000];
    FileInfo file = new FileInfo(pathToFile);

    string renamedFile = file.FullName + ".orig";
    System.IO.File.Move(file.FullName, renamedFile);

    using (StreamReader sr = new StreamReader(renamedFile))
    using (StreamWriter sw = new StreamWriter(file.FullName, false))
    {
        string entityDeclaration = "foo";
        sw.Write(entityDeclaration);
        string strFileContents = string.Empty;
        int read;
        while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < buffer.Length; i++)
            {
                strFileContents += buffer[i].ToString();
            }
        }
        sw.Write(strFileContents, 0, strFileContents.Length);

    }

    System.IO.File.Delete(renamedFile);
}

This code works but its SOOOOO SLOW because I am looping through each character in the buffer array and ToString()-ing it. Imagine how slow it is when its ingesting and duplicating a 823,000 character file.

Is there a way to convert a fully populated buffer to a string and add it at once rather than looping through each character in the array? Or, generically speaking, is there just a faster way to this?

I am so ashamed of this code.

cadrell0
  • 17,109
  • 5
  • 51
  • 69
Isaiah Nelson
  • 2,450
  • 4
  • 34
  • 53
  • 3
    why aren't you just using the overload that writes a buffer at a time? http://msdn.microsoft.com/en-us/library/bf931t09.aspx – tvanfosson Mar 01 '12 at 20:48

7 Answers7

6

Actually, that's not why it's so slow. It's slow because you're doing a repeated string concatenation and that's O(n^2). You'd see better performance if you used a StringBuilder. You could also just make an array of the characters, and call the constructor of string that takes such an array.

But even better, skip the string all together, and just write the buffer out to sw.

jason
  • 236,483
  • 35
  • 423
  • 525
  • Down stream there is a reason for storing it in a string. The reason is, based on a condition, the string may need to be prepended to the new file as its written in. I may just call the constructor of the string. This is definitely one of those times I may end up also using StringBuilder. – Isaiah Nelson Mar 01 '12 at 20:59
  • Okay, that's fine. Regardless, your main issue is the repeated concatenations. – jason Mar 01 '12 at 21:40
  • For an 800,000 char file, that means I will have a 8 "buffers" if each buffer stores up to 10,000 chars. Is there a way to cat character arrays together since I cant do buffer += buffer;? Othewise I am not sure how I will use the constructure of a string and pass in buffer just once. – Isaiah Nelson Mar 01 '12 at 22:07
  • by the way, thanks for calling me a Schlemiel... lol. Its true though, I am one. I started with C# and high level design, but not the small, low-level operations. – Isaiah Nelson Mar 01 '12 at 23:03
  • Yup. Declare a variable before the `while` loop: `var contents = Enumerable.Empty();` Then, inside the loop, append the contents of the buffer to `contents` via `contents = contents.Concat(buffer)`. Then, when you are ready to construct the string, use `new String(contents.ToArray());`. – jason Mar 02 '12 at 01:02
1

You should just pass your character array to the string constructor:

string strFileContents = new string(buffer);

I expect that to be much faster.

recursive
  • 83,943
  • 34
  • 151
  • 241
1

don't be, you are so close... i think, you can write the output like this: you seem to be it looping it twice...

using (StreamReader sr = new StreamReader(renamedFile))
using (StreamWriter sw = new StreamWriter(file.FullName, false))
{
    string entityDeclaration = "foo";
    sw.Write(entityDeclaration);
    string strFileContents = string.Empty;
    int read;
    while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
    {
        sw.Write(buffer);
    }
}
mindandmedia
  • 6,800
  • 1
  • 24
  • 33
1

Can I ask why you are doing this. BEcause it looks like your goal is to move the contents of one file to another char-by-char with no other processing. In which case there are much better ways.

But even if you do need to read everything in, have a look at File.ReadAllLines

Eoin Campbell
  • 43,500
  • 17
  • 101
  • 157
  • I have an xml file that needs a series of entities in a doctype prepended to it because a client isn't sending the xml with entities already in them. The parser freaks out when it encounters nbsp. So I am makign a copy and holding the entity in a string. Ill look at ReadAllLines, its probably a better solution than char arrays. – Isaiah Nelson Mar 01 '12 at 21:03
  • it looks, like he is prepending "foo" to it. would a `seek(0)` or similar make sense? – mindandmedia Mar 01 '12 at 21:03
  • @mindandmedia I added in foo for brevity sakes. Can you explain what seek is? – Isaiah Nelson Mar 01 '12 at 21:08
  • never mind that comment: http://stackoverflow.com/questions/1343044/c-prepending-to-beginning-of-a-file – mindandmedia Mar 01 '12 at 21:13
1

Just use the overload that takes a buffer, offset, and length and skip the conversion altogether.

using (StreamReader sr = new StreamReader(renamedFile))
using (StreamWriter sw = new StreamWriter(file.FullName, false))
{
    string entityDeclaration = "foo";
    sw.Write(entityDeclaration);
    string strFileContents = string.Empty;
    int read;
    while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
    {
        sw.Write( buffer, 0, read );
    }

}
tvanfosson
  • 524,688
  • 99
  • 697
  • 795
1

Don't use String. Use Stringbuilder instead.

In .NET, string are immutable, which means each time you change one, it makes a copy. That slows it down over time.

pearcewg
  • 9,545
  • 21
  • 79
  • 125
1

Should be able to do a GetString(). It reads in 2048 bytes at a time to do the conversion. You can also specify an encoding type to handle different charsets.

public enum EncodingType 
{ 
    ASCII, 
    Unicode, 
    UTF7, 
    UTF8 
} 
public static string ByteArrayToString(byte[] bytes) 
{ 
    return ByteArrayToString(bytes, EncodingType.Unicode); 
} 

public static string ByteArrayToString(byte[] bytes, EncodingType encodingType) 
{ 
    System.Text.Encoding encoding=null; 
    switch (encodingType) 
    { 
    case EncodingType.ASCII: 
        encoding=new System.Text.ASCIIEncoding(); 
        break;    
    case EncodingType.Unicode: 
        encoding=new System.Text.UnicodeEncoding(); 
        break;    
    case EncodingType.UTF7: 
        encoding=new System.Text.UTF7Encoding(); 
        break;    
    case EncodingType.UTF8: 
        encoding=new System.Text.UTF8Encoding(); 
        break;    
    } 
    return encoding.GetString(bytes); 
} 
Brian
  • 2,229
  • 17
  • 24