I get a text file from a mainframe, and sometimes 0x0D (carriage return) bytes are injected into the middle of the text lines.

The previous programmer created a method using the FileStream class. This method works fine but takes around 30 minutes to go through the entire file.

My thought was to pass the text lines that are needed (about 25 lines) to a method to decrease the processing time.

I've been working with the MemoryStream class but am having an issue where it does not find the 0x0D control code.

Here is the current FileStream method:

private void ReplaceFileStream(string strInputFile)
{
    FileStream fileStream = new FileStream(strInputFile, FileMode.Open, FileAccess.ReadWrite);
    byte filebyte;

    while (fileStream.Position < fileStream.Length)
    {
        filebyte = (byte)fileStream.ReadByte();
        if (filebyte == 0x0D)
        {
            filebyte = 0x20;
            fileStream.Position = fileStream.Position - 1;
            fileStream.WriteByte(filebyte);
        }
    }
    fileStream.Close();
}

and here is the MemoryStream method:

private void ReplaceMemoryStream(string strInputLine)
{
    byte[] byteArray = Encoding.ASCII.GetBytes(strInputLine);
    MemoryStream fileStream = new MemoryStream(byteArray);

    byte filebyte;

    while (fileStream.Position < fileStream.Length)
    {
        filebyte = (byte)fileStream.ReadByte();
        if (filebyte == 0x0D)
        {
            filebyte = 0x20;
            fileStream.Position = fileStream.Position - 1;
            fileStream.WriteByte(filebyte);
        }
    }
    fileStream.Close();
}

As I have not used the MemoryStream class before, I am not that familiar with it. Any tips or ideas?

Justin
HaySeed
  • Doesn't find it, or doesn't write your changes? You're basically modifying a byte array in memory. Your code does not write the changes back to disk. Indeed, the MemoryStream is completely superfluous as your code stands - you may as well have just iterated over the byte array and modified that. Then use File.WriteAllBytes to save it back to disk. – Kent Boogaart Sep 09 '11 at 15:36
  • I'm curious why you don't just do something like strInputLine.Replace('\x000D', ' ') and then write the line. Maybe I am missing something? Or use whatever the corresponding character is in place of the hex; you can escape and replace control characters. – Rig Sep 09 '11 at 15:48
  • Kent: it never finds it. Justin & Austin: good teamwork. Rig: I thought I had tried everything but do not remember trying that. I was using "\r", thinking it would be the same, but it never found it. – HaySeed Sep 09 '11 at 20:00
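
To make the string-based suggestion from the comments concrete, here is a minimal sketch. The class and method names are hypothetical and not from the original post. Note that `'\r'` and `'\x000D'` are the same character in C#, so if `"\r"` was never found, the character may not have been in the string at all. One possible cause (a guess, since the reading code is not shown) is that `StreamReader.ReadLine` treats a lone carriage return as a line terminator and consumes it before the line reaches your method.

```csharp
using System;

static class LineCleaner
{
    // Hypothetical helper (not from the original post): returns a copy of
    // the line with every carriage return (0x0D) replaced by a space (0x20).
    public static string ReplaceCarriageReturns(string line)
    {
        // Strings are immutable, so Replace returns a new string;
        // the caller has to use the return value.
        return line.Replace('\r', ' ');
    }
}
```

Because the method returns a new string, the caller would write something like `line = LineCleaner.ReplaceCarriageReturns(line);` before saving the line.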

2 Answers


I don't know the size of your files, but if they are small enough that you can load the whole thing in memory at once, then you could do something like this:

private void ReplaceFileStream(string strInputFile)
{
    byte[] fileBytes = File.ReadAllBytes(strInputFile);
    bool modified = false;
    for(int i=0; i < fileBytes.Length; ++i)
    {
        if (fileBytes[i] == 0x0D)
        {
            fileBytes[i] = 0x20;
            modified = true;
        }
    } 

    if (modified)
    {
        File.WriteAllBytes(strInputFile, fileBytes);
    }
}

If you can't read the whole file in at once, then you should switch to buffered reading. Here is an example that reads from the file, writes to a temp file, and at the end copies the temp file over the original file. This should yield better performance than reading a file one byte at a time:

private void ReplaceFileStream(string strInputFile)
{
    string tempFile = Path.GetTempFileName();
    try
    {
        using(FileStream input = new FileStream(strInputFile,
            FileMode.Open, FileAccess.Read))
        using(FileStream output = new FileStream(tempFile,
            FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[4096];
            int bytesRead = input.Read(buffer, 0, 4096);
            while(bytesRead > 0)
            {
                for(int i=0; i < bytesRead; ++i)
                {
                    if (buffer[i] == 0x0D)
                    {
                        buffer[i] = 0x20;
                    }
                }

                output.Write(buffer, 0, bytesRead);
                bytesRead = input.Read(buffer, 0, 4096);
            }
            output.Flush();
        }

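        // Pass overwrite: true, otherwise File.Copy throws because the original file already exists.
        File.Copy(tempFile, strInputFile, true);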
    }
    finally
    {
        if (File.Exists(tempFile))
        {
            File.Delete(tempFile);
        }
    }
}
pstrjds
  • I was able to use the first code snippet loading the file into memory. This has decreased the processing time from 47 minutes to 5 seconds. I actually ran the test several times as I could not believe there was that huge of a time difference. – HaySeed Sep 09 '11 at 21:41
  • There is a massive difference when you read from a file in a buffered fashion rather than 1 byte at a time. With the first snippet you are reading the whole file as 1 chunk, so it really should be much faster. Just keep in mind that if the files are very large you would want to use the second snippet as you could run into issues loading the whole file into memory. – pstrjds Sep 10 '11 at 02:42
  • The files are always between 4 and 5 MB and have been for several years. At what size should I be concerned about using different chunks? – HaySeed Sep 12 '11 at 19:24
  • @HaySeed - I can't give you a definitive answer as I don't know if you are compiling this as a 32 or 64 bit app and what other things are running in your app to affect the memory usage at the point this code runs. If the files are 4 - 5 MB, then when you run the `ReadAllBytes` line you will have a byte array that is 4 - 5 MB in memory. If you have a file that is 1 GB, then you have to have at least 1 GB of available memory to load the whole thing in memory. – pstrjds Sep 12 '11 at 19:49
  • @HaySeed a quick look on SO turned up this [article](http://stackoverflow.com/questions/3944320/c-maximum-length-of-byte). Since the array class uses an int for its length counter, you can't have an array with more than `Int32.MaxValue` elements, so you can't read a file larger than that in one chunk. – pstrjds Sep 12 '11 at 19:52
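
The size discussion in these comments could be turned into code along the following lines. This is only a sketch: the class name, the method names, and the 64 MB cutoff are hypothetical and should be tuned for your environment. For large files it also swaps in a different technique from the temp-file answer: because a 0x0D to 0x20 replacement never changes the file length, each chunk can be patched and written back in place.

```csharp
using System;
using System.IO;

static class CarriageReturnFixer
{
    // Hypothetical cutoff (an assumption, not from the thread).
    const long InMemoryLimitBytes = 64L * 1024 * 1024;

    // Picks a strategy based on the file size.
    public static void Fix(string path)
    {
        if (new FileInfo(path).Length <= InMemoryLimitBytes)
            FixInMemory(path);
        else
            FixInChunks(path);
    }

    // Loads the whole file, patches it, and writes it back only if needed.
    public static void FixInMemory(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        if (ReplaceCarriageReturns(bytes, bytes.Length))
            File.WriteAllBytes(path, bytes);
    }

    // Reads one chunk at a time and rewrites a chunk in place when it changed.
    public static void FixInChunks(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            byte[] buffer = new byte[64 * 1024];
            int bytesRead;
            while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                if (ReplaceCarriageReturns(buffer, bytesRead))
                {
                    // Step back over the chunk just read and overwrite it.
                    stream.Seek(-bytesRead, SeekOrigin.Current);
                    stream.Write(buffer, 0, bytesRead);
                }
            }
        }
    }

    // Replaces 0x0D with 0x20 in the first 'count' bytes; returns true if any byte changed.
    static bool ReplaceCarriageReturns(byte[] data, int count)
    {
        bool modified = false;
        for (int i = 0; i < count; i++)
        {
            if (data[i] == 0x0D)
            {
                data[i] = 0x20;
                modified = true;
            }
        }
        return modified;
    }
}
```

For files of 4 to 5 MB, `Fix` would take the in-memory path. The in-place variant modifies the original file directly, so unlike the temp-file approach it can leave a partially updated file if the process is interrupted.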

If your replacement code does not find the 0x0D in the stream but the previous FileStream method does, I think it could be because of the Encoding you are using to get the bytes; you could try some other encoding types.

Otherwise your code seems to be fine. I would wrap the MemoryStream in a using block to be sure it gets closed and disposed, something like this:

using (var fileStream = new MemoryStream(byteArray))
{
    byte filebyte;

    // your while loop...
}

Looking at your code, I am not 100% sure the changes you make to the memory stream will be persisted. Actually, I think that if you do not save the result after the changes, your changes will be lost. I could be wrong about this, but you should test and see; if it does not save, you should use a StreamWriter to save it after the changes.
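
To illustrate the persistence concern, here is a minimal sketch of a MemoryStream variant that hands its result back to the caller. The class and method names are hypothetical. It relies on the fact that a MemoryStream created from an existing byte array writes straight into that array, so the patched bytes can be decoded again after the loop. The original `ReplaceMemoryStream` returns `void` and never does this, which is why its changes are discarded.

```csharp
using System;
using System.IO;
using System.Text;

static class MemoryStreamLineFixer
{
    // Hypothetical variant of ReplaceMemoryStream that returns the patched line.
    public static string ReplaceCarriageReturns(string inputLine)
    {
        // Assumes ASCII input, as in the question; with Encoding.ASCII,
        // characters above 127 are encoded as '?'.
        byte[] byteArray = Encoding.ASCII.GetBytes(inputLine);

        using (var stream = new MemoryStream(byteArray))
        {
            int value;
            // ReadByte returns -1 at the end of the stream.
            while ((value = stream.ReadByte()) != -1)
            {
                if (value == 0x0D)
                {
                    // Step back one byte and overwrite it with a space.
                    stream.Position -= 1;
                    stream.WriteByte(0x20);
                }
            }
        }

        // The stream wrote into byteArray, so decode it to get the result.
        return Encoding.ASCII.GetString(byteArray);
    }
}
```

As noted in the comments on the question, the stream adds nothing here beyond what a plain loop over `byteArray` would do; the sketch only shows where the modified data ends up.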

Davide Piras