
Hello, I am trying to rewrite a file by replacing bytes, but it takes too much time on large files. For example, on a 700 MB file this code ran for about 6 minutes. Please help me make it run in less than 1 minute.

static private void _12_56(string fileName)
{
    byte[] byteArray = File.ReadAllBytes(fileName);
    // swap byteArray[i] with byteArray[i + 4] and byteArray[i + 1] with
    // byteArray[i + 5] in every complete 6-byte group
    for (int i = 0; i + 6 <= byteArray.Length; i += 6)
    {
        Swap(ref byteArray[i], ref byteArray[i + 4]);
        Swap(ref byteArray[i + 1], ref byteArray[i + 5]);
    }
    File.WriteAllBytes(fileName, byteArray);
}
johnny 5
  • It's probably slow because you're reading the whole file into memory. I don't know what `Swap` does; is it necessary to hold the whole file, or can you just read chunks of 1MB and work on one at a time? It would also be a good idea to use the Visual Studio profiler to see exactly what is slow about it. – Jim W May 24 '18 at 18:25
  • You can check this question/answer, it's a good reply! https://stackoverflow.com/questions/955911/how-to-write-super-fast-file-streaming-code-in-c – Francesco Paolo Passaro May 24 '18 at 18:28
  • @JimW Swap just swaps bytes using a temp variable. For me it's important to replace the 1st byte with the 4th and the 2nd with the 5th in each group of 6 bytes. – Victor Semeniuk May 24 '18 at 18:29
  • You can read and write byte by byte, but I'm not sure it would be faster. – paparazzo May 24 '18 at 18:31

2 Answers


Read the file in chunks of bytes whose size is divisible by 6. Replace the necessary bytes in each chunk and write each chunk to another file before reading the next chunk.

You can also try to perform the read of the next chunk in parallel with the write of the previous chunk:

using (var source = new FileStream(@"c:\temp\test.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (var target = new FileStream(@"c:\temp\test.txt", FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
    {
        await RewriteFile(source, target);
    }
}


private async Task RewriteFile( FileStream source, FileStream target )
{
    // We're reading bufferSize bytes from the source-stream into one half of the buffer
    // while the writeTask is writing the other half of the buffer to the target-stream.

    // define how many chunks of 6 bytes you want to read per read operation
    // (1 would mean 6-byte reads, which is far too small to be fast)
    int chunksPerBuffer = 100000;
    int bufferSize = 6 * chunksPerBuffer;

    // declare a byte array that contains both the bytes that are read
    // and the bytes that are being written in parallel.
    byte[] buffer = new byte[bufferSize * 2];
    // curoff is the start position of the bytes we're working with in the
    // buffer
    int curoff = 0;

    Task writeTask = Task.CompletedTask;
    int len;

    // Read the desired number of bytes from the file into the buffer.
    // The first read operation places the bytes in the first half of
    // the buffer; the next one places them in the second half, and so
    // on, alternating.
    while ((len = await source.ReadAsync(buffer, curoff, bufferSize).ConfigureAwait(false)) != 0)
    {
        // Swap the bytes in the current half of the buffer:
        // in every complete 6-byte group, the 1st byte is exchanged with
        // the 4th and the 2nd with the 5th. Only complete groups are
        // swapped, so a short read at the end of the file is handled
        // safely. (This assumes short reads only happen at the end of
        // the file, which holds for local FileStreams.)
        for (int i = curoff; i + 6 <= curoff + len; i += 6)
        {
            Swap(ref buffer[i], ref buffer[i + 4]);
            Swap(ref buffer[i + 1], ref buffer[i + 5]);
        }

        // wait until the previous write-task has completed.
        await writeTask.ConfigureAwait(false);
        // Start writing the bytes that have just been processed.
        // Do not await the task here, so that the next bytes
        // can be read in parallel.
        writeTask = target.WriteAsync(buffer, curoff, len);

        // Position the pointer at the beginning of the other half
        // of the buffer
        curoff ^= bufferSize;

    }

    // Make sure that the last write also finishes before closing
    // the target stream.
    await writeTask.ConfigureAwait(false);
}

The code above reads the file, swaps the bytes, and rewrites the same file, overlapping each read with the write of the previous chunk.
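One caveat worth noting: `Stream.ReadAsync` may return fewer bytes than requested even before the end of the file. A minimal sketch of a fill-the-buffer helper that could stand in for the direct `ReadAsync` call (the name `ReadFullAsync` is my own, not part of the answer):

```csharp
using System.IO;
using System.Threading.Tasks;

static class StreamHelpers
{
    // Hypothetical helper: keep reading until 'count' bytes have arrived
    // or the end of the stream is reached; returns the total bytes read.
    public static async Task<int> ReadFullAsync(Stream source, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int read = await source.ReadAsync(buffer, offset + total, count - total).ConfigureAwait(false);
            if (read == 0) break; // end of stream
            total += read;
        }
        return total;
    }
}
```

With such a helper, the read count is only smaller than `bufferSize` on the final chunk, which keeps the 6-byte groups aligned.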

Frederik Gheysels
  • As an addendum, I would consider not worrying about dividing into 6 specifically. Pick a suitable static chunk size and asynchronously read as many chunks as necessary. That way if your file size or performance requirements change, you aren't left 'locked in'. – SomeGuy May 24 '18 at 18:43
  • Pretty sure referencing the task and running it multiple times like this will not work. You need a new call to readAsync every iteration. – gnud May 24 '18 at 18:51
  • No, it doesn't. It always reads the same `bufferSize` bytes from the beginning of the file, and never terminates, if the file is larger than the buffer. – gnud May 24 '18 at 19:04
  • "to another file" might not be acceptable, for security/capacity/transactional reasons. – H H May 24 '18 at 19:06
  • @gnud, read the code and try it; it works. The Read operation moves the file pointer to the next position. The content is read into a buffer that is double the size, and the halves alternate. Try it and see for yourself. The code still needs some checks if the last read didn't read a number of bytes that is divisible by 6; in that case you cannot swap everything in the buffer. – Frederik Gheysels May 24 '18 at 19:10
  • @FrederikGheysels I'm sorry - I read your code three times, and missed that you reassign readTask. Why not just call ReadAsync in the loop? Really threw me... – gnud May 24 '18 at 19:14
  • So, how much faster is your approach compared to original? – Evk May 24 '18 at 19:25
  • @gnud; indeed, it is clearer if the source is read in the while part. I've modified the code. – Frederik Gheysels May 24 '18 at 19:35
  • It does not overwrite the file. – Victor Semeniuk May 24 '18 at 20:15
  • @FrederikGheysels I have just copy pasted it and changed @"c:\temp\test.txt" – Victor Semeniuk May 24 '18 at 21:20
  • Did you replace both occurrences of that string in the source code? – Frederik Gheysels May 24 '18 at 21:27
  • @FrederikGheysels Yes... Also when I change target file to empty file, it does not even write there. – Victor Semeniuk May 25 '18 at 07:41
  • Do you also close the file after everything is done ? – Frederik Gheysels May 25 '18 at 09:33
  • @FrederikGheysels Do we need to close file inside using statement? If yes can you please share final version of code... Is it working fine on your computer? – Victor Semeniuk May 25 '18 at 17:13
  • No, the using statement closes the file. The code is working on my computer (I've tested it with a small file). If you have the code somewhere on github or so, I can take a look – Frederik Gheysels May 25 '18 at 19:00
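Following the suggestion in the comments to avoid tying the buffer size to 6, here is a rough sketch (mine, not from either answer) that uses an arbitrary buffer size and carries any incomplete 6-byte group over to the next read; the method name `SwapInChunks` and the 1 MB buffer are assumptions:

```csharp
using System;
using System.IO;

static class ChunkedRewrite
{
    // Sketch: pick an arbitrary buffer size (not a multiple of 6) and carry
    // any incomplete 6-byte group over to the next read, so group alignment
    // is preserved regardless of how many bytes each Read call returns.
    public static void SwapInChunks(string path)
    {
        using (var file = File.Open(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var buffer = new byte[1 << 20]; // any size; need not divide by 6
            int carry = 0;                  // leftover bytes from the previous read
            int bytesRead;
            while ((bytesRead = file.Read(buffer, carry, buffer.Length - carry)) > 0)
            {
                int available = carry + bytesRead;
                int usable = available - (available % 6); // complete groups only
                for (int i = 0; i + 6 <= usable; i += 6)
                {
                    (buffer[i], buffer[i + 4]) = (buffer[i + 4], buffer[i]);
                    (buffer[i + 1], buffer[i + 5]) = (buffer[i + 5], buffer[i + 1]);
                }
                // rewind to where this buffer's data started and overwrite it
                file.Position -= available;
                file.Write(buffer, 0, usable);
                // keep the incomplete tail for the next iteration
                carry = available - usable;
                Array.Copy(buffer, usable, buffer, 0, carry);
                file.Position += carry; // resume reading after the tail
            }
            // a final partial group (< 6 bytes) is simply left untouched in the file
        }
    }
}
```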

As the other answer says, you have to read the file in chunks.

Since you are rewriting the same file, it's easiest to use the same stream for reading and writing.

using(var file = File.Open(path, FileMode.Open, FileAccess.ReadWrite)) {        
    // Read buffer. Size must be divisible by 6
    var buffer = new byte[6*1000]; 

    // Keep track of how much we've read in each iteration
    var bytesRead = 0;      

    // Fill the buffer. Put the number of bytes into 'bytesRead'.
    // Stop looping once fewer than 6 bytes come back; Read returns 0
    // at end of file. (For a local FileStream, Read returns the full
    // requested count except on the final, partial read.)
    while ((bytesRead = file.Read(buffer, 0, buffer.Length)) >= 6)
    {   
        // Swap each complete 6-byte group in the current buffer
        for (int i = 0; i + 6 <= bytesRead; i += 6)
        {
            Swap(ref buffer[i], ref buffer[i + 4]);
            Swap(ref buffer[i + 1], ref buffer[i + 5]);
        }

        // Step back in the file, to where we filled the buffer from
        file.Position -= bytesRead;
        // Overwrite with the swapped bytes
        file.Write(buffer, 0, bytesRead);
    }
}
gnud
  • I like this answer better but you could also open 2 FileStreams (1 r, 1 w) to the same file. Your approach might be wasting some of the lower level buffering. – H H May 24 '18 at 19:09
  • @gnud just wondering, why is it more efficient to chunk the file? – johnny 5 May 24 '18 at 19:23
  • Thanks, it's working about 30 seconds on 700 MB file. – Victor Semeniuk May 24 '18 at 20:16
  • @johnny5 The way I think about this comes from the days of spinning disks. With spinning disks, you might issue way too many seeks (moving the read/write head over the disk) if you do many small read/write operations. That physical reason does not apply to SSDs. Still, there will be a system call for every `read`/`write` if they're not buffered. I'm sure there's "invisible" buffering happening at the OS level and at the disk level, and it's possible there won't be a major difference. Hard to test though - exactly because of those caches. – gnud May 24 '18 at 20:40
  • @HenkHolterman Would be interesting to test. Would also be simple to do. Just add another stream, read from one, write to the other, don't change the `Position`. Again, it's really hard to test this stuff because of disk caches. It's easy to test with warm cache - not with cold. – gnud May 24 '18 at 20:42
  • Yes, you would want to avoid changing the Position, esp on the reader. And I would up that 1000 quite a few times. – H H May 24 '18 at 21:56
  • @HenkHolterman With warm cache, increasing the buffer size to `6*1000*1000` speeds up a 700M file by about 10%. Switching to two streams is not noticable on my computer. – gnud May 24 '18 at 22:44
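For completeness, the two-`FileStream` variant discussed in the comments above (one read-only stream and one write-only stream over the same file, with a larger buffer) could be sketched like this; the method name is hypothetical, the `6 * 1000 * 1000` buffer size comes from gnud's measurement, and like the answer it assumes `FileStream.Read` returns full buffers except at end of file:

```csharp
using System.IO;

static class TwoStreamRewrite
{
    // Sketch of the two-stream idea from the comments: the writer trails the
    // reader through the same file, so neither Position is adjusted manually.
    public static void SwapWithTwoStreams(string path)
    {
        using (var reader = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var writer = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
        {
            var buffer = new byte[6 * 1000 * 1000]; // size per the comment above
            int bytesRead;
            while ((bytesRead = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                // swap each complete 6-byte group; a partial tail at end of
                // file passes through unchanged
                for (int i = 0; i + 6 <= bytesRead; i += 6)
                {
                    (buffer[i], buffer[i + 4]) = (buffer[i + 4], buffer[i]);
                    (buffer[i + 1], buffer[i + 5]) = (buffer[i + 5], buffer[i + 1]);
                }
                writer.Write(buffer, 0, bytesRead);
            }
        }
    }
}
```

Since the writer never gets ahead of the reader, overwriting the file in place is safe here, and the OS-level buffering of each stream is left undisturbed.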