
I'm writing an encryption program to encrypt files, both large and small. My current method is to read 1024 bytes from the file, encrypt those bytes, write them to a temporary file, and repeat until the whole file has been processed. Once this finishes, the original file is deleted and the temporary file is renamed to take the original's name.

Here is a sample piece of code that processes n bytes (n being 1024):

    private void processChunk(BinaryReader Input, BinaryWriter Output, int n)
    {
        // Read up to n bytes from the input file stream
        // (ReadBytes returns fewer than n bytes at the end of the file)
        byte[] Data = Input.ReadBytes(n);
        // Request exactly as many keystream bytes as were actually read
        byte[] cipherData = StreamCipher.OutputBytes(Data.Length);
        for (int x = 0; x < Data.Length; x++)
            // XOR each input byte with the corresponding keystream byte
            Data[x] ^= cipherData[x];
        // Write the encrypted bytes to the output file stream
        Output.Write(Data);
    }
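For context, the whole-file loop described above (encrypt chunk by chunk into a temporary file, then delete the original and rename) might look roughly like the minimal sketch below. The names `ChunkedFileEncryptor`, `EncryptFile` and the `.tmp` suffix are hypothetical, and the `outputBytes` delegate is an assumed stand-in for the question's `StreamCipher.OutputBytes(n)`.

```csharp
using System;
using System.IO;

static class ChunkedFileEncryptor
{
    // Hypothetical helper: encrypts `path` in place via a temporary file.
    // `outputBytes` stands in for StreamCipher.OutputBytes(n) (assumption).
    public static void EncryptFile(string path, Func<int, byte[]> outputBytes, int chunkSize = 1024)
    {
        string tempPath = path + ".tmp"; // hypothetical naming scheme

        using (var input = new BinaryReader(File.OpenRead(path)))
        using (var output = new BinaryWriter(File.Create(tempPath)))
        {
            while (true)
            {
                byte[] data = input.ReadBytes(chunkSize);
                if (data.Length == 0) break; // end of file

                byte[] key = outputBytes(data.Length);
                for (int i = 0; i < data.Length; i++)
                    data[i] ^= key[i];

                output.Write(data);
            }
        } // both streams are closed here, so the files can be deleted and renamed

        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Because a keystream XOR is its own inverse, running the same function again with a keystream restarted from the same state decrypts the file.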

I'm fairly sure I can't multi-thread the encryption algorithm itself, because the bytes are generated as a keystream and each one depends on the bytes generated before it. But can the file reading and writing run in parallel with the CPU work?

What's the best strategy to take here?

Ryan Codrai

2 Answers


You could do it like this:

  1. Read all the data and store it in a list where each entry is a byte array of length n.
  2. Run your encryption and keep all the encrypted bytes in memory.
  3. Write all the output bytes at once.

This way you access the files only twice.
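The three steps above can be sketched as follows. This is a minimal sketch, assuming a cap on how many chunks are held in memory at once (without a cap the whole file would be loaded); the defaults of 1024 chunks of 1024 bytes give 1 MB per batch. `BatchedEncryptor` is a hypothetical name and `outputBytes` is an assumed stand-in for the question's `StreamCipher.OutputBytes(n)`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BatchedEncryptor
{
    // Hypothetical helper: processes the input in batches of up to
    // `chunksPerBatch` chunks of `chunkSize` bytes each.
    // `outputBytes` stands in for StreamCipher.OutputBytes(n) (assumption).
    public static void Encrypt(Stream input, Stream output, Func<int, byte[]> outputBytes,
                               int chunkSize = 1024, int chunksPerBatch = 1024)
    {
        var reader = new BinaryReader(input);
        while (true)
        {
            // 1. Read a batch of chunks into a list.
            var batch = new List<byte[]>();
            while (batch.Count < chunksPerBatch)
            {
                byte[] chunk = reader.ReadBytes(chunkSize);
                if (chunk.Length == 0) break; // end of input
                batch.Add(chunk);
            }
            if (batch.Count == 0) return;

            // 2. Encrypt the whole batch in memory, keeping the keystream in order.
            foreach (byte[] chunk in batch)
            {
                byte[] key = outputBytes(chunk.Length);
                for (int i = 0; i < chunk.Length; i++)
                    chunk[i] ^= key[i];
            }

            // 3. Write the encrypted batch out.
            foreach (byte[] chunk in batch)
                output.Write(chunk, 0, chunk.Length);
        }
    }
}
```

Note that reading, encrypting and writing still happen one after another here; batching reduces the number of I/O calls but does not overlap disk work with CPU work.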

stsur
  • Impossible for larger files: loading a file of 500 MB or more entirely into system memory would cause errors, which is why I manage the data in blocks. Thanks for the suggestion though :) – Ryan Codrai Jan 22 '15 at 20:00
  • You can still do that; just define a maximum size, say 1 MB, and repeat as many times as you need. – stsur Jan 22 '15 at 20:02
  • So in effect you're saying to do the encryption in chunks? I'm trying to speed up the process by doing several tasks at once: CPU work can happen at the same time as reading from and writing to the disk, so why should I wait for data to be read when I could be encrypting bytes in the meantime? – Ryan Codrai Jan 22 '15 at 20:26

Off the top of my head, I would suggest running three threads in parallel:

  1. A reader thread that reads chunks of data into memory.
  2. An encryption thread doing all the work.
  3. A writer thread that writes the encrypted data to disk.

The three threads communicate via two queues, such as the BlockingCollection<T> class introduced in .NET 4. See the question "Fast and Best Producer/consumer queue technique BlockingCollection vs concurrent Queue".

So thread 1 fills queue 1, thread 2 reads queue 1 and fills queue 2, and thread 3 reads queue 2. If one thread is faster than the others, the BlockingCollection blocks it until the thread on the other side has caught up: a consumer blocks on an empty queue and a producer blocks on a full one. For example, if the BlockingCollection is created with a bounded capacity of 10, the reader thread will block once it is 10 chunks ahead of the encryption thread.
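The reader, encryptor and writer threads described above could be wired up roughly like this. It is a minimal sketch, not a definitive implementation: `PipelinedEncryptor` is a hypothetical name, `outputBytes` is an assumed stand-in for the question's `StreamCipher.OutputBytes(n)`, and the cancellation token is an addition so that a failure in one stage does not leave the others blocked forever.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class PipelinedEncryptor
{
    // Hypothetical helper: reader -> queue1 -> encryptor -> queue2 -> writer.
    // `outputBytes` stands in for StreamCipher.OutputBytes(n) (assumption).
    public static void Encrypt(Stream input, Stream output, Func<int, byte[]> outputBytes,
                               int chunkSize = 1024, int capacity = 10)
    {
        using (var queue1 = new BlockingCollection<byte[]>(capacity)) // reader -> encryptor
        using (var queue2 = new BlockingCollection<byte[]>(capacity)) // encryptor -> writer
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;

            // Runs one stage on its own thread; if it fails, cancel the other
            // stages so none of them stays blocked on a full or empty queue.
            Func<Action, Task> stage = body => Task.Factory.StartNew(() =>
            {
                try { body(); }
                catch { cts.Cancel(); throw; }
            }, TaskCreationOptions.LongRunning);

            Task reader = stage(() =>
            {
                try
                {
                    var br = new BinaryReader(input);
                    byte[] chunk;
                    while ((chunk = br.ReadBytes(chunkSize)).Length > 0)
                        queue1.Add(chunk, token); // blocks while queue 1 is full
                }
                finally { queue1.CompleteAdding(); }
            });

            Task encryptor = stage(() =>
            {
                try
                {
                    // Only this thread uses the keystream, so it is consumed in order.
                    foreach (byte[] chunk in queue1.GetConsumingEnumerable(token))
                    {
                        byte[] key = outputBytes(chunk.Length);
                        for (int i = 0; i < chunk.Length; i++)
                            chunk[i] ^= key[i];
                        queue2.Add(chunk, token); // blocks while queue 2 is full
                    }
                }
                finally { queue2.CompleteAdding(); }
            });

            Task writer = stage(() =>
            {
                foreach (byte[] chunk in queue2.GetConsumingEnumerable(token))
                    output.Write(chunk, 0, chunk.Length);
            });

            Task.WaitAll(reader, encryptor, writer); // rethrows any stage failure
        }
    }
}
```

The keystream stays sequential because only the encryption stage calls it; the parallelism comes purely from overlapping disk reads and writes with the XOR work.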

One more observation: Input.ReadBytes allocates a new byte array on the heap for every read. That array is discarded once the current chunk has been processed, so with large files and a fast encryption algorithm, the allocation and garbage collection could noticeably hurt performance (.NET zeroes every newly allocated buffer). Instead, you could keep a pool of buffers that the reader thread takes from and that are returned to the pool once a later stage is done with them, and use the Stream.Read overload that fills an existing buffer.
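The buffer-pool idea can be sketched as below, assuming a fixed number of buffers allocated up front. `BufferPool` and `ChunkReader` are hypothetical names; the only framework calls used are `BlockingCollection<T>.Add`/`Take` and `Stream.Read(byte[], int, int)`.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical pool: a fixed set of reusable buffers. Rent() blocks when every
// buffer is in use, which also caps how far the reader can run ahead.
sealed class BufferPool
{
    private readonly BlockingCollection<byte[]> _free = new BlockingCollection<byte[]>();

    public BufferPool(int bufferCount, int bufferSize)
    {
        for (int i = 0; i < bufferCount; i++)
            _free.Add(new byte[bufferSize]); // allocated (and zeroed) once, up front
    }

    // Reader thread: take a free buffer, blocking until one is available.
    public byte[] Rent() => _free.Take();

    // Writer thread: hand the buffer back after its contents have been written.
    public void Return(byte[] buffer) => _free.Add(buffer);
}

static class ChunkReader
{
    // Fills `buffer` using Stream.Read instead of BinaryReader.ReadBytes, so no new
    // array is allocated per chunk. Returns the number of valid bytes, which is
    // less than buffer.Length only at the end of the stream.
    public static int Fill(Stream input, byte[] buffer)
    {
        int filled = 0;
        while (filled < buffer.Length)
        {
            int read = input.Read(buffer, filled, buffer.Length - filled);
            if (read == 0) break; // end of stream
            filled += read;
        }
        return filled;
    }
}
```

Since a reused buffer can still hold stale bytes past the valid count, the queues would carry (buffer, count) pairs, and the encryption and writer stages would only touch the first `count` bytes. Newer .NET versions also provide `System.Buffers.ArrayPool<byte>` for the same purpose.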

Christoph
  • That is a great suggestion! I'm going to try to implement your idea as soon as I can. About your observation: are you saying it would be better to fill a buffer and overwrite its data rather than letting it be garbage collected? – Ryan Codrai Jan 22 '15 at 20:35
  • Yes, exactly. Since you are filling the buffer with new data anyway, overwriting the previous data, there is no point in having the .NET GC collect the buffers and then reallocating and clearing new ones on every cycle. Oh, and if you like my response, I would appreciate you marking it as the answer to your problem :-) – Christoph Jan 23 '15 at 03:23