1

I have about 30,000 objects to binary-serialize to a file. With a simple foreach loop, I'm using this basic code to do that:

FileStream fileStream = new FileStream(pathToFile, FileMode.Create);
BinaryFormatter binaryFormatter = new BinaryFormatter();
binaryFormatter.Serialize((Stream) fileStream, objectToSerialize);
fileStream.Close();

Can I accelerate the process using multithreading or another approach (a MemoryStream, etc.)?

Christophe Debove
  • 6,088
  • 20
  • 73
  • 124
  • 2
    I would assume that the disk access is the slowing factor. – Peter - Reinstate Monica Jul 10 '14 at 13:57
  • If the serialization of each object is independent and each object is written to a separate file, then you could simply use a parallel foreach loop, but I would guess that the bottleneck is IO and not processing. – Dirk Jul 10 '14 at 13:57
  • 1
    See [this](http://stackoverflow.com/questions/902425/does-multithreading-make-sense-for-io-bound-operations) – Tzah Mama Jul 10 '14 at 13:57
  • Is it taking much time? Should be fast enough not to consider any improvements. – Konrad Kokosa Jul 10 '14 at 13:57
  • You can probably accelerate it by using a faster serializer, like `protobuf-net` – Rotem Jul 10 '14 at 14:00
  • @Peter Schneider Disc access and the serialisation process, but yes, disc access slows the process. Maybe writing one huge single file is faster? – Christophe Debove Jul 10 '14 at 14:04
  • Doesn't a kind of DirectoryStream exist? I mean one stream for the whole operation, or maybe a way to influence disc fragmentation while writing? – Christophe Debove Jul 10 '14 at 14:08
  • @KonradKokosa It's fast enough, but I want to know if there are different ways. – Christophe Debove Jul 10 '14 at 14:09
  • I can imagine that a single file is faster because it potentially involves fewer disk seeks and less file system bookkeeping overhead. On a system which caches write access to disks, has enough memory, and is configured properly, the disk access may actually degenerate to memory access (namely into the cache). In that case multithreading (implying multiple files) may indeed be much faster. – Peter - Reinstate Monica Jul 10 '14 at 14:12
  • 2
    You can check the bookkeeping overhead by just creating 30000 empty files. Curious how long that takes, under different disk load scenarios. – Peter - Reinstate Monica Jul 10 '14 at 14:14

2 Answers

3

Reuse the binaryFormatter instead of recreating it each time. Placing a BufferedStream between the FileStream and the formatter may also improve performance. BTW: use `using` blocks to ensure that your file streams are closed, even if an exception is raised during serialization.
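
As a rough illustration of those three points (not your exact code; it assumes all objects go into a single file, and `pathToFile` / `objectsToSerialize` are placeholder names):

// Minimal sketch (assumed names: pathToFile, objectsToSerialize).
// One formatter for all objects, a BufferedStream to batch small writes,
// and using blocks so the streams are closed even if Serialize throws.
BinaryFormatter binaryFormatter = new BinaryFormatter();

using (FileStream fileStream = new FileStream(pathToFile, FileMode.Create))
using (BufferedStream bufferedStream = new BufferedStream(fileStream, 64 * 1024))
{
    foreach (object objectToSerialize in objectsToSerialize)
    {
        binaryFormatter.Serialize(bufferedStream, objectToSerialize);
    }
}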

Udontknow
  • 1,472
  • 12
  • 32
1

The only way you will "squeeze" out some improved performance involving IO is to ensure you are not waiting on the IO while you could be doing CPU processing. However, you can't write to the disk in parallel; it can ONLY do one operation at a time.

So the two big things you can do are:

  1. Ensure you are not writing tiny (a few bytes) blocks. Instead, build up a buffer in memory, and then write the ENTIRE thing to the disk at once. This will cut back on the number of actual disk writes you are doing (which are very slow).
  2. While writing to the disk, you could be building up the next block (see #1) so that when the disk is done writing a block, it can immediately write another. This can be done using many schemes, but a popular one is to have two threads: one for creating blocks and one for writing to the disk. The first thread writes to a queue and the other dequeues (see the sketch after this list).
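
Here is a rough sketch of that two-thread scheme (not a definitive implementation; `pathToFile` and `objectsToSerialize` are assumed placeholder names, and a `BlockingCollection` stands in for the queue):

// Requires System.Collections.Concurrent, System.IO, System.Threading.Tasks,
// and System.Runtime.Serialization.Formatters.Binary.
// Producer task: serialize each object into an in-memory block.
// Consumer (this thread): write each finished block to disk in one call.
BlockingCollection<byte[]> blocks = new BlockingCollection<byte[]>(boundedCapacity: 16);

Task producer = Task.Run(() =>
{
    BinaryFormatter formatter = new BinaryFormatter();
    foreach (object obj in objectsToSerialize)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            formatter.Serialize(buffer, obj);
            blocks.Add(buffer.ToArray());
        }
    }
    blocks.CompleteAdding();   // tell the consumer no more blocks are coming
});

using (FileStream fileStream = new FileStream(pathToFile, FileMode.Create))
{
    foreach (byte[] block in blocks.GetConsumingEnumerable())
    {
        fileStream.Write(block, 0, block.Length);   // one large write per block
    }
}

producer.Wait();   // surface any serialization exception

The bounded capacity keeps the serializing thread from running arbitrarily far ahead of the disk, so memory use stays limited while the two stages overlap.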
poy
  • 10,063
  • 9
  • 49
  • 74