I am using the standard BinaryFormatter to serialize a very large object graph with some third-party objects in the mix. I tried other serializers such as Protobuf, JSON, and XML, and for one reason or another they all failed. The data is essentially the result of a complicated AI algorithm and contains a large number of doubles in a heavily nested tree, many of which could be NaN.

It seems that when a double is NaN then the BinaryFormatter fails silently and deals with it. It would be nice if it handled it correctly.

The core issue is described in this link.

Is there a workaround so I can deal with NaN directly? I could serialize things myself, but that could be a lot of work.

Edit:

In one of the heavy offenders, which is a Naive Bayes implementation, the code is:

public double[][][] Distributions { get; private set; }
public double[] Priors { get; private set; }
Telavian
  • Does it have to be in the data as a double, or do you control the reading side also? You may be able to just bitcast it to a long and store that, depending on your requirements – harold May 05 '16 at 21:42
  • `BinaryFormatter` serializes fields not properties so that limits your options somewhat. Can you give some idea what your object graph looks like? Do the NaN values appear as fields in many different types, or just a small number of types? – dbc May 05 '16 at 22:07
  • Are you running in 32-bit or 64-bit mode? See [here](http://stackoverflow.com/questions/1953377) to check. I ask because [this post](https://stackoverflow.com/questions/24331540) suggests that handling of NaN values is especially slow in 32 bit code. – dbc May 05 '16 at 22:22
  • Set a breakpoint on the BinaryFormatter code. When it hits, use Debug > Windows > Registers. Right-click the window and tick "Floating point"; you should see `CTRL = 027F`. Right-click again and tick "SSE"; you should see `MXCSR = 00001FA0`. If the values don't match, then the "third-party objects" library is screwing with the processor initialization and anything can happen. Throwing an exception and catching it again is a trivial way to reset these registers. – Hans Passant May 05 '16 at 23:21
  • The code is running on a x64 machine with the Any CPU flag set. The prefers 32-bit flag is not set. – Telavian May 06 '16 at 00:16
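
As a rough illustration of the bit-cast idea raised in the comments, the sketch below converts a `double[]` to a `long[]` and back with `BitConverter.DoubleToInt64Bits` and `BitConverter.Int64BitsToDouble`. The class and method names (`DoubleBits`, `ToBits`, `FromBits`) are made up for this example, and it assumes you control both the writing and the reading side so that the `long[]` can be stored in place of the doubles.

```csharp
using System;

// Hypothetical helper, not part of the original code base.
public static class DoubleBits
{
    // Reinterprets each double as its raw 64-bit pattern, so NaN is stored as an ordinary integer.
    public static long[] ToBits(double[] values)
    {
        var bits = new long[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            bits[i] = BitConverter.DoubleToInt64Bits(values[i]);
        }
        return bits;
    }

    // Reverses ToBits; every value, including NaN, comes back with the same bit pattern.
    public static double[] FromBits(long[] bits)
    {
        var values = new double[bits.Length];
        for (int i = 0; i < bits.Length; i++)
        {
            values[i] = BitConverter.Int64BitsToDouble(bits[i]);
        }
        return values;
    }
}
```

The serializer then only ever sees `long` values, at the cost of an extra conversion pass over each array on both sides.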

1 Answer

All that springs to mind is this.

When you deserialize, it works on a stream. A stream is just a processor for reading bytes, and you can write a stream which rewrites another. Conceptually:

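// Conceptual only: IStream stands in for a stream abstraction
// (in .NET you would derive from System.IO.Stream).
public class NanToInfStreamReader : IStream
{
     public NanToInfStreamReader(IStream source)
     {
         ...
     }

     public byte[] Read()
     {
         // Rewrite the bytes before the formatter sees them.
         return ProtectAgainstNaN(source.Read());
     }
}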

So the first part is to write a decorating stream like this, and search for any occurrence of the 64 bits that represent Double.NaN. Substitute them in your stream with the bits for Double.PositiveInfinity, say.

The BinaryFormatter will never see Double.NaN, and the speed issue won't occur.
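
A minimal sketch of what the substitution step could look like on a single buffer follows. `NaNPatcher` and the three-argument `ProtectAgainstNaN` signature are assumptions made for this example, not an existing API. It also assumes that the stream stores doubles in the same byte order that `BitConverter` uses on the machine, that only the canonical `double.NaN` bit pattern needs replacing, and that a pattern never straddles two reads; a real decorating stream would have to handle that last case.

```csharp
using System;

// Hypothetical helper, named after the ProtectAgainstNaN call in the conceptual class above.
public static class NaNPatcher
{
    // Byte patterns as BitConverter produces them on this machine (little-endian on x86/x64).
    private static readonly byte[] NaNBytes = BitConverter.GetBytes(double.NaN);
    private static readonly byte[] InfBytes = BitConverter.GetBytes(double.PositiveInfinity);

    // Replaces each occurrence of the Double.NaN byte pattern inside
    // buffer[offset .. offset + count) with the pattern for +Infinity, in place.
    // Returns the number of substitutions made.
    public static int ProtectAgainstNaN(byte[] buffer, int offset, int count)
    {
        int substitutions = 0;
        int lastStart = offset + count - NaNBytes.Length;
        int i = offset;
        while (i <= lastStart)
        {
            if (Matches(buffer, i))
            {
                Buffer.BlockCopy(InfBytes, 0, buffer, i, InfBytes.Length);
                substitutions++;
                i += NaNBytes.Length; // skip past the bytes just rewritten
            }
            else
            {
                i++;
            }
        }
        return substitutions;
    }

    // True if the NaN byte pattern starts at buffer[start].
    private static bool Matches(byte[] buffer, int start)
    {
        for (int j = 0; j < NaNBytes.Length; j++)
        {
            if (buffer[start + j] != NaNBytes[j]) return false;
        }
        return true;
    }
}
```

Because this is a blind byte search, it can in principle also hit eight bytes that merely look like NaN but belong to other data, which is one reason to keep the returned count and check it later.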

However, now your data is filled with +Inf anywhere you had NaN, so you have to go back through your arrays and rewrite them.
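
The patch-up pass over the arrays might look something like the sketch below, written for the `double[]` and `double[][][]` shapes shown in the question. `NaNRestorer` and `RestoreNaN` are hypothetical names, and the code assumes the original data never contained a genuine +Infinity, since such a value would be indistinguishable from a substituted NaN.

```csharp
using System;

// Hypothetical helper for the post-deserialization fix-up.
public static class NaNRestorer
{
    // Turns every +Infinity in a flat array back into NaN and returns how many were changed.
    public static int RestoreNaN(double[] values)
    {
        int fixes = 0;
        if (values == null) return fixes;
        for (int i = 0; i < values.Length; i++)
        {
            if (double.IsPositiveInfinity(values[i]))
            {
                values[i] = double.NaN;
                fixes++;
            }
        }
        return fixes;
    }

    // Same fix-up for a jagged three-level array such as Distributions.
    public static int RestoreNaN(double[][][] values)
    {
        int fixes = 0;
        if (values == null) return fixes;
        foreach (double[][] plane in values)
        {
            if (plane == null) continue;
            foreach (double[] row in plane)
            {
                fixes += RestoreNaN(row);
            }
        }
        return fixes;
    }
}
```

Returning the number of fixes makes it possible to sanity-check the pass against an independently obtained count of substituted values.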

It's not a great approach. But it sounds like you might be a bit stuck, and it's about all I can suggest.

Steve Cooper
  • Interesting approach. I never thought of this. – Telavian May 06 '16 at 16:24
  • If you try it, get the stream to report the number of substitutions, and count the number of fixes you apply post-deserialization. If the counts aren't the same, you need to work on your patch-up code. If they are the same, you can be confident it's right. – Steve Cooper May 06 '16 at 16:55