I am building a system to read tables from heterogeneous documents and would like to know the best way of managing (columns of) floating point numbers. Where a column can be represented as real numbers I will use a List<Double>.
(I'm using Java but experience from other languages would be useful.) I also wish to serialize the table as a CSV file. Thus a table might look like:
"material", "mass (g)", "volume (cm3)",
"iron", 7.8, 1.0,
"aluminium", 27.3, 9.9,
and column 2 (1-based) would be represented by a List<Double>:
{new Double(7.8), new Double(27.3)}
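For concreteness, this is roughly how I populate such a column at present (a minimal sketch; the class and method names are just illustrative, and the choice of Double.NaN as the placeholder on failure is provisional and is exactly the kind of decision I am asking about):

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnParser {
    // Turn the raw cells of one column into doubles, substituting NaN
    // (for now) when a cell will not parse, e.g. ">10" or a blank cell.
    static List<Double> parseColumn(List<String> cells) {
        List<Double> column = new ArrayList<>();
        for (String cell : cells) {
            try {
                column.add(Double.parseDouble(cell.trim()));
            } catch (NumberFormatException e) {
                column.add(Double.NaN); // placeholder choice, see the question below
            }
        }
        return column;
    }

    public static void main(String[] args) {
        System.out.println(parseColumn(List.of("7.8", "27.3", ">10")));
        // [7.8, 27.3, NaN]
    }
}
```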
I may also wish to compute the density (mass/volume) and derive a new column ("density (g.cm-3)") as a List<Double>:
{new Double(7.8), new Double(2.76)}
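The derived column would come from something like this (again only a sketch, with names of my own choosing; there is no special handling yet for missing or infinite inputs, which is part of the question):

```java
import java.util.ArrayList;
import java.util.List;

public class DensityColumn {
    // Element-wise mass / volume over two parallel columns.
    static List<Double> density(List<Double> mass, List<Double> volume) {
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < mass.size(); i++) {
            result.add(mass.get(i) / volume.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(density(List.of(7.8, 27.3), List.of(1.0, 9.9)));
        // [7.8, 2.7575757575757573]
    }
}
```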
However, the input values are sometimes missing, unusual, or represented by fuzzy concepts. Some transformations may throw exceptions (which I would catch and replace with one of the unusual values listed further below). Examples include the following (a quick check of what Java actually gives for each follows the list):
1.0E+10000
>10
10 / 0.0 (i.e. divide by zero)
Math.sqrt(-1.)
Math.tan(Math.PI/2.0)
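For reference, this is what I observe Java doing with those cases (a scratch test, not part of the system):

```java
public class UnusualCases {
    public static void main(String[] args) {
        System.out.println(Double.parseDouble("1.0E+10000")); // Infinity (overflows double)
        // Double.parseDouble(">10") throws NumberFormatException
        System.out.println(10 / 0.0);                 // Infinity
        System.out.println(Math.sqrt(-1.0));          // NaN
        System.out.println(Math.tan(Math.PI / 2.0));  // 1.633123935319537E16, large but finite,
                                                      // because Math.PI/2 is not exactly pi/2
    }
}
```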
I have the following options in Java for representing unusual values of a list element (a quick look at how each behaves under chained arithmetic follows the list):
- a null reference
- Double.NaN
- Double.MAX_VALUE
- Double.POSITIVE_INFINITY
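The chaining behaviour I care about differs between these, as far as I can see (another scratch test):

```java
public class Chaining {
    public static void main(String[] args) {
        System.out.println(Double.NaN + 1.0);               // NaN (propagates)
        System.out.println(Double.POSITIVE_INFINITY + 1.0); // Infinity (propagates)
        System.out.println(1.0 / Double.POSITIVE_INFINITY); // 0.0
        System.out.println(Double.MAX_VALUE + 1.0);         // 1.7976931348623157E308 (silently absorbed)
        System.out.println(Double.MAX_VALUE * 2.0);         // Infinity (overflows)

        Double missing = null;
        // missing + 1.0 would throw NullPointerException when unboxed
    }
}
```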
Are there protocols for when the Java unusual values above should be used? I have read this question on how they behave. (I would like to rely on chaining of their operations.) And if there are protocols, can the values be serialized and read back in? (e.g. does Java parse "0x7ff0000000000000L" to a number equal to Double.POSITIVE_INFINITY?)
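On the serialization point, this is what I have checked so far (the hex question above remains open; as far as I can tell parseDouble does not accept that raw bit pattern, but longBitsToDouble reconstructs it):

```java
public class RoundTrip {
    public static void main(String[] args) {
        // toString / parseDouble round-trip the special values as words
        String s = Double.toString(Double.POSITIVE_INFINITY);
        System.out.println(s);                              // "Infinity"
        System.out.println(Double.parseDouble(s));          // Infinity
        System.out.println(Double.parseDouble("NaN"));      // NaN

        // The raw IEEE 754 bit pattern can be reconstructed explicitly
        double d = Double.longBitsToDouble(0x7ff0000000000000L);
        System.out.println(d == Double.POSITIVE_INFINITY);  // true
    }
}
```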
I am prepared for some loss of precision in the specification (there are often errors in OCR, missing digits, etc., so this is a "good enough" exercise).