5

I have a program written in C++ which generates C source code for mathematical calculations. I have noticed that the constants take up a lot of space in the generated code, and I am looking for a more compact representation.

To generate constants, I am now using:

double v = ...
cfile << std::scientific << std::setprecision(std::numeric_limits<double>::digits10 + 1) << v;

I am pretty sure that this is a lossless representation, but it is also very bloated. For example, a zero and a one would be represented as something like 0.0000000000000000e+00 and 1.0000000000000000e+00, although "0." or "1." carries just as much information.

Is there a way to print constants to file in a more compact, but still lossless manner? It does not need to look good for a human reader, just compile when present in plain C code (if C99, I would prefer if it's also valid C++). Hexadecimal could be ok if it is portable.

EDIT: Removed std::fixed in code snippet.

Joel
  • It's been a while, but look [here](http://en.wikipedia.org/wiki/Huffman_coding), Huffman-encoding might suit you. – bash.d Mar 01 '13 at 11:12
  • 4
    Perhaps I misunderstood, but wouldn't removing trailing zeros be the solution? – jogojapan Mar 01 '13 at 11:13
  • Related: http://stackoverflow.com/questions/4738768/printing-double-without-losing-precision – jogojapan Mar 01 '13 at 11:25
  • possible duplicate of [C - Serialization of the floating point numbers (floats, doubles)](http://stackoverflow.com/questions/1786137/c-serialization-of-the-floating-point-numbers-floats-doubles) – Johan Kotlinski Mar 01 '13 at 12:36
  • @kotlinski That's not a duplicate. That question asks for binary serialization. This one wants to output the numbers in a format that is valid C/C++ code. – jogojapan Mar 01 '13 at 12:58
  • @jogojapan Sure, removing trailing zeros would be one of the things needed. But I am mainly looking for some standard way or C++ library to codegen the constants. – Joel Mar 01 '13 at 13:22
  • @jogojapan About trailing zeros again. I am also not sure how to remove the trailing zeros in a safe and portable way. – Joel Mar 01 '13 at 13:50
  • You might want to take a look at this article: http://www.cs.washington.edu/education/courses/cse590p/590k_02au/print-fp.pdf – MadScientist Mar 01 '13 at 14:28
  • Just print to a string, get rid of trailing zeroes, and print that out. – vonbrand Mar 01 '13 at 16:27

4 Answers

11

You can use hexadecimal floating point (The format specifier %a for printf() in C); it's defined to preserve all bits of precision (C11, 7.21.6.1p8, a,A specifiers).

cfile << std::hexfloat << v;

If your compiler/standard library doesn't support hexfloat, you can use C99 %a printf specifier (this is equivalent, as specified in C++11 table 88 under section 22.4.2.2.2):

printf("%a", v);

For example, the following program is valid C99:

#include <stdio.h>
int main() {
   double v = 0x1.8p+1;
   printf("%a\n", v);
}

Your generated source file will not be valid C++11 because, rather absurdly, C++11 does not support hexadecimal floating point literals (they were only added to the language in C++17). However, many C++11 compilers support C99 hexadecimal floating point literals as an extension.

ecatmur
  • 1
    It's the routine of a code generator, it may **check** the value to write to decide its best (= shortest) representation! – Adriano Repetti Mar 01 '13 at 12:15
  • @ecatmur Thanks for this pointer and especially for pointing out that it's not valid C++11. Because of this I will probably avoid this notation, since it's important that my generator compiles with C++11 (and preferably C++03, which some of my users have). – Joel Mar 01 '13 at 13:41
3

This is not a problem of representation, language or standard library but of algorithm. If you have a code generator then...why don't you change the generated code to be the best (= shortest with required precision) representation? It's what you do when you write code by hand.

In the hypothetical put_constant(double value) routine you may check what's the value you have to write:

  • Is it an integer? Don't bloat the code with std::fixed and std::setprecision, just cast to integer and add a dot.
  • Try to convert it to string with default settings then convert it back to double, if nothing changed then default (short) representation is good enough.
  • Convert it to string with your actual implementation, and check its length. If it's more than N (see later) use another representation otherwise just write it.

A possible (short) representation for floating point numbers when they have a lot of digits is to use their memory representation. With this you have a pretty fixed overhead and length won't ever change so you should apply it only for very long numbers. A naive example to show how it may work:

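#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double (assumes a 64-bit IEEE 754 double).
// memcpy avoids the aliasing problems of casting between unrelated types.
static double bits_to_double(std::uint64_t bits)
{
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

int main()
{
    // 2.2 is stored in memory as 0x400199999999999A
    double f1 = bits_to_double(0x400199999999999AULL);
    double f2 = 123456.1234567891234567;

    (void)f1; (void)f2;  // silence unused-variable warnings
    return 0;
}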
Adriano Repetti
  • 1
    I accepted this as my answer. Testing different printouts and choosing the best one is probably the best way to go as you suggest. Thanks! – Joel Mar 01 '13 at 13:36
1

First, you're contradicting yourself when you first say std::scientific, and then std::fixed. And second, you probably don't want either. The default format is generally designed to do this best. The default format doesn't have a name, nor a manipulator, but is what you get if no other format has been specified, and can be set (in case other code has set a different format) using:

cfile.setf( std::ios_base::fmtflags(), std::ios_base::floatfield );

I'd recommend using this. (You still need the precision, of course.)

James Kanze
-4

I'm not sure you can pass floating points losslessly like this. Floating points are necessarily lossy. While they can represent a subset of values precisely, you cannot include ALL the significant figures: different hardware may have different representations, so you cannot guarantee no loss of information. Even if you could pass it all across, the value may not be representable by the receiving hardware.

A plain ofstream::operator<< would print out as many digits as required, though, so there isn't really a need to complicate matters.

Arkady
  • I don't think the last statement is correct. Default precision doesn't print as many digits as can be internally represented. – jogojapan Mar 01 '13 at 11:26
  • If both the reader and the writer use the same base for floating point, and have the same number of significant digits in that base, you can ensure exact transmission using decimal, provided you use enough decimal digits precision. (For IEEE, 17 digits suffice.) – James Kanze Mar 01 '13 at 11:26
  • “While they can represent a subset of values precisely you cannot include ALL the significant figures” Yes you can. Why would you not be able to? And you don't need to pass all significant digits, only enough to make it unambiguous which floating-point number is meant. “different hardware may have different representations so you cannot guarantee no loss of information” This is why the IEEE 754 standard was published, **in 1985**: so that we could have the same representations on all computers. The problem you refer to was solved by the publication of that standard more than 20 years ago. – Pascal Cuoq Mar 01 '13 at 13:29
  • 1
    No, floating-point values are **not** "necessarily lossy". Every floating-point value has a well-defined internal representation, and it's not at all unreasonable to want to write that value out with the shortest possible external representation and read it back and get the same internal representation. The techniques for doing this were developed back in the 70s. Essentially, you keep writing digits until the value represented by the digits is closer to the internal value than to either of its immediate neighbors. Unfortunately, this requires unbounded integer types in some cases. – Pete Becker Mar 01 '13 at 14:12