13

I am looking for a library function to convert floating point numbers to strings, and back again, in C++. The properties I want are that str2num(num2str(x)) == x and that num2str(str2num(x)) == x (as far as possible). The general property is that num2str should represent the simplest rational number that, when rounded to the nearest representable floating-point number, gives you back the original number.

So far I've tried boost::lexical_cast:

double d = 1.34;
string_t s = boost::lexical_cast<string_t>(d);
printf("%s\n", s.c_str());
// outputs 1.3400000000000001

And I've tried std::ostringstream, which seems to work for most values if I do stream.precision(16). However, at precision 15 or 17 it either truncates or gives ugly output for things like 1.34. I don't think that precision 16 is guaranteed to have any particular properties I require, and suspect it breaks down for many numbers.

Is there a C++ library that has such a conversion? Or is such a conversion function already buried somewhere in the standard libraries or Boost?

The reason for wanting these functions is to save floating point values to CSV files, and then read them correctly. In addition, I'd like the CSV files to contain simple numbers as far as possible so they can be consumed by humans.

I know that the Haskell read/show functions already have the properties I am after, as do the BSD C libraries. The standard references for string<->double conversions are a pair of papers from PLDI 1990:

  • How to read floating point numbers accurately, William Clinger
  • How to print floating point numbers accurately, Guy Steele and Jon White

Any C++ library/function based on these would be suitable.

EDIT: I am fully aware that floating point numbers are inexact representations of decimal numbers, and that 1.34==1.3400000000000001. However, as the papers referenced above point out, that's no excuse for choosing to display as "1.3400000000000001"

EDIT2: This blog post explains exactly what I'm looking for: http://drj11.wordpress.com/2007/07/03/python-poor-printing-of-floating-point/

Neil Mitchell
  • When I searched for one, I found one in C -- not in C++. I don't have the link here. I seem to remember it was on the ftp site of NAG, but I could be wrong. – AProgrammer Aug 21 '09 at 10:52
  • A C library is equally fine - I'm just checking through the NAG docs now. – Neil Mitchell Aug 21 '09 at 11:10
  • You could look into GMP and MPFR for software floating-point emulation. But what you're asking for is nigh impossible with C++'s `float` and `double` types. – greyfade Aug 21 '09 at 20:10

6 Answers

5

I am still unable to find a library that supplies the necessary code, but I did find some code that does work:

http://svn.python.org/view/python/branches/py3k/Python/dtoa.c?view=markup

By supplying a fairly small number of defines it's easy to abstract away the Python integration. This code does indeed meet all the properties I outline.

Neil Mitchell
4

I think this does what you want, in combination with the standard library's strtod():

#include <stdio.h>
#include <stdlib.h>

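/* Tries 15, 16, 17, then 18 significant digits and stops at the first
   output that strtod() reads back as exactly n. Returns snprintf()'s result. */
int dtostr(char* buf, size_t size, double n)
{
  int prec = 15;
  while(1)
  {
    int ret = snprintf(buf, size, "%.*g", prec, n);
    /* NaN never compares equal to itself, so it runs to the prec == 18 exit. */
    if(prec++ == 18 || n == strtod(buf, 0)) return ret;
  }
}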

A simple demo, which doesn't bother to check input words for trailing garbage:

int main(int argc, char** argv)
{
  int i;
  for(i = 1; i < argc; i++)
  {
    char buf[32];
    dtostr(buf, sizeof(buf), strtod(argv[i], 0));
    printf("%s\n", buf);
  }
  return 0;
}

Some example inputs:

% ./a.out 0.1 1234567890.1234567890 17 1e99 1.34 0.000001 0 -0 +INF NaN
0.1
1234567890.1234567
17
1e+99
1.34
1e-06
0
-0
inf
nan

I imagine your C library needs to conform to some sufficiently recent version of the standard in order to guarantee correct rounding.

I'm not sure I chose the ideal bounds on prec, but I imagine they must be close. Maybe they could be tighter? Similarly I think 32 characters for buf are always sufficient but never necessary. Obviously this all assumes 64-bit IEEE doubles. Might be worth checking that assumption with some kind of clever preprocessor directive -- sizeof(double) == 8 would be a good start.
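The 64-bit IEEE assumption can be checked at compile time rather than in the preprocessor, since sizeof cannot be evaluated inside #if. This is a minimal sketch, assuming a C++11 compiler; the helper name double_is_binary64 is hypothetical and not part of any library:

```cpp
#include <climits>
#include <limits>

// Hypothetical helper: true when double is a 64-bit IEEE 754 (binary64) value.
constexpr bool double_is_binary64()
{
  return std::numeric_limits<double>::is_iec559
      && sizeof(double) * CHAR_BIT == 64
      && std::numeric_limits<double>::digits == 53;
}

// Fail the build early instead of producing wrong strings at run time.
static_assert(double_is_binary64(), "dtostr() assumes 64-bit IEEE doubles");
```

On such a platform std::numeric_limits<double>::max_digits10 is 17, which suggests 17 would already be a safe upper bound for prec.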

The exponent is a bit messy, but it wouldn't be difficult to fix after breaking out of the loop but before returning, perhaps using memmove() or suchlike to shift things leftwards. I'm pretty sure there's guaranteed to be at most one + and at most one leading 0, and I don't think they can even both occur at the same time for prec >= 10 or so.
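The exponent clean-up described above might look roughly like this. It is only a sketch: the name tidy_exponent is hypothetical, and it assumes the buffer holds NUL-terminated "%g"-style output in which the only 'e' (if any) introduces the exponent, which holds for finite numbers and for plain inf and nan:

```cpp
#include <cstring>

// Hypothetical helper: rewrites "1e+99" as "1e99" and "1e-06" as "1e-6" in place.
void tidy_exponent(char* buf)
{
  char* e = std::strchr(buf, 'e');
  if (!e) return;                          // no exponent part, nothing to do
  char* src = e + 1;                       // next character to keep
  char* dst = e + 1;                       // where the kept tail should start
  if (*src == '+') ++src;                  // drop a redundant '+'
  else if (*src == '-') { ++src; ++dst; }  // keep a '-'
  while (*src == '0' && src[1] != '\0') ++src;   // drop leading zeros, keep the last digit
  std::memmove(dst, src, std::strlen(src) + 1);  // shift the tail left, including the NUL
}
```

Calling it on the buffer after the loop in dtostr() would change only the exponent part; the value that strtod() parses from the string stays the same.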

Likewise, if you'd rather ignore signed zero, as JavaScript does, you can easily handle it up front, e.g.:

if(n == 0) return snprintf(buf, size, "0");

I'd be curious to see a detailed comparison with that 3000-line monstrosity you dug up in the Python codebase. Presumably the short version is slower, or less correct, or something? It would be disappointing if it were neither....

zaphod
  • 2
    I've investigated on a benchmark suite. The results on VS2008 of your version are just as good, but not identical - for example the first algorithm prefers 87.21565540666982 while yours prefers 87.21565540666983, but both have the same bit representation. Your algorithm is also 3% slower. But given 1000's of lines of ugly C vs your quite elegant answer, you definitely win :). – Neil Mitchell Mar 18 '14 at 18:00
1

The reason for wanting these functions is to save floating point values to CSV files, and then read them correctly. In addition, I'd like the CSV files to contain simple numbers as far as possible so they can be consumed by humans.

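You cannot have an exact double → string → double conversion and at the same time keep the string human readable.

You need to choose between an exact conversion and a human-readable string. This is the difference between max_digits10 and digits10: max_digits10 (17 for an IEEE double) is the number of decimal digits needed for double → string → double to be exact, while digits10 (15 for an IEEE double) is the number of decimal digits guaranteed to survive string → double → string unchanged.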

Here is an implementation of num2str and str2num with two different contexts from_double (conversion double → string → double) and from_string (conversion string → double → string):

#include <iostream>
#include <limits>
#include <iomanip>
#include <sstream>
#include <string>

namespace from_double
{
  std::string num2str(double d)
  {
    std::stringstream ss;
    ss << std::setprecision(std::numeric_limits<double>::max_digits10) << d;
    return ss.str();
  }

  double str2num(const std::string& s)
  {
    double d;
    std::stringstream ss(s);
    ss >> std::setprecision(std::numeric_limits<double>::max_digits10) >> d;
    return d;
  }
}

namespace from_string
{
  std::string num2str(double d)
  {
    std::stringstream ss;
    ss << std::setprecision(std::numeric_limits<double>::digits10) << d;
    return ss.str();
  }

  double str2num(const std::string& s)
  {
    double d;
    std::stringstream ss(s);
    ss >> std::setprecision(std::numeric_limits<double>::digits10) >> d;
    return d;
  }
}

int main()
{
  double d = 1.34;
  if (from_double::str2num(from_double::num2str(d)) == d)
    std::cout << "Good for double -> string -> double" << std::endl;
  else
    std::cout << "Bad for double -> string -> double" << std::endl;

  std::string s = "1.34";
  if (from_string::num2str(from_string::str2num(s)) == s)
    std::cout << "Good for string -> double -> string" << std::endl;
  else
    std::cout << "Bad for string -> double -> string" << std::endl;

  return 0;
}
Daniel Laügt
1

You need a num2str() function that does not lose information. If two floating point numbers are different, their string representations must also be different, and it must be possible to recover the original binary number from the string. This property was first defined by Guy Steele and Jon White in How to print floating-point numbers accurately from 1990. Additionally, the string should be as short as possible.

As of 2023 the fastest algorithm for num2str() is DragonBox. The repo contains a reference implementation in C++, as well as a formal paper. Another implementation can be found in https://github.com/abolz/Drachennest.

strtod() from stdlib.h can be used for str2num().
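For readers who would rather not vendor an implementation, the same shortest round-trip property is also specified for the C++17 function std::to_chars when it is called on a double without a precision argument. That function is not mentioned above, so treat the following as a sketch under the assumption that your standard library implements floating-point std::to_chars (older toolchains do not); the names num2str and str2num are taken from the question:

```cpp
#include <cassert>
#include <charconv>
#include <cstdlib>
#include <string>
#include <system_error>

// Shortest decimal string that parses back to exactly d.
// Assumes a C++17 library with floating-point std::to_chars.
std::string num2str(double d)
{
  char buf[32];  // ample room for the shortest form of any double
  std::to_chars_result res = std::to_chars(buf, buf + sizeof buf, d);
  assert(res.ec == std::errc());  // only fails if the buffer were too small
  return std::string(buf, res.ptr);
}

// Parse with strtod(), as suggested above.
double str2num(const std::string& s)
{
  return std::strtod(s.c_str(), nullptr);
}
```

With this pair, num2str(1.34) yields "1.34" and str2num(num2str(x)) == x holds for finite x, provided strtod() rounds correctly. Note that strtod() honours the current C locale's decimal point, whereas std::to_chars always writes '.'.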

Jörg Mische
0

Actually, I think you'll find that 1.34 IS 1.3400000000000001. Floating point numbers are not precise, and you can't get around this. For example, 1.34f is 1.3400000333786011.
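The two spellings of 1.34 name the same stored value; only the number of digits requested differs. A small sketch to see this, assuming IEEE 754 doubles and a correctly rounding snprintf (the helper name fmt is hypothetical):

```cpp
#include <cstdio>
#include <string>

// Format d with the given number of significant digits, using "%.*g".
std::string fmt(double d, int digits)
{
  char buf[64];
  std::snprintf(buf, sizeof buf, "%.*g", digits, d);
  return buf;
}
```

Here fmt(1.34, 15) gives "1.34" and fmt(1.34, 17) gives "1.3400000000000001", and both strings parse back to the same double.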

Goz
0

As others have stated, floating-point numbers are not that accurate; it's an artifact of how they store the value.

What you are really looking for is a Decimal number representation. Basically this uses an integer to store the number and has a specific accuracy after the decimal point.

A quick Google got this: http://www.codeproject.com/KB/mcpp/decimalclass.aspx

Peter Schaeffer
Martin York
  • In general decimal numbers are of course preferred, but in this case I really want to use floating point numbers due to other constraints on the system. – Neil Mitchell Aug 21 '09 at 11:14
  • Actually, I tend to prefer Rational numbers: http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Ratio.html - they have much greater representational power. – Neil Mitchell Aug 21 '09 at 11:16
  • Yep. I have seen that as a way of representing arbitrary precision arithmetic (but that's not what you asked). You want(ed) a method to represent an arbitrary precision floating point value (similar but not the same). So WHAT are the unmentioned constraints that force you to use a floating point value? – Martin York Aug 21 '09 at 16:09
  • The constraints are the usual: large existing code base, high-performance concerns, architecture limitations etc. – Neil Mitchell Sep 12 '09 at 16:37