
I have some C++ code which outputs an array of double values, and I want to use these values in Python. The obvious and easiest way to transfer them would be to dump them into a text file and read that file back in Python. However, this could lose precision, because not all decimal places may be written out, and adding more decimal places makes the file larger. The array I am trying to transfer has a few million entries. Hence, my idea is to dump the doubles' binary representation into a binary file and read that back in Python.

The first problem is that I do not know how the double values are laid out in memory. It is easy to read the binary representation of an object from memory, but I have to know where the sign bit, the exponent and the mantissa are located. There are of course standards for this. The first question is therefore: how do I know which standard my compiler uses? I want to use g++-9. I tried googling this question for various compilers, but found no precise answer. The next question is how to turn the bytes back into a double, given the format.

Another possibility may be to compile the C++ code as a Python module and use it directly, transferring the array in memory without a file. But I do not know whether this would be quick to set up.
I have also seen that it is possible to compile C++ code directly from a string in Python using numpy, but I cannot find any documentation for that.

HerpDerpington
  • Why do you need to know the memory representation? It’s most likely IEEE 754 everywhere you use it so you can just read and write it directly – Sami Kuhmonen May 28 '20 at 10:26
  • Does this answer your question? [What is the best method to read a double from a Binary file created in C?](https://stackoverflow.com/questions/631607/what-is-the-best-method-to-read-a-double-from-a-binary-file-created-in-c) – Sami Kuhmonen May 28 '20 at 10:26
  • I'd be surprised if your python and C++ implementations differed (assuming they both run on the same platform). – john May 28 '20 at 10:26
  • use the `struct` library, that's what it is for – juanpa.arrivillaga May 28 '20 at 10:45

2 Answers


You could write out the double value(s) in binary form and then read and convert them in Python with `struct.unpack("d", file.read(8))`, which assumes that both sides use IEEE 754.
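A minimal sketch of the reading side, assuming the C++ program wrote the raw doubles back to back with no header. The file name `values.bin` and the helper `read_doubles` are hypothetical, and the sample file is created here with `struct.pack` only so that the snippet runs on its own:

```python
import os
import struct
import tempfile


def read_doubles(path):
    """Read a file of raw 8-byte doubles stored in native byte order."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 8
    # "=" selects native byte order with standard sizes, so "d" is 8 bytes.
    return list(struct.unpack(f"={count}d", data[:count * 8]))


# Stand-in for the file the C++ program would write (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "values.bin")
sample = [0.1, -2.5, 1e300, 3.141592653589793]
with open(path, "wb") as f:
    f.write(struct.pack(f"={len(sample)}d", *sample))

values = read_doubles(path)
print(values == sample)
```

For a few million values, `array.array("d")` with `frombytes` may be more convenient than one large `struct` format string.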

There are a couple of issues, however:

  • C++ does not specify the bit representation of doubles. While it is IEEE 754 on any platform I have come across, this should not be taken for granted.
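  • Byte order matters. `struct.unpack` uses the native byte order of the machine it runs on unless the format string says otherwise, so if the file is written on a machine with a different endianness you have to tell `struct.unpack` when reading (for example `"<d"` or `">d"`) or change the endianness before writing.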
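The byte order can be made explicit with a prefix character in the `struct` format string. A small sketch of what the two orders look like for one value, using only the standard library:

```python
import struct
import sys

value = 1.5

little = struct.pack("<d", value)  # least significant byte first
big = struct.pack(">d", value)     # most significant byte first
native = struct.pack("=d", value)  # whatever this machine uses

native_is_little = sys.byteorder == "little"

print(little.hex(), big.hex(), sys.byteorder)

# Reading back works as long as the same prefix is used on both sides.
restored = struct.unpack("<d", little)[0]
```

Fixing the order in the file format (say, always `"<d"`) keeps the reader independent of the machine it runs on.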

If this code is targeted at a specific machine, I would advise simply testing the approach on that machine. The code should then not be assumed to work on other architectures, so it is advisable to add checks to your Makefile/CMake configuration that refuse to build on unexpected targets.
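On the Python side, a comparable sanity check can be sketched with `sys.float_info`. The helper name `looks_like_binary64` is made up for this example; on the C++ side, `std::numeric_limits<double>::is_iec559` plays a similar role.

```python
import struct
import sys


def looks_like_binary64():
    """Return True if Python floats have the parameters of IEEE 754 binary64."""
    info = sys.float_info
    return (
        struct.calcsize("d") == 8    # 64 bits of storage
        and info.radix == 2          # binary format
        and info.mant_dig == 53      # 52 stored bits plus the implicit one
        and info.max_exp == 1024
        and info.min_exp == -1021
    )


if not looks_like_binary64():
    raise RuntimeError("unexpected floating point format on this platform")
```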

Another approach would be to use a common serialization format, such as protobuf. They essentially have to deal with the same problems but I would argue that they have solved it.

mrksngl

I have not checked this, but Python's C/C++ interface most probably stores doubles by simply copying their 64-bit binary image, since both languages most likely use the same internal representation of binary floating point numbers (the IEEE 754 binary64 format). There is one reason for this: both rely on the hardware floating point unit to operate on the numbers, and that is the format the hardware expects.
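For reference, the binary64 layout is 1 sign bit, 11 exponent bits and 52 fraction bits. A short sketch that pulls these fields out of a Python float (the function name `decompose` is only illustrative):

```python
import struct


def decompose(x):
    """Split a double into its sign, biased exponent and fraction fields."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF      # biased by 1023
    fraction = bits & ((1 << 52) - 1)    # implicit leading 1 is not stored
    return sign, exponent, fraction


print(decompose(1.0))
print(decompose(-2.0))
```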

One question arises, since you don't say: how did you determine that you are losing precision in the data? Did you only compare decimal digits, or did you export the actual binary format and check for differences in the bit patterns? A common mistake is to print both numbers with, let's say, 20 significant digits and then observe differences in the last two or three digits. That overlooks the fact that doubles in the binary IEEE 754 format carry only about 15 to 17 significant decimal digits (it depends on the number, but differences at the 17th digit or later are expected, because the numbers are binary encoded).
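To illustrate the point about trailing digits, here is a small sketch that prints the same double with 17 and with 20 significant digits. The extra digits belong to the exact binary value and carry no additional information about the number:

```python
x = 0.1

seventeen = format(x, ".17g")  # enough digits to identify the double
twenty = format(x, ".20g")     # more digits of the same binary value

print(seventeen)
print(twenty)

# Both strings parse back to the very same double.
same = float(seventeen) == float(twenty) == x
```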

What I strongly recommend against is converting those numbers into a decimal representation and sending them as ASCII strings. You risk losing precision (in the form of rounding errors, see below) in the encoding, and then again in the decoding phase in Python. Converting a binary floating point number to decimal and back loses information unless enough digits are written (17 significant digits are needed for a double to survive the round trip) and both conversions round correctly. The problem is that a number that can be represented exactly in decimal (like 0.1) cannot be represented exactly in binary form (you get an infinitely repeating sequence, just as dividing 1.0 by 3.0 in decimal gives a result that is not exact). The opposite conversion is different: a finite binary number can always be converted into a finite base-ten number, though usually not within the roughly 16 decimal digits that correspond to the 53 bits of the significand in 64-bit floating point numbers.
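Whether a given text format loses information can be checked directly. The sketch below runs random test values through a short decimal format, a 17-digit decimal format, and hexadecimal float notation (`float.hex`, the counterpart of C's `%a`). The helper `survives` and the test data are made up for this example:

```python
import random

random.seed(0)
values = [random.uniform(-1e6, 1e6) for _ in range(1000)]


def survives(fmt, xs):
    """True if every value is unchanged after format() and float()."""
    return all(float(format(x, fmt)) == x for x in xs)


short_ok = survives(".6g", values)   # too few digits
full_ok = survives(".17g", values)   # 17 significant digits
hex_ok = all(float.fromhex(x.hex()) == x for x in values)

print(short_ok, full_ok, hex_ok)
```

A binary file still avoids the question entirely and is smaller than either text form.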

So my advice is to recheck where your numbers differ and compare that with what I say here. If the numbers differ only in positions after the 16th significant digit, those differences are fine: they come only from the different algorithms the C++ and Python libraries use to convert numbers to decimal. If the differences occur earlier, check how floating point numbers are represented in Python, check whether at some point you lose precision by storing the numbers in a single precision float variable (this happens more often than one would expect), and see whether the formats used by the two environments differ (I don't believe they do). By the way, showing such differences in your question would have been a plus, as we could then tell you whether the differences you observe are normal.
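The effect of an accidental pass through single precision is easy to reproduce. A small sketch using the `"f"` (32-bit float) format of `struct` to mimic storing a value in a C++ `float`:

```python
import struct

x = 0.1

# Squeeze the value through 32 bits and widen it again.
as_float = struct.unpack("=f", struct.pack("=f", x))[0]
error = abs(as_float - x)

print(repr(x), repr(as_float), error)

# Values that fit in 24 significant bits are not affected.
half = struct.unpack("=f", struct.pack("=f", 0.5))[0]
```

A difference that shows up around the 8th significant digit, as here, points to single precision rather than to the text conversion.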

Luis Colorado