3

I have an array of 10-byte (80-bit) little-endian float values (float80). How can I read these values in Python 3?

The struct package does not support float80 (maybe I read the docs carelessly).

The array package, like struct, does not support float80 either.

The numpy package supports float96 and float128 types. That is very good, but appending \x00 to the tail of a float80 to extend it to float96 or float128 is ugly, and importing this package takes a lot of time.

The ctypes package supports c_longdouble. It is many times faster than numpy, but sizeof(c_longdouble) is machine-dependent and can be less than 80 bits, and appending \x00 to the tail of a float80 to extend it to a c_longdouble is ugly too.

UPDATE 1: test code is at my gist on GitHub. The function decode_str64 is ugly, but it works. Now I'm looking for the right way to do this.
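
For reference, a minimal sketch of the padding approach described above (read_float80 is a hypothetical helper, not the actual decode_str64 from the gist), assuming the platform's long double is the x87 80-bit type stored in the low bytes:

import ctypes

def read_float80(raw10):
    # Pad a 10-byte little-endian float80 with \x00 up to the size of the
    # platform's long double and reinterpret the padded buffer.
    size = ctypes.sizeof(ctypes.c_longdouble)
    assert size >= 10, "long double is smaller than 80 bits on this platform"
    padded = raw10 + b'\x00' * (size - 10)
    # .value converts the long double to a regular Python float (a double)
    return ctypes.c_longdouble.from_buffer_copy(padded).value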

kai3341
  • You should probably change the format of the producer to either produce `float64` or `float96`/`float128`... – Bakuriu Aug 09 '16 at 11:38
  • @Bakuriu, I would have done it if I could :( – kai3341 Aug 09 '16 at 11:42
  • possibly loading as struct " – janbrohl Aug 09 '16 at 11:50
  • What do you want the output to be? Regular Python `float`s? How should values that are out of range for a regular `double` but within the range of an 80-bit `long double` be handled by the conversion? – Mark Dickinson Aug 09 '16 at 11:50
  • AFAIK either you do that by hand as mentioned by janbrohl, which will be quite cumbersome, or you just pad the floats to become `float96`/`float128` and use numpy. There is no built-in way to handle float80s. – Bakuriu Aug 09 '16 at 11:53
  • @janbrohl, testing has shown, that solution `struct.Struct(' – kai3341 Aug 09 '16 at 12:32
  • @Mark Dickinson, I want to get Python's regular `float` in the output. A real out-of-range value is unlikely. – kai3341 Aug 09 '16 at 12:38
  • @kai3341 I mixed up little and big endian - should have been `" – janbrohl Aug 09 '16 at 13:25
  • @janbrohl, ok, `STRUCT10 = struct.unpack('<' + 48*'QH', BINSTR)`, where `BINSTR` contains 480 bytes. So, `len(STRUCT10)` is `96` instead of `48`, and the values do not match the test. It seems like the wrong way – kai3341 Aug 09 '16 at 13:28
  • that might be due to [alignment](https://docs.python.org/3/library/struct.html#struct-alignment) or padding issues - both are don't-know-what-dependent – janbrohl Aug 09 '16 at 13:43
  • @kai3341: You're getting 96 values because you're getting the fractions and exponents interleaved. You need to read those 96 values in (fraction, exponent) pairs, and process each pair to get a floating-point value. – Mark Dickinson Aug 09 '16 at 14:17
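
Following those comments, a rough sketch of decoding by hand (float80_to_float is an illustrative name; it assumes the values are little-endian x87 80-bit extended precision, and values outside double range will raise OverflowError):

import math
import struct

def float80_to_float(raw10):
    # One record is a 64-bit significand (with explicit integer bit) followed
    # by a 16-bit field holding a 15-bit biased exponent and the sign bit.
    significand, sign_exp = struct.unpack('<QH', raw10)
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:                      # infinities and NaNs
        return sign * math.inf if significand == 1 << 63 else math.nan
    if exponent == 0:                           # zeros and denormals
        return sign * math.ldexp(significand, -16382 - 63)
    return sign * math.ldexp(significand, exponent - 16383 - 63)

# usage, given the 480-byte string BINSTR from the comments above:
# values = [float80_to_float(BINSTR[i:i + 10]) for i in range(0, len(BINSTR), 10)]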

3 Answers

3

Let me rewrite my answer in a more logical way:

ctypes c_longdouble is machine-dependent because the long double type is not set in stone by the C standard and depends on the compiler :( but it is still the best you can have right now for high-precision floats...

If you plan to use numpy, numpy.longdouble is what you are looking for; numpy.float96 and numpy.float128 are highly misleading names. They do not indicate 96- or 128-bit IEEE floating-point formats. Instead, they indicate the number of bits of alignment used by the underlying long double type. So e.g. on x86-32, long double is 80 bits, but gets padded up to 96 bits to maintain 32-bit alignment, and numpy calls this float96. On x86-64, long double is again the identical 80-bit type, but now it gets padded up to 128 bits to maintain 64-bit alignment, and numpy calls this float128. There's no extra precision, just extra padding.

Appending \x00 at the end of a float80 to make a float96 is ugly, but in the end that is all a float96 is: a padded float80. And numpy.longdouble is a float96 or a float128 depending on the architecture of the machine you use.
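
As an illustration (not part of the original answer), a sketch of that padding route with numpy; float80_array is a hypothetical name, and it assumes np.longdouble really is the padded 80-bit type on your platform:

import numpy as np

def float80_array(raw):
    # Pad each 10-byte float80 record with zeros up to np.longdouble's
    # itemsize and reinterpret the result as an array of long doubles.
    itemsize = np.dtype(np.longdouble).itemsize
    assert itemsize >= 10, "np.longdouble is not an extended-precision type here"
    records = [raw[i:i + 10] + b'\x00' * (itemsize - 10)
               for i in range(0, len(raw), 10)]
    return np.frombuffer(b''.join(records), dtype=np.longdouble)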

What is the internal precision of numpy.float128?

Cabu
  • The trouble is that on some OSs (notably Windows), `np.longdouble` is simply `np.float64` again, so using `np.longdouble` doesn't give a cross-platform solution. – Mark Dickinson Aug 09 '16 at 13:04
  • @Mark Dickinson, the same goes for `ctypes.c_longdouble`: on Windows I read garbage (maybe because `ctypes.sizeof(ctypes.c_longdouble)` is less than 10; I added this check long after the tests on Windows). But it is not a critical problem for me now – kai3341 Aug 09 '16 at 13:13
0

numpy can use 80-bit floats if the compiler and platform support them:

Whether [supporting higher precision] is possible in numpy depends on the hardware and on the development environment: specifically, x86 machines provide hardware floating-point with 80-bit precision, and while most C compilers provide this as their long double type, MSVC (standard for Windows builds) makes long double identical to double (64 bits). Numpy makes the compiler’s long double available as np.longdouble (and np.clongdouble for the complex numbers). You can find out what your numpy provides with np.finfo(np.longdouble).

I checked that np.longdouble is float64 in the stock numpy-1.11.1-win32.whl at PyPI as well as in Gohlke's build, and float96 in numpy-1.4.1-9.el6.i686 on CentOS 6.
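
A quick way to run that check yourself (a sketch; the numbers in the comment are what typical builds report):

import numpy as np

# An 80-bit x87 long double reports nmant=63, nexp=15;
# an MSVC build where long double == double reports nmant=52, nexp=11.
info = np.finfo(np.longdouble)
print(np.dtype(np.longdouble), info.nmant, info.nexp)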

ivan_pozdeev
  • Yes, `numpy` provides `float64`, `float96`, `float128` data types. – kai3341 Sep 30 '16 at 13:28
  • So it's good: data can be read by appending \x00 to the tail of the byte array. At first I did this task using `numpy`. Because importing `numpy` takes a long time, I switched from `numpy` to `ctypes`. There is no difference: I still append \x00 to the tail of the byte array. – kai3341 Sep 30 '16 at 13:35
0

The padding, or rather, the memory alignment of extended-precision floats on a 4-byte (x32) or 16-byte (x64) boundary, is added - by recommendation from Intel, no less - to avoid the performance hit associated with handling non-aligned data on x86 CPUs. To give you an idea of the hit's magnitude, some figures from Microsoft show roughly a 2x difference for DWORDs.

This layout is ingrained in the underlying C long double type rather than being numpy's invention, so numpy doesn't attempt to provide any way around it to extract/insert only the "significant" part.

So, adding padding by hand if you have raw data without padding looks like the way to go. You can speed up the process by writing directly to the underlying buffer:

import os
import numpy as np

fi = np.finfo(np.longdouble)
assert fi.nmant == 63 and fi.nexp == 15, "80-bit float support is required"
del fi

len_float80 = 10    # no way to extract this from dtype/finfo
len_padded = np.dtype(np.longdouble).itemsize

f = open('float80.bin', 'rb')
f_items = os.stat(f.name).st_size // len_float80

n = np.empty(f_items, dtype=np.longdouble)
buf = n.data.cast('B')    # byte-level view of the array's buffer (Python 3)

for i in range(f_items):
    raw = f.read(len_float80)
    # copy the 10 significant bytes; the padding bytes left by np.empty
    # are ignored by the x87 80-bit format
    buf[i * len_padded:i * len_padded + len_float80] = raw

f.close()
del f, i, raw, buf, f_items

Or attain an even larger speedup by porting the code to Cython (with raw buffers, the speedup compared to regular array indexing can be as much as 100x! This would hurt the code's maintainability though, so beware of premature optimization here).

Alternatively, for an "interchange" format, you might consider using one that is not bound to the internal representation, like savetxt.
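
A bare-bones sketch of that route (illustrative only; beware that formatting and parsing long doubles as text may round through 64-bit double on some numpy builds, so verify the round trip before relying on it):

import numpy as np

a = np.arange(5, dtype=np.longdouble) / 3          # demo data in extended precision
np.savetxt('values.txt', a, fmt='%.20g')           # ~20 significant digits cover float80
b = np.loadtxt('values.txt', dtype=np.longdouble)  # read back on the consumer side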

ivan_pozdeev