
I have a dataframe with columns of different datatypes, including dates. After doing some modifications, I want to save it as a feather file so that I can access it later. But I am getting an error on the following step:

    historical_transactions.to_feather('tmp/historical-raw')

ArrowNotImplementedError: halffloat
kramer

4 Answers


My guess is that your dataframe has columns of dtype float16, which the feather format does not support. You can convert those columns to float32 and try again.
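A quick way to check this guess is to list the float16 columns before calling `to_feather`. A minimal sketch; `half_float_columns` and the small example frame are hypothetical stand-ins for your own data:

```python
import numpy as np
import pandas as pd


def half_float_columns(df: pd.DataFrame) -> list:
    """Return the names of columns whose dtype is float16 (half precision)."""
    return [col for col, dtype in df.dtypes.items() if dtype == np.float16]


# Hypothetical frame standing in for historical_transactions
df = pd.DataFrame({
    "amount": np.array([1.5, 2.25], dtype=np.float16),
    "count": [3, 4],
    "when": pd.to_datetime(["2018-01-01", "2018-02-01"]),
})
print(half_float_columns(df))  # ['amount']
```

If the list is non-empty, those are the columns to convert.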

Narendra Sahu

You could try this:

    historical_transactions.astype('float32').to_feather('tmp/historical-raw')

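Note that the line above could fail if you also have columns that cannot be converted to float32 (date columns, for example). It also converts every other numeric column, including float64 and integer columns, to float32, which can lose precision. To ignore the columns that fail and leave them as they are, try: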

    historical_transactions.astype('float32', errors='ignore').to_feather('tmp/historical-raw')

pandas writes Feather files through pyarrow (Feather is Apache Arrow's on-disk format). Regarding float formats, the writer only supports float (32) and double (64), not half precision. Not sure how big of a deal this is for you, but there is also an open request on GitHub to automatically "Coerce Arrow half-precision float to float32".

See here and here for details.
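On the "how big of a deal" question: widening is lossless. Every half-precision value, including subnormals, signed zeros and infinities, is exactly representable as float32, so narrowing back to float16 after loading recovers the same values; the cost is twice the bytes per value. A NumPy sketch that checks this exhaustively over all 65,536 float16 bit patterns (NaNs are only checked to stay NaN, not bit for bit):

```python
import numpy as np

# Every possible 16-bit pattern, reinterpreted as a float16 value
all_halves = np.arange(2**16).astype(np.uint16).view(np.float16)

# Widen to float32 (what gets stored) and narrow back (what you do after loading)
round_trip = all_halves.astype(np.float32).astype(np.float16)

not_nan = ~np.isnan(all_halves)
# Compare raw bit patterns so signed zeros, subnormals and infinities are checked exactly
non_nan_exact = np.array_equal(
    round_trip[not_nan].view(np.uint16), all_halves[not_nan].view(np.uint16)
)
# NaN payloads are not compared; NaNs only need to remain NaN
nan_stays_nan = bool(np.isnan(round_trip[~not_nan]).all())

print(non_nan_exact, nan_stays_nan)  # True True
```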

Kocas

Improving on Kocas' answer, you can convert only the half-float columns:

    # Select only the float16 columns and widen them to float32; other columns are untouched
    half_floats = historical_transactions.select_dtypes(include="float16")
    historical_transactions[half_floats.columns] = half_floats.astype("float32")
    historical_transactions.to_feather('tmp/historical-raw')
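If the original half precision matters after loading, one option is to remember which columns were float16 and narrow them back after reading. A minimal sketch; `upcast_half_floats` and `restore_half_floats` are hypothetical helper names, and it assumes you keep the returned column list (for example, saved alongside the file) until load time:

```python
import pandas as pd


def upcast_half_floats(df):
    """Return (copy of df with float16 columns widened to float32, their names)."""
    half_cols = list(df.select_dtypes(include="float16").columns)
    # astype with a dict only converts the listed columns; with an empty dict nothing is converted
    widened = df.astype({col: "float32" for col in half_cols})
    return widened, half_cols


def restore_half_floats(df, half_cols):
    """Narrow previously widened columns back to float16.

    This is exact because every float16 value is representable in float32.
    """
    return df.astype({col: "float16" for col in half_cols})


# Usage (writing/reading Feather requires pyarrow):
# wide, half_cols = upcast_half_floats(historical_transactions)
# wide.to_feather('tmp/historical-raw')
# restored = restore_half_floats(pd.read_feather('tmp/historical-raw'), half_cols)
```

Unlike the in-place assignment, these helpers return new frames and leave the original untouched.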
Bruno Degomme

Another work-around is to view float16 as uint16 on save and view uint16 as float16 on load. E.g.:

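    import numpy as np


    def encode_float16_to_uint16(f16):
        # Accepts a NumPy array or anything np.asarray understands (e.g. a pandas Series);
        # reinterprets the float16 bits as uint16 without changing them
        return np.asarray(f16).astype(np.float16, copy=False).view(np.uint16)


    def decode_uint16_to_float16(u16):
        # Reinterprets the stored uint16 bits as float16 again
        return np.asarray(u16).astype(np.uint16, copy=False).view(np.float16)

ndarray.view reinterprets the existing memory under a different dtype, so no data is copied when viewing a float16 array as uint16 and vice versa (astype with copy=False only copies if the input has a different dtype). This workaround is as cheap as it gets.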
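Applied to a whole frame, the same bit-reinterpretation idea might look like this: encode every float16 column before `to_feather` and decode after `read_feather`. A sketch; the helper names are hypothetical, it calls `ndarray.view` directly, and you still have to remember which columns were encoded:

```python
import numpy as np
import pandas as pd


def encode_half_columns(df):
    """Return (copy of df with float16 columns stored as raw uint16 bits, their names)."""
    half_cols = list(df.select_dtypes(include="float16").columns)
    out = df.copy()
    for col in half_cols:
        # view() reinterprets the 2-byte patterns; no numeric conversion happens
        out[col] = df[col].to_numpy().view(np.uint16)
    return out, half_cols


def decode_half_columns(df, half_cols):
    """Reinterpret the named uint16 columns as float16 again."""
    out = df.copy()
    for col in half_cols:
        out[col] = df[col].to_numpy().view(np.float16)
    return out


# Usage (writing/reading Feather requires pyarrow, which supports uint16 columns):
# encoded, half_cols = encode_half_columns(historical_transactions)
# encoded.to_feather('tmp/historical-raw')
# restored = decode_half_columns(pd.read_feather('tmp/historical-raw'), half_cols)
```

Because only bits are moved, values such as -0.0, inf and NaN come back unchanged, and the file stays at 2 bytes per value.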

Maxim Egorushkin