ArrowNotImplementedError: halffloat error on applying pandas.to_feather on a dataframe

Question

I have a dataframe with columns of different datatypes, including dates. Now, after making some modifications, I want to save it as a Feather file so that I can access it later. But I am getting an error at the following step:

    historical_transactions.to_feather('tmp/historical-raw')

    ArrowNotImplementedError: halffloat

score 9 · Accepted Answer · answered Apr 03 '19 at 16:32

I guess your dataframe has columns of dtype float16, which is not supported by the Feather format. You can convert those columns to float32 and try again.
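To check which columns are affected before converting, something like the following may help. This is only a sketch: the small frame and its column names are made up as a stand-in for historical_transactions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for historical_transactions (column names are invented).
df = pd.DataFrame({
    "amount": np.array([1.5, 2.25], dtype=np.float16),
    "count": np.array([1, 2], dtype=np.int64),
    "when": pd.to_datetime(["2019-01-01", "2019-01-02"]),
})

# select_dtypes keeps only the columns whose dtype matches.
half_cols = df.select_dtypes(include="float16").columns.tolist()
print(half_cols)  # ['amount']
```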

Narendra Sahu


score 4 · Answer 2 · answered Sep 10 '19 at 12:44

You could try this:

    historical_transactions.astype('float32').to_feather('tmp/historical-raw')

Note that the line above could fail if you also have columns that cannot be converted to float32. To skip those columns and leave them as they are, try:

    historical_transactions.astype('float32', errors='ignore').to_feather('tmp/historical-raw')
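One thing to keep in mind with a blanket astype('float32'): it casts every numeric column it can, not just the float16 ones. A minimal sketch with made-up data (the frame and column names are hypothetical) shows a large int64 identifier losing precision as a side effect:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one float16 column and one int64 column with large values.
df = pd.DataFrame({
    "half": np.array([0.5, 1.5], dtype=np.float16),
    "big_id": np.array([16777217, 16777218], dtype=np.int64),
})

converted = df.astype("float32")

# Both columns are float32 now, including the integer one.
print(converted.dtypes)

# 16777217 (2**24 + 1) has no exact float32 representation, so the id changes.
print(int(converted["big_id"].iloc[0]) == 16777217)  # False
```

If that matters for your data, converting only the float16 columns avoids the problem.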

The Feather format depends on PyArrow, which in turn depends on the Apache Parquet format. As for float types, it supports only float (32-bit) and double (64-bit). I'm not sure how big a deal this is for you, but there is also an open request on GitHub to automatically "Coerce Arrow half-precision float to float32".

See here and here for details.

Feather and Apache Parquet are two distinct formats. – Micah Kornfield, Sep 25 '19 at 05:21

score 4 · Answer 3 · answered Feb 04 '22 at 09:27

Improving on Kocas' answer, this converts only the half-float columns:

    half_floats = historical_transactions.select_dtypes(include="float16")
    historical_transactions[half_floats.columns] = half_floats.astype("float32")
    historical_transactions.to_feather('tmp/historical-raw')

Maxim Egorushkin · Answer 4 · 2023-04-18T23:16:01.007

Another work-around is to view float16 as uint16 on save and view uint16 as float16 on load. E.g.:

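    import numpy as np


    def encode_float16_to_uint16(f16):
        # Reinterpret the float16 buffer as uint16 bit patterns (no value conversion).
        return np.frombuffer(f16.astype(np.float16, copy=False), dtype=np.uint16)


    def decode_uint16_to_float16(u16):
        # Reinterpret the uint16 bit patterns as the original float16 values.
        return np.frombuffer(u16.astype(np.uint16, copy=False), dtype=np.float16)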

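numpy.frombuffer creates a view into the original object, so no data is copied when viewing a float16 array as uint16 and vice versa (provided the input already has the expected dtype, in which case astype with copy=False does not copy either). This work-around is as cheap as it gets.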
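Applied to a whole dataframe, the same idea could look roughly like the sketch below. The helper names encode_float16_columns and decode_float16_columns are made up for illustration, and the sketch uses ndarray.view rather than numpy.frombuffer for the reinterpretation, since view does not require a contiguous buffer. The caller has to remember which columns were encoded, because after loading they are plain uint16 columns.

```python
import numpy as np
import pandas as pd


def encode_float16_columns(df):
    """Return a copy with float16 columns stored as uint16 bit patterns, plus their names."""
    out = df.copy()
    cols = out.select_dtypes(include="float16").columns.tolist()
    for col in cols:
        # Reinterpret the bits; the numeric values are not converted.
        out[col] = out[col].to_numpy().view(np.uint16)
    return out, cols


def decode_float16_columns(df, cols):
    """Return a copy with the named uint16 columns reinterpreted as float16."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].to_numpy().view(np.float16)
    return out


# In-memory round trip on made-up data.
df = pd.DataFrame({
    "x": np.array([0.1, 2.5, -3.0], dtype=np.float16),
    "n": [1, 2, 3],
})
encoded, cols = encode_float16_columns(df)
restored = decode_float16_columns(encoded, cols)

# With pyarrow installed, the encoded frame can be written and read back:
#   encoded.to_feather('tmp/historical-raw')
#   restored = decode_float16_columns(pd.read_feather('tmp/historical-raw'), cols)
```

Unlike the in-place views above, this version copies the frame, trading a little memory for not modifying the caller's data.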
