I have a pandas.DataFrame that contains pandas nullable integer columns and want to convert it to an equivalent datatable.Frame object. However, this does not seem to be directly possible. What is the best way to do the conversion without breaking anything? I do not have the DataFrames available in text form; they come from a pickle. MWE:

import numpy as np, pandas as pd, datatable as dt
vec = [0,1,2, np.nan]
df = pd.DataFrame(vec, dtype="Int32")
frame = dt.Frame(vec, stypes=["int32"])   # works fine
dt.Frame(df)   # raises error
# TypeError: Cannot create a column from <class 'pandas.core.arrays.integer.IntegerArray'>
Hyperplane
  • Not an answer to your question, but consider another alias for `datatable`, since `dt` is very commonly used as alias for the `datetime` package – oskros Nov 19 '20 at 10:45
  • @oskros the `datatable` documentation uses `dt` as the default, so I think it makes sense to stick to that in the context of this question. – Hyperplane Nov 19 '20 at 10:48
  • I think you should raise it as an issue on [github](https://github.com/h2oai/datatable/issues), so that the maintainers can work on it. At the moment, it does look like ``datatable`` is not recognizing pandas nullable integer data type – sammywemmy Nov 19 '20 at 20:04
  • FR: https://github.com/h2oai/datatable/issues/2761 – jangorecki Dec 15 '20 at 11:12

1 Answer

The fastest way I found was to convert to NumPy first and then to datatable, adding the column names afterwards. For me, this preserved the data format:

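import datatable as dt
import pandas as pd

# pandas_dataframe: the DataFrame loaded from the pickle
x = pandas_dataframe.to_numpy()
y = dt.Frame(x)
y.names = [str(c) for c in pandas_dataframe.columns]  # restore the column names

y.stypes  # verify the resulting column types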
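
If the nullable columns contain missing values, a plain `to_numpy()` can return an object array, so it may be worth converting those columns explicitly first. The following is only a sketch under that assumption: the helper names `nullable_ints_to_float` and `to_frame` are made up for illustration, and the approach trades the integer type for float64 with NaN rather than keeping int32.

```python
import numpy as np
import pandas as pd


def nullable_ints_to_float(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which nullable-integer columns become float64 with NaN.

    Hypothetical helper, not part of pandas or datatable.
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Nullable integers are extension dtypes that also report as integer.
        if isinstance(dtype, pd.api.extensions.ExtensionDtype) and \
                pd.api.types.is_integer_dtype(dtype):
            out[col] = out[col].to_numpy(dtype="float64", na_value=np.nan)
    return out


def to_frame(df: pd.DataFrame):
    """Build a datatable Frame from the converted copy (hypothetical helper)."""
    import datatable as dt  # imported lazily; assumes datatable is installed
    return dt.Frame(nullable_ints_to_float(df))


# Small stand-in for the pickled DataFrame from the question.
df = pd.DataFrame({"a": pd.array([0, 1, 2, None], dtype="Int32")})
plain = nullable_ints_to_float(df)
print(plain.dtypes)
```

Calling `to_frame(df)` would then hand datatable only plain NumPy-backed columns. Whether to cast back to an integer type inside datatable afterwards depends on the installed version, so that step is left out here.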

Hope it helps!