
When loading the output of a query into a DataFrame using pandas, the standard behavior was to convert integer fields containing NULLs to float so that NULLs would become NaN.

Starting with pandas 1.0.0, a new type called pandas.NA was introduced to deal with integer columns containing NULLs. However, when using pandas.read_sql(), integer columns are still being transformed into float instead of integer when NULLs are present. On top of that, the read_sql() method doesn't support a dtype parameter to coerce fields the way read_csv() does.

Is there a way to load integer columns from a query directly into an Int64 dtype instead of first coercing them to float and then having to manually convert them to Int64?
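For illustration, a minimal reproduction of the behavior using a throwaway SQLite table (the table and column names are made up):

    import sqlite3
    import pandas as pd

    # Hypothetical in-memory table: an INTEGER column containing a NULL
    # comes back as float64, with the NULL turned into NaN.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (None,), (3,)])

    df = pd.read_sql("SELECT x FROM t", conn)
    print(df["x"].dtype)     # float64, not Int64
    print(df["x"].tolist())  # [1.0, nan, 3.0]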

user13079354

2 Answers

1

Have you tried using select isnull(col_name, 0) from table_name? This converts all NULL values to 0.

Integer columns are automatically cast to float, just as boolean columns are cast to object, when some values are missing.
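A minimal sketch of that approach, again using a throwaway SQLite table (COALESCE is the portable equivalent of SQL Server's ISNULL):

    import sqlite3
    import pandas as pd

    # Replace NULLs on the SQL side so pandas never sees them.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (None,), (3,)])

    df = pd.read_sql("SELECT COALESCE(x, 0) AS x FROM t", conn)
    print(df["x"].dtype)     # int64 -- but the NULL is now indistinguishable from 0
    print(df["x"].tolist())  # [1, 0, 3]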

  • That would only work if there's no difference between a 0 and a NULL. So in many cases, that wouldn't be applicable. For example, let's assume you read a column to a DataFrame using this approach and immediately store it again in another table with to_sql(). The content of the two tables would be different (with 0s in the place of NULLs). What if you need to differentiate between those? – user13079354 Mar 18 '20 at 00:22
0

It seems that, as of the current version, there is no direct way to do that: read_sql() has no way to coerce a column to this dtype, and pandas won't use the nullable dtype during inference.

There's a similar problem discussed in this thread: Convert Pandas column containing NaNs to dtype `int`
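A sketch of the manual workaround discussed there, assuming pandas >= 1.0 and a column that read_sql() has already turned into float64:

    import numpy as np
    import pandas as pd

    # Stand-in for the output of pandas.read_sql(): an integer column that
    # came back as float64 because it contained NULLs.
    df = pd.DataFrame({"col_name": [1.0, np.nan, 3.0]})

    # Cast the affected column to the nullable integer dtype; NaN becomes pd.NA.
    df["col_name"] = df["col_name"].astype("Int64")
    print(df["col_name"].dtype)  # Int64

    # Or let pandas infer nullable dtypes for every column at once:
    # df = df.convert_dtypes()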

user13079354