
When loading the output of a query into a DataFrame using pandas, the standard behavior was to convert integer fields containing NULLs to float so that NULLs would become NaN.

Starting with pandas 1.0.0, a new type called pandas.NA was introduced to deal with integer columns containing NULLs. However, when using pandas.read_sql(), integer columns are still being transformed into float instead of integer when NULLs are present. On top of that, the read_sql() method doesn't support a dtype parameter to coerce fields the way read_csv() does.

Is there a way to load integer columns from a query directly into an Int64 dtype instead of first coercing them to float and then having to manually convert them to Int64?
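For illustration, a minimal reproduction of the behavior using a throwaway SQLite table (the table and column names are made up):

    import sqlite3
    import pandas as pd

    # Hypothetical in-memory table: an INTEGER column containing a NULL
    # comes back as float64, with the NULL turned into NaN.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (None,), (3,)])

    df = pd.read_sql("SELECT x FROM t", conn)
    print(df["x"].dtype)     # float64, not Int64
    print(df["x"].tolist())  # [1.0, nan, 3.0]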

user13079354

2 Answers

1

Have you tried using select isnull(col_name, 0) from table_name? This converts all NULL values to 0.

Integer columns are automatically cast to float, just as boolean columns are cast to object, when some values are missing.
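A minimal sketch of that approach, again using a throwaway SQLite table (COALESCE is the portable equivalent of SQL Server's ISNULL):

    import sqlite3
    import pandas as pd

    # Replace NULLs on the SQL side so pandas never sees them.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(1,), (None,), (3,)])

    df = pd.read_sql("SELECT COALESCE(x, 0) AS x FROM t", conn)
    print(df["x"].dtype)     # int64 -- but the NULL is now indistinguishable from 0
    print(df["x"].tolist())  # [1, 0, 3]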

  • That would only work if there's no difference between a 0 and a NULL. So in many cases, that wouldn't be applicable. For example, let's assume you read a column to a DataFrame using this approach and immediately store it again in another table with to_sql(). The content of the two tables would be different (with 0s in the place of NULLs). What if you need to differentiate between those? – user13079354 Mar 18 '20 at 00:22
0

It seems that, as of the current version, there is no direct way to do that: read_sql() has no way to coerce a column to this dtype, and pandas won't use the nullable dtype during inference.

There's a similar problem discussed in this thread: Convert Pandas column containing NaNs to dtype `int`
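A sketch of the manual workaround discussed there, assuming pandas >= 1.0 and a column that read_sql() has already turned into float64:

    import numpy as np
    import pandas as pd

    # Stand-in for the output of pandas.read_sql(): an integer column that
    # came back as float64 because it contained NULLs.
    df = pd.DataFrame({"col_name": [1.0, np.nan, 3.0]})

    # Cast the affected column to the nullable integer dtype; NaN becomes pd.NA.
    df["col_name"] = df["col_name"].astype("Int64")
    print(df["col_name"].dtype)  # Int64

    # Or let pandas infer nullable dtypes for every column at once:
    # df = df.convert_dtypes()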

user13079354