Does pandas have a default fill value when constructing a dataframe of a specific dtype?

Question

Consider the dictionary d:

d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}

When I pass this to the pandas.DataFrame constructor, I know I'll have missing values at row x, column B and at row z, column A.

df = pd.DataFrame(d)
df

     A    B
x  1.0  NaN
y  1.0  1.0
z  NaN  1.0

I want those NaN values to be filled in with 0. Of course, I know I can fill them in afterwards.

df.fillna(0)

But now they are all floats

     A    B
x  1.0  0.0
y  1.0  1.0
z  0.0  1.0

Yes! I could have forced them to integers

df.fillna(0).astype(int)

   A  B
x  1  0
y  1  1
z  0  1

Or! I could have constructed a series with a clever dictionary comprehension and unstacked with a fill_value parameter

pd.Series(
    {(i, j): v for j, d_ in d.items() for i, v in d_.items()}
).unstack(fill_value=0)

But all this would be a ton easier if there were a direct way to fill in missing values with a default from the start. I'd expect something like

pd.DataFrame(d, dtype=int, fill_value=0)

I know that isn't available, but is there something else I've missed?

The dtype here is inferred as float because of the missing values, though. Since you have to fill the missing values as a post-processing step, you would still need to cast to `int` to coerce the `dtype`. – EdChum, Jan 04 '17 at 09:50
@EdChum Yes! When I did `pd.DataFrame(d)` it had to infer. However, if I specified the dtype in the constructor, it would be handy to be able to specify what to do with missing values. – piRSquared, Jan 04 '17 at 09:52
One possible solution is to add the missing keys to the `dict` and set their values to `0` - [see here](http://stackoverflow.com/q/33910764/2901002). – jezrael, Jan 04 '17 at 09:53
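
A minimal sketch of the idea in the last comment, assuming the nested dictionary `d` from the question. The names `rows` and `filled` are made up for illustration and are not taken from the linked post. Because every inner dict is completed with explicit zeros before the constructor sees it, there are no missing values left to force a float dtype.

```python
import pandas as pd

d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}

# Collect every row label that appears in any inner dict.
rows = sorted({key for inner in d.values() for key in inner})

# Rebuild each inner dict so that absent keys are explicitly set to 0.
filled = {
    col: {row: inner.get(row, 0) for row in rows}
    for col, inner in d.items()
}

# No NaN is ever created, so the columns stay integer-typed.
df = pd.DataFrame(filled)
print(df)
print(df.dtypes)
```

This trades the post-processing `fillna` for a pre-processing pass over the dictionary, which may or may not be cheaper depending on how sparse the data is.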

score 11 · Answer 1 · answered Apr 05 '20 at 12:36


Since pandas 0.24 you can use the Int64 dtype:

import pandas as pd
d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}
pd.DataFrame(d, dtype='Int64').fillna(0)

Output:

   A  B
x  1  0
y  1  1
z  0  1

Note the capital I in 'Int64'. If you write it with a lowercase 'i', i.e. 'int64', you will get floats.
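
To make the dtype behaviour a bit more concrete, here is a small sketch, assuming a pandas version with nullable integer support. The variable names `nullable`, `filled` and `plain` are only for illustration. It checks that the frame keeps the nullable `Int64` dtype after `fillna`, and that it can be cast to the plain NumPy `int64` dtype once no missing values remain.

```python
import pandas as pd

d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}

# Nullable integer columns: the missing entries are stored as <NA>, not as float NaN.
nullable = pd.DataFrame(d, dtype='Int64')
print(nullable.dtypes)

# Filling keeps the nullable Int64 dtype.
filled = nullable.fillna(0)
print(filled.dtypes)

# Optional: cast to NumPy-backed int64 once nothing is missing any more.
plain = filled.astype('int64')
print(plain.dtypes)
```

Whether the final cast is worth it depends on what consumes the frame; some downstream code expects NumPy-backed integer columns.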


above_c_level

My favorite answer of the day :)! For everyone: `Int64` is an `integer` datatype, that can represent `NAN`s! – Markus Dutschke Apr 08 '21 at 15:12
what if we don't know datatypes each time and want to do it for all int64, float64, object etc. ? In my case it fails if we use int64 but there is no column with datatype int64 – VGupta Jun 17 '23 at 13:28
