Consider the dictionary d:

d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}

When I pass this to the pandas.DataFrame constructor, I know I'll get missing values at row x, column B and at row z, column A.

df = pd.DataFrame(d)
df

     A    B
x  1.0  NaN
y  1.0  1.0
z  NaN  1.0

I want those NaN values to be filled in with 0. Of course, I know I can fill them in afterwards.

df.fillna(0)

But now they are all floats:

     A    B
x  1.0  0.0
y  1.0  1.0
z  0.0  1.0

Yes! I could have forced them to integers:

df.fillna(0).astype(int)

   A  B
x  1  0
y  1  1
z  0  1

Or! I could have constructed a Series with a clever dictionary comprehension and unstacked it with the fill_value parameter:

pd.Series(
    {(i, j): v for j, d_ in d.items() for i, v in d_.items()}
).unstack(fill_value=0)
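
As a quick sanity check, the two workarounds above can be compared side by side. This is only a minimal sketch: the names `via_fillna`, `via_unstack` and `same` are introduced here for illustration, and `'int64'` is spelled out instead of `int` so the result does not depend on the platform's default integer width.

```python
import pandas as pd

d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}

# Route 1: build with NaN, fill, then cast back to integers.
via_fillna = pd.DataFrame(d).fillna(0).astype('int64')

# Route 2: key a Series by (row, column) and unstack the column level.
via_unstack = pd.Series(
    {(i, j): v for j, d_ in d.items() for i, v in d_.items()}
).unstack(fill_value=0)

# Compare the labels and the values of the two results.
same = (
    list(via_fillna.index) == list(via_unstack.index)
    and list(via_fillna.columns) == list(via_unstack.columns)
    and (via_fillna.to_numpy() == via_unstack.to_numpy()).all()
)
print(same)
```

Both routes still need a second step after construction, which is exactly the inconvenience the question is about.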

But all this would be a ton easier if there were a direct way to fill in missing values with a default from the start. I'd expect something like

pd.DataFrame(d, dtype=int, fill_value=0)

I know that isn't available, but is there something else I've missed?

Julien Marrec
piRSquared
  • The dtype here is inferred because of the missing values, though. Since you have to fill the missing values as a post-processing step, you would need to cast to `int` to coerce the `dtype` – EdChum Jan 04 '17 at 09:50
  • @EdChum Yes! When I did `pd.DataFrame(d)` it had to infer. However, if I specified the dtype in the constructor, it would be handy to be able to specify what to do with missing values. – piRSquared Jan 04 '17 at 09:52
  • One possible solution is to add the missing keys to the `dict` and set their values to `0` - [see here](http://stackoverflow.com/q/33910764/2901002). – jezrael Jan 04 '17 at 09:53
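
The suggestion in the last comment (adding the missing keys to the dict before building the frame) could look roughly like this. It is a sketch under assumptions, not the code from the linked question: the names `rows` and `complete` are made up for illustration, and the row labels are sorted only to get a stable order.

```python
import pandas as pd

d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}

# Collect every row label used by any column (sorted for a stable order).
rows = sorted({row for inner in d.values() for row in inner})

# Rebuild each inner dict so that absent labels get the default value 0.
complete = {
    col: {row: inner.get(row, 0) for row in rows}
    for col, inner in d.items()
}

# No cell is missing any more, so pandas can keep an integer dtype.
df = pd.DataFrame(complete)
print(df)
```

Because the gaps are closed before pandas sees the data, no float upcast happens and no `fillna` or `astype` call is needed afterwards.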

1 Answer

Since pandas 0.24 you can use the Int64 dtype:

import pandas as pd
d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}
pd.DataFrame(d, dtype='Int64').fillna(0)

Output:

    A   B
x   1   0
y   1   1
z   0   1

Note the capital I in 'Int64'. If you write it with a lowercase 'i', i.e. 'int64', you will get floats.
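
To make the dtypes at each step visible, here is a small sketch of the same idea. The variable names `nullable`, `filled` and `plain` are illustrative only, and the final cast back to NumPy's `int64` is an optional extra step that the answer itself does not require.

```python
import pandas as pd

d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}

# Nullable integer columns: the gaps are stored as missing values,
# so the column does not have to be upcast to float.
nullable = pd.DataFrame(d, dtype='Int64')

# Filling keeps the nullable Int64 dtype.
filled = nullable.fillna(0)

# Optional: once nothing is missing, cast to plain NumPy int64.
plain = filled.astype('int64')

print(filled.dtypes)
print(plain.dtypes)
```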

above_c_level
  • My favorite answer of the day :)! For everyone: `Int64` is an integer datatype that can represent `NaN`s! – Markus Dutschke Apr 08 '21 at 15:12
  • What if we don't know the datatypes each time and want to do this for all of int64, float64, object, etc.? In my case it fails if we use int64 but there is no column with datatype int64 – VGupta Jun 17 '23 at 13:28
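
For the case raised in the last comment, where the column dtypes are not known up front, one possible approach is to let pandas infer the dtypes first and then call `DataFrame.convert_dtypes` (available in pandas 1.0 and later). This is a hedged sketch rather than a tested general recipe: the names `converted`, `int_cols` and `filled` are illustrative, and filling only the integer columns is an assumption about what is wanted for mixed data.

```python
import pandas as pd

d = {'A': {'x': 1, 'y': 1}, 'B': {'y': 1, 'z': 1}}

# Let pandas infer dtypes first (floats here, because of the NaN).
df = pd.DataFrame(d)

# Ask for nullable dtypes; float columns holding only whole numbers
# are converted to the nullable Int64 dtype.
converted = df.convert_dtypes()

# Fill only the integer columns so other columns are left untouched.
int_cols = [
    col for col in converted.columns
    if pd.api.types.is_integer_dtype(converted[col])
]
filled = converted.fillna({col: 0 for col in int_cols})

print(filled.dtypes)
```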