Pandas: fill in NaN values with values from dict of dicts

Question

This question was inspired by this other one.

Say that I have the following pandas dataframe:

   TYPE  YEAR  DAY  VALUE
0  a     2004  10   NaN
1  b     2005  12   NaN
2  c     2006  180  NaN
3  a     2007  127  NaN
4  b     2008  221  NaN
5  c     2008  17   NaN

and that I have to fill in the VALUE column based on the following dict of dicts, which has the format {YEAR: {DAY: VALUE}}:

mydict={2004: {10: 7.1},
        2005: {12: 9.19},
        2006: {127: 16.04, 180: 12.33},
        2007: {55: 21.94, 127: 33.11},
        2008: {17: 5.13, 221: 19.17, 300: 10.05}}

The answer given in the post above is to use df.VALUE = df.VALUE.fillna(df.YEAR.map(mydict)).

How can I change this mapping to make sure it "follows" both the YEAR and DAY columns in my dataframe?

If I apply the snippet above I get of course:

   TYPE  YEAR  DAY  VALUE
0  a     2004  10   {10: 7.1}
1  b     2005  12   {12: 9.19}
2  c     2006  180  {127: 16.04, 180: 12.33}
3  a     2007  127  {55: 21.94, 127: 33.11}
4  b     2008  221  {17: 5.13, 221: 19.17, 300: 10.05}
5  c     2008  17   {17: 5.13, 221: 19.17, 300: 10.05}

Instead, I want each row to receive the single value stored under its own YEAR and DAY.

zipa · Accepted Answer · 2017-09-25T14:48:23.733

3

You can rewrite that column using apply:

df['VALUE'] = df.apply(lambda x: mydict[x.YEAR][x.DAY], axis=1)

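Or, as @Maarten Fabré noticed, use dict.get so that a missing DAY gives NaN instead of a KeyError (this needs import numpy as np):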

df['VALUE'] = df.apply(lambda x: mydict[x.YEAR].get(x.DAY, np.nan), axis=1)
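Both lines above overwrite the whole VALUE column, and the second one still raises a KeyError if a YEAR is absent from mydict. A minimal sketch of a variant that uses two get() calls and only fills the NaN entries is shown here; the helper name lookup_value is made up for illustration, and the frame is rebuilt from the question so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "TYPE": list("abcabc"),
    "YEAR": [2004, 2005, 2006, 2007, 2008, 2008],
    "DAY": [10, 12, 180, 127, 221, 17],
    "VALUE": [np.nan] * 6,
})
mydict = {2004: {10: 7.1},
          2005: {12: 9.19},
          2006: {127: 16.04, 180: 12.33},
          2007: {55: 21.94, 127: 33.11},
          2008: {17: 5.13, 221: 19.17, 300: 10.05}}


def lookup_value(row):
    # Hypothetical helper: an unknown YEAR falls back to an empty dict,
    # and an unknown DAY falls back to NaN.
    return mydict.get(row.YEAR, {}).get(row.DAY, np.nan)


# fillna keeps any VALUE that is already set and only fills the gaps.
df["VALUE"] = df["VALUE"].fillna(df.apply(lookup_value, axis=1))
```

Row-wise apply is easy to read but runs a Python function per row, so on large frames one of the index-based approaches in the other answers is likely to be faster.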

edited Sep 25 '17 at 14:48

answered Sep 25 '17 at 14:43


Does this ignore `KeyError`? If not, you can better use `dict.get(key, default)` to prevent those – Maarten Fabré Sep 25 '17 at 14:45
I added one `get()`, but surely could've gone with two :) – zipa Sep 25 '17 at 14:48

BENY · Answer 2 · 2017-09-25T15:06:06.323

2

df1 = pd.DataFrame(mydict).stack().to_frame()
df.assign(VALUE=df.set_index(['DAY', 'YEAR']).VALUE.fillna(df1[0]).values)

  TYPE  YEAR  DAY  VALUE
0    a  2004   10   7.10
1    b  2005   12   9.19
2    c  2006  180  12.33
3    a  2007  127  33.11
4    b  2008  221  19.17
5    c  2008   17   5.13
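
The snippet above works by index alignment: the stacked frame is keyed by (DAY, YEAR), and fillna matches those keys against the re-indexed VALUE column. The same idea can be sketched with a left join on both key columns. This is one possible spelling rather than the answer's own code; the column name LOOKUP is an arbitrary choice, and the data is rebuilt from the question:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "TYPE": list("abcabc"),
    "YEAR": [2004, 2005, 2006, 2007, 2008, 2008],
    "DAY": [10, 12, 180, 127, 221, 17],
    "VALUE": [np.nan] * 6,
})
mydict = {2004: {10: 7.1},
          2005: {12: 9.19},
          2006: {127: 16.04, 180: 12.33},
          2007: {55: 21.94, 127: 33.11},
          2008: {17: 5.13, 221: 19.17, 300: 10.05}}

# Flatten {YEAR: {DAY: VALUE}} into a Series keyed by (YEAR, DAY) tuples;
# pandas turns the tuple keys into a two-level MultiIndex.
flat = pd.Series(
    {(year, day): value
     for year, days in mydict.items()
     for day, value in days.items()},
    name="LOOKUP",  # arbitrary name, chosen for this sketch
).rename_axis(["YEAR", "DAY"])

# Left-join on both key columns, then fill only the missing VALUE entries.
joined = df.join(flat, on=["YEAR", "DAY"])
result = df.assign(VALUE=joined["VALUE"].fillna(joined["LOOKUP"]))
```

Because join keeps the left frame's index, the fillna step lines up row by row even when df does not have a default RangeIndex.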

edited Sep 25 '17 at 15:06

answered Sep 25 '17 at 14:44


score 2 · Answer 3 · answered Sep 25 '17 at 14:56

Option 1
Use pd.DataFrame.lookup (deprecated in pandas 1.2 and removed in pandas 2.0)

df.assign(VALUE=pd.DataFrame(mydict).lookup(df.DAY, df.YEAR))

  TYPE  YEAR  DAY  VALUE
0    a  2004   10   7.10
1    b  2005   12   9.19
2    c  2006  180  12.33
3    a  2007  127  33.11
4    b  2008  221  19.17
5    c  2008   17   5.13
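
Where DataFrame.lookup is not available, the same label-based lookup can be approximated with Index.get_indexer and NumPy integer indexing. The following is a sketch under that assumption, not a drop-in replacement; the variable names are illustrative and the data is rebuilt from the question:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "TYPE": list("abcabc"),
    "YEAR": [2004, 2005, 2006, 2007, 2008, 2008],
    "DAY": [10, 12, 180, 127, 221, 17],
    "VALUE": [np.nan] * 6,
})
mydict = {2004: {10: 7.1},
          2005: {12: 9.19},
          2006: {127: 16.04, 180: 12.33},
          2007: {55: 21.94, 127: 33.11},
          2008: {17: 5.13, 221: 19.17, 300: 10.05}}

table = pd.DataFrame(mydict)  # rows are DAY labels, columns are YEAR labels

# Integer positions of each row's DAY and YEAR in the table;
# get_indexer returns -1 for a label that is not present.
rows = table.index.get_indexer(df["DAY"])
cols = table.columns.get_indexer(df["YEAR"])

values = table.to_numpy()[rows, cols]
# A -1 position would silently select the last row or column, so mask it out.
values = np.where((rows >= 0) & (cols >= 0), values, np.nan)

result = df.assign(VALUE=values)
```

Unlike lookup, which raised on unknown labels, this version yields NaN for a DAY or YEAR that does not appear in mydict.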

Option 2
comprehension + zip

df.assign(VALUE=[mydict[y][d] for y, d in zip(df.YEAR, df.DAY)])

  TYPE  YEAR  DAY  VALUE
0    a  2004   10   7.10
1    b  2005   12   9.19
2    c  2006  180  12.33
3    a  2007  127  33.11
4    b  2008  221  19.17
5    c  2008   17   5.13

Maarten Fabré · Answer 4 · 2017-09-25T14:51:34.163

First get the info from mydict into a Series with the year and day as index

df2 = pd.DataFrame.from_dict(mydict).transpose().stack(0)
# df2 = pd.DataFrame(mydict).unstack().dropna() # works too

Then set YEAR and DAY as the index of the original df, assign the second Series (it aligns on that index), and transform the result back to the original shape

df3 = df.set_index(['YEAR', 'DAY'])  # same level order as df2: (YEAR, DAY)
df3['VALUE'] = df2
df3.reset_index().reindex(columns=df.columns)
