1

This question was inspired by this other one.

Say that I have the following pandas dataframe:

   TYPE  YEAR  DAY  VALUE
0  a     2004  10   NaN
1  b     2005  12   NaN
2  c     2006  180  NaN
3  a     2007  127  NaN
4  b     2008  221  NaN
5  c     2008  17   NaN

and that I have to fill in the VALUE column based on the following dict of dicts, which has the format {YEAR: {DAY: VALUE}}:

mydict={2004: {10: 7.1},
        2005: {12: 9.19},
        2006: {127: 16.04, 180: 12.33},
        2007: {55: 21.94, 127: 33.11},
        2008: {17: 5.13, 221: 19.17, 300: 10.05}}

The answer given in the post above is to use df.VALUE = df.VALUE.fillna(df.YEAR.map(mydict)).

How can I change this mapping to make sure it "follows" both the YEAR and DAY columns in my dataframe?

If I apply the snippet above, I of course get the whole inner dict for each year:

   TYPE  YEAR  DAY  VALUE
0  a     2004  10   {10: 7.1}
1  b     2005  12   {12: 9.19}
2  c     2006  180  {127: 16.04, 180: 12.33}
3  a     2007  127  {55: 21.94, 127: 33.11}
4  b     2008  221  {17: 5.13, 221: 19.17, 300: 10.05}
5  c     2008  17   {17: 5.13, 221: 19.17, 300: 10.05}

Instead, I am aiming for the single value that matches each row's YEAR and DAY.

cs95
FaCoffee

4 Answers

3

You can rewrite that column using apply:

df['VALUE'] = df.apply(lambda x: mydict[x.YEAR][x.DAY], axis=1)

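Or, as @Maarten Fabré noticed, use .get so that a DAY missing from the inner dict yields NaN instead of raising a KeyError (this needs import numpy as np):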

df['VALUE'] = df.apply(lambda x: mydict[x.YEAR].get(x.DAY, np.nan), axis=1)
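Note that the .get version above still raises a KeyError when the YEAR itself is missing from the dict. A minimal sketch that guards both levels, assuming the sample data from the question (the helper name lookup_value is made up for illustration):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "TYPE": list("abcabc"),
    "YEAR": [2004, 2005, 2006, 2007, 2008, 2008],
    "DAY": [10, 12, 180, 127, 221, 17],
    "VALUE": np.nan,
})
mydict = {2004: {10: 7.1},
          2005: {12: 9.19},
          2006: {127: 16.04, 180: 12.33},
          2007: {55: 21.94, 127: 33.11},
          2008: {17: 5.13, 221: 19.17, 300: 10.05}}


def lookup_value(row):
    # Hypothetical helper: fall back to an empty dict for an unknown YEAR,
    # then to NaN for an unknown DAY.
    return mydict.get(row.YEAR, {}).get(row.DAY, np.nan)


df["VALUE"] = df.apply(lookup_value, axis=1)
```

Row-wise apply is easy to read but calls Python code once per row, so it may be slow on large frames.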
zipa
2
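# Stack the DAY x YEAR table into a one-column frame indexed by (DAY, YEAR)
df1 = pd.DataFrame(mydict).stack().to_frame()
# Index df the same way so fillna aligns on (DAY, YEAR), then copy the values back in row order
df.assign(VALUE=df.set_index(['DAY', 'YEAR']).VALUE.fillna(df1[0]).values)
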
  TYPE  YEAR  DAY  VALUE
0    a  2004   10   7.10
1    b  2005   12   9.19
2    c  2006  180  12.33
3    a  2007  127  33.11
4    b  2008  221  19.17
5    c  2008   17   5.13
BENY
2

Option 1
Use pd.DataFrame.lookup (deprecated in pandas 1.2 and removed in pandas 2.0, so this option needs an older pandas)

df.assign(VALUE=pd.DataFrame(mydict).lookup(df.DAY, df.YEAR))

  TYPE  YEAR  DAY  VALUE
0    a  2004   10   7.10
1    b  2005   12   9.19
2    c  2006  180  12.33
3    a  2007  127  33.11
4    b  2008  221  19.17
5    c  2008   17   5.13
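On pandas versions where DataFrame.lookup is no longer available, the same label-based lookup can be approximated with positional NumPy indexing. This is a sketch under the assumption that the data is exactly the sample from the question; the names table, rows and cols are made up for illustration:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "TYPE": list("abcabc"),
    "YEAR": [2004, 2005, 2006, 2007, 2008, 2008],
    "DAY": [10, 12, 180, 127, 221, 17],
    "VALUE": np.nan,
})
mydict = {2004: {10: 7.1},
          2005: {12: 9.19},
          2006: {127: 16.04, 180: 12.33},
          2007: {55: 21.94, 127: 33.11},
          2008: {17: 5.13, 221: 19.17, 300: 10.05}}

# Rows are DAY, columns are YEAR; combinations absent from the dict are NaN.
table = pd.DataFrame(mydict)

# Translate each label into its integer position (-1 means "not found").
rows = table.index.get_indexer(df["DAY"])
cols = table.columns.get_indexer(df["YEAR"])

# Pick one cell per row of df.
values = table.to_numpy()[rows, cols]

# A position of -1 would silently select the last row or column,
# so replace those cells with NaN explicitly.
values = np.where((rows >= 0) & (cols >= 0), values, np.nan)

result = df.assign(VALUE=values)
```

Like lookup, this overwrites the whole VALUE column rather than filling only the missing entries.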

Option 2
comprehension + zip

df.assign(VALUE=[mydict[y][d] for y, d in zip(df.YEAR, df.DAY)])

  TYPE  YEAR  DAY  VALUE
0    a  2004   10   7.10
1    b  2005   12   9.19
2    c  2006  180  12.33
3    a  2007  127  33.11
4    b  2008  221  19.17
5    c  2008   17   5.13
piRSquared
1

First get the info from mydict into a series with the year and day as index:

df2 = pd.DataFrame.from_dict(mydict).transpose().stack(0)
# df2 = pd.DataFrame(mydict).unstack().dropna() # works too

Then make YEAR and DAY the index of the original df, insert the series as the VALUE column, and transform the result back to the original shape:

df3 = df.set_index(['YEAR', 'DAY'])  # same level order as df2, which is indexed by (YEAR, DAY)
df3['VALUE'] = df2
df3.reset_index().reindex(columns=df.columns)
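The same index alignment idea can also be written as a left merge on both key columns. The following is only a sketch, assuming the sample data from the question; the helper column FILL and the names long_form and merged are invented for illustration:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "TYPE": list("abcabc"),
    "YEAR": [2004, 2005, 2006, 2007, 2008, 2008],
    "DAY": [10, 12, 180, 127, 221, 17],
    "VALUE": np.nan,
})
mydict = {2004: {10: 7.1},
          2005: {12: 9.19},
          2006: {127: 16.04, 180: 12.33},
          2007: {55: 21.94, 127: 33.11},
          2008: {17: 5.13, 221: 19.17, 300: 10.05}}

# Flatten the nested dict into one row per (YEAR, DAY) pair.
long_form = pd.DataFrame(
    [(year, day, value)
     for year, days in mydict.items()
     for day, value in days.items()],
    columns=["YEAR", "DAY", "FILL"],
)

# A left merge keeps every row of df; pairs absent from the dict get NaN in FILL.
merged = df.merge(long_form, on=["YEAR", "DAY"], how="left")

# Fill only the gaps, so any VALUE that is already set is kept.
result = (
    merged.assign(VALUE=merged["VALUE"].fillna(merged["FILL"]))
          .drop(columns="FILL")
)
```

Because it goes through fillna, this variant keeps the original "fill in" semantics of the question: existing non-NaN entries in VALUE are left untouched.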
Maarten Fabré