1

I have a DataFrame which looks like that:

                                     Something1                        Something2 
                              date  2020-03-30  2020-03-31  2020-04-01  2020-03-30  2020-03-31  2020-04-01  
index_1 index_2  index_3 index_4                                                                                    
A0        B0       C0        D0         10         NaN         11         'bla'          'bli'    'blo'
A1        B1       C1        D1         8          NaN         NaN        'bla1'         'bli1    'blo1'
A2        B2       C2        D0         0          NaN         303        'bla2'         'bli2'   'blo2'

When the index_[1|2|3|4] are a MultiIndex. Something[1|2] are first level columns. and the dates are the a repetitive second level columns which are same across the Something[1|2] first level columns.

Now, I got a different table which contains values to I want to enrich with my current table. The new table looks like:

index_4      date          val  
   D0      2020-03-30      8
   D0      2020-03-31      9
   ...
   D1      2020-03-30      17
   D1      2020-03-31      33

I want to enrich the old table using the old one.

My Questions is: How do I fill the old table using the new table - when I only want to set the value in Something1, so it would look like (after the filling of D0 in the date 2020-03-31 in Something1):

                                     Something1                        Something2 
                              date  2020-03-30  2020-03-31  2020-04-01  2020-03-30  2020-03-31  2020-04-01  
index_1 index_2  index_3 index_4                                                                                    
A0        B0       C0        D0         10         9           11       'bla'          'bli'      'blo'
A1        B1       C1        D1         8          NaN         NaN      'bla1'         'bli1'     'blo1'
A2        B2       C2        D0         0          9           303      'bla2'         'bli2'     'blo2'

I've tried to set the value using set_value, xs, at but I can't find the correct combo to get to the correct cell. (also looked in this post for some ideas but without successes )
I guess it suppuse to look like something like that

df.at[index_4='D0']['Something1']['2020-03-31'] = new_df['D0', '2020-03-31']['val']
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Green
  • 2,405
  • 3
  • 22
  • 46
  • 1
    If you apply all the new table on the cells you are of course correct. I've tried to show an example of assigning of a specific one. – Green Sep 21 '20 at 19:25

2 Answers2

1

One of the ways I can think of is to reshape the data so as we can have index_4 and date as index, map the values, and reshape back:

df2 = df2.set_index(['date','index_4'])

s = df['S1'].unstack(level=-1).T
df['S1'] = s.apply(lambda x: x.fillna(df2['val'])).T.stack(level=-1)
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
1
  • In order to update the DataFrame with a vectorized implementation, the column and row indices between the two dataframes must match.
  • Given df and df2, as shown in the question. Reproducible dataframes have been setup at the bottom of the question.
  • This implementation avoids using .apply, and instead uses pandas.DataFrame.update
  1. Add a column level to df2 so it matches the column levels of df
  2. Reset the index of df to only contain the index that is in df2
  3. Update df from df2, and reset the index as desired.
import pandas

# set the column index of df2 to have a matching level 0
df2.columns = pd.MultiIndex.from_product([['S1'], df2.columns])

# display(df2)
                         S1           
      2020-03-30 2020-03-31
idx4                       
D0             8          9
D1            17         33

# reset the index of dfg so only idx4 is in the index
df = df.reset_index(level=[0, 1, 2])

# update the df from df2
df.update(df2, overwrite=False)
df = df.reset_index().set_index(['idx1', 'idx2', 'idx3', 'idx4'], append=True)

# display(df)
                                                    S1                               S2                      
                      2020-03-30 2020-03-31 2020-04-01 2020-03-30 2020-03-31 2020-04-01
  idx1 idx2 idx3 idx4                                                                  
0 A0   B0   C0   D0           10          9         11        bla        bli        blo
1 A1   B1   C1   D1            8         33        NaN       bla1       bli1       blo1
2 A2   B2   C2   D0            0          9        303       bla2       bli2       blo2

Reproducible df2 setup

df2 = pd.DataFrame.from_dict({'D0': {'2020-03-30': 8, '2020-03-31': 9}, 'D1': {'2020-03-30': 17, '2020-03-31': 33}}, 'index')
df2.index.names = ['idx4']

# display(df2)
       2020-03-30  2020-03-31
idx4                         
D0              8           9
D1             17          33

Reproducible df setup

data = {('A0', 'B0', 'C0', 'D0'): {('S1', '2020-03-30'): '10', ('S1', '2020-03-31'): pd.NA, ('S1', '2020-04-01'): '11', ('S2', '2020-03-30'): 'bla', ('S2', '2020-03-31'): 'bli', ('S2', '2020-04-01'): 'blo'},
        ('A1', 'B1', 'C1', 'D1'): {('S1', '2020-03-30'): '8', ('S1', '2020-03-31'): pd.NA, ('S1', '2020-04-01'): pd.NA, ('S2', '2020-03-30'): 'bla1', ('S2', '2020-03-31'): 'bli1', ('S2', '2020-04-01'): 'blo1'},
        ('A2', 'B2', 'C2', 'D0'): {('S1', '2020-03-30'): '0', ('S1', '2020-03-31'): pd.NA, ('S1', '2020-04-01'): '303', ('S2', '2020-03-30'): 'bla2', ('S2', '2020-03-31'): 'bli2', ('S2', '2020-04-01'): 'blo2'}}

df = pd.DataFrame.from_dict(data, 'index')
df.index.names = ['idx1', 'idx2', 'idx3', 'idx4']

                                                  S1                               S2                      
                    2020-03-30 2020-03-31 2020-04-01 2020-03-30 2020-03-31 2020-04-01
idx1 idx2 idx3 idx4                                                                  
A0   B0   C0   D0           10       <NA>         11        bla        bli        blo
A1   B1   C1   D1            8       <NA>       <NA>       bla1       bli1       blo1
A2   B2   C2   D0            0       <NA>        303       bla2       bli2       blo2
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
  • 1
    I had to first pivot df2 to get it to look like your example, but it was your explanation was clear enough for me to understate the core idea. Thanks for the effort! – Green Sep 22 '20 at 14:57