fill NaN with another lookup table

Question

Is there a way to fill the NaN with value for test=default by matching name, reticle and cell rev?

with the few variables in "test" column:

Is there a way to update the values from others row ? as the datatype "do" would have higher precedence than int and drop the "do" data row?

data:
test datatype name value reticle cell_rev
default int s 0x45 CR1
default int s 0xCB CR3
default do s 0.68 CR1

I'd like to get:

test datatype name value reticle cell_rev
default int s 0.68 CR1
default int s 0xCB CR3

You should include examples as text, not screenshots. – IanS Sep 21 '16 at 12:49 — IanS, Sep 21 '16 at 12:49

jezrael · Accepted Answer · 2016-09-21T16:11:19.603

2

You can use set_index with unstack for reshaping, then ffill for add missing values and last reshape to original by stack:

df = df.set_index(['name','value_old','reticle','test','cell_rev'])
       .unstack()
       .ffill()
       .stack()
       .reset_index()

print (df)
  name value_old reticle     test cell_rev value_new
0    s      0x8E     A28  default      CR1      0x8C
1    s      0x8E     A28  default      CR3      0x8E
2    s      0x8E     A28     etlc      CR1      0x8C
3    s      0x8E     A28     etlc      CR3      0x8E

EDIT by comment:

Use merge by subset df1 created by boolean indexing and then fill NaN values by combine_first or fillna:

df1 = df.ix[df.test == 'default']
print (df1)     
      test name value_old reticle cell_rev value_new
0  default    s      0x8E     A28      CR1      0x8E
1  default    s      0x8E     A28      CR3      0x8C

df2 = pd.merge(df, df1, how='left', on=['name','reticle','cell_rev'], suffixes=('','1'))
print (df2)
      test name value_old reticle cell_rev value_new    test1 value_old1  \
0  default    s      0x8E     A28      CR1      0x8E  default       0x8E   
1  default    s      0x8E     A28      CR3      0x8C  default       0x8E   
2     etlc    s      0x8E     A28      CR1      0x44  default       0x8E   
3     etlc    s      0x8E     A28      CR3      0x44  default       0x8E   
4      mlc    s      0x1E     A28      CR1       NaN  default       0x8E   
5      mlc    s      0x1E     A28      CR3       NaN  default       0x8E   
6      slc    s      0x2E     A28      CR1       NaN  default       0x8E   
7      slc    s      0x2E     A28      CR3       NaN  default       0x8E   

  value_new1  
0       0x8E  
1       0x8C  
2       0x8E  
3       0x8C  
4       0x8E  
5       0x8C  
6       0x8E  
7       0x8C

df['value_new'] = df2['value_new'].combine_first(df2['value_new1'])
#df['value_new'] = df2['value_new'].fillna(df2['value_new1'])
print (df)
      test name value_old reticle cell_rev value_new
0  default    s      0x8E     A28      CR1      0x8E
1  default    s      0x8E     A28      CR3      0x8C
2     etlc    s      0x8E     A28      CR1      0x44
3     etlc    s      0x8E     A28      CR3      0x44
4      mlc    s      0x1E     A28      CR1      0x8E
5      mlc    s      0x1E     A28      CR3      0x8C
6      slc    s      0x2E     A28      CR1      0x8E
7      slc    s      0x2E     A28      CR3      0x8C

edited Sep 21 '16 at 16:11

answered Sep 21 '16 at 11:31

jezrael

822,522
95
1,334
1,252

This solution work nice with sample, but in real data are in column `test` only values `default` and `etlc` ? – jezrael Sep 21 '16 at 12:09
thank you for looking into it, nope, in real data, test column has "default, etlc, Jlc and klc", for jlc and klc, if the value_new is NAN , the NAN should be filled by value from default and matching with others columns (name, reticle, cell, rev) – user37970 Sep 21 '16 at 14:35
Yes, I have same output. I think if it works with your real data well. sorry. – jezrael Sep 21 '16 at 15:16
it take the last observed value instead of default value, I've added the results to the question, I was unable to attach as text, so attach as image. Any idea on getting always getting the default values? I'm really appreciated your help on this! – user37970 Sep 21 '16 at 15:27
also, it is possible the value_old column has "NAN" value, if this happen, then I will expect the value_old will take default values. – user37970 Sep 21 '16 at 15:29
Ok, please give me a time. – jezrael Sep 21 '16 at 15:31
Thank you! it's work like a charm, and thanks for letting me know the combine first function. – user37970 Sep 22 '16 at 06:40
Yes, I was afraid about answering first, because I feel it is more complicated ;) – jezrael Sep 22 '16 at 06:41
and small advice for another questions - you can check [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – jezrael Sep 22 '16 at 06:42
@jezreal, yesterday I have spent whole day to find a solution for that question, I should ask in here at first place. And I have some others questions, is it possible to overwrite default value with the values from others row as show in above? – user37970 Sep 22 '16 at 06:53
I think better is create new question as edited old one. Then you can get multiple good answers. thank you. – jezrael Sep 22 '16 at 07:04

score 0 · Answer 2 · answered Sep 21 '16 at 15:37

for i in range(len(df)):
    if df.loc[i, 'value_new'] != df.loc[i, 'value_new']:
        df.loc[i, 'value_new'] = df.loc[(df.test == 'default') &
                                        (df.name == df.loc[i, 'name']) &
                                        (df.reticle == df.loc[i, 'reticle']) &
                                        (df.cell_rev == df.loc[i, 'cell_rev']),
                                        'value_new']

I think there's a more efficient solution, but this should work.

fill NaN with another lookup table

2 Answers2