Pandas fillna multiple columns with values from corresponding columns without repeating for each

Question

Let's say I have a DataFrame like this:

x = pd.DataFrame({'col1_x': [15, np.nan, 136, 93, 743, np.nan, np.nan, 91] ,                                        
                  'col2_x': [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan], 
                  'col1_y': [10, 20, 30, 40, 50, 60, 70, 80],
                  'col2_y': [93, 24, 52, 246, 142, 53, 94, 2]})

And I want to fill the NaN values in col_x with the values in col_y respectively, I can do this:

x['col1_x'] = x['col1_x'].fillna(x['col1_y'])
x['col2_x'] = x['col2_x'].fillna(x['col2_y'])

print(x)

Which will yield:

   col1_x  col2_x  col1_y  col2_y
0    15.0    93.0      10      93
1    20.0    24.0      20      24
2   136.0    51.0      30      52
3    93.0    22.0      40     246
4   743.0    38.0      50     142
5    60.0    53.0      60      53
6    70.0    72.0      70      94
7    91.0     2.0      80       2

But requires to repeat the same function with different variables, now let's assume that I have a bigger DataFrame with much more columns, is it possible to do it without repeating?

score 2 · Answer 1 · answered Jul 31 '21 at 11:25

2

You can use the following notation -

x.fillna({"col1_x": x["col1_y"], "col2_x": x["col2_y"]})

Assuming you can extract all the indices numbers you can do -

replace_dict = {f"col{item}_x":x[f"col{item}_y"] for item in indices}
x = x.fillna(replace_dict}

answered Jul 31 '21 at 11:25

Tom Ron

5,906
3
22
38

1

``x.fillna({"col1_x": x["col1_y"], "col2_x": x["col2_y"]})`` Honestly that's exactly what I was looking for but I guess I did it wrong, Thanks for the input! I'd accept this as an answer too if I could! – Tomer Poliakov Jul 31 '21 at 12:12

imxitiz · Answer 2 · 2021-07-31T12:29:25.267

Are you trying to make this type of function :

def fil(fill,fromm):
    fill.fillna(fromm,inplace=True)

fil(x['col1_x'],x['col1_y'])

Or if you are sure about dataframe(x) then this :

def fil(fill,fromm):
    x[fill].fillna(x[fromm],inplace=True)

fil('col1_x','col1_y')

For your code :

import pandas as pd
import numpy as np

x = pd.DataFrame({'col1_x': [15, np.nan, 136, 93, 743, np.nan, np.nan, 91] ,                                
                  'col2_x': [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan], 
                  'col1_y': [10, 20, 30, 40, 50, 60, 70, 80],
                  'col2_y': [93, 24, 52, 246, 142, 53, 94, 2]})


def fil(fill,fromm):
    x[fill].fillna(x[fromm],inplace=True)

fil('col1_x','col1_y')
fil('col2_x','col2_y')

print(x)

"""
   col1_x  col2_x  col1_y  col2_y
0    15.0    93.0      10      93
1    20.0    24.0      20      24
2   136.0    51.0      30      52
3    93.0    22.0      40     246
4   743.0    38.0      50     142
5    60.0    53.0      60      53
6    70.0    72.0      70      94
7    91.0     2.0      80       2
"""

Additionally, if you have column name like col1_x,col2_x,col3_x.... same for y then you may automate it like this :

for i in range(1,3):
    fil(f'col{i}_x',f'col{i}_y')

Thanks! that's certainly a one way to do it, can't believe I didn't think of it tho — Tomer Poliakov, Jul 31 '21 at 12:05
@TomerPoliakov This is like a universal solution, cuz accepted answer, only works when column name is specific but my answer works for all name of column. Anyway for you that works. :) — imxitiz, Jul 31 '21 at 12:15
actually quite the opposite, with your solution I have to specify the names of every column and since I have quite a bunch of them, it will look ugly, while with the answer I have accepted I only have to specify the ending which is the default ending of ``merge`` module. tho, I can't not agree that your solution is much easier to understand compared to the whatever it is I accepted. I appreciate your Input! Thanks! — Tomer Poliakov, Jul 31 '21 at 12:52

Rob Raymond · Accepted Answer · 2021-07-31T12:44:11.073

you can use **kwargs to assign()
build up a dict with a comprehension to build **kwargs

import pandas as pd
import numpy as np
x = pd.DataFrame({'col1_x': [15, np.nan, 136, 93, 743, np.nan, np.nan, 91] ,                                        
                  'col2_x': [np.nan, np.nan, 51, 22, 38, np.nan, 72, np.nan], 
                  'col1_y': [10, 20, 30, 40, 50, 60, 70, 80],
                  'col2_y': [93, 24, 52, 246, 142, 53, 94, 2]})

x.assign(**{c:x[c].fillna(x[c.replace("_x","_y")]) for c in x.columns if "_x" in c})

	col1_x	col2_x	col1_y	col2_y
0	15	93	10	93
1	20	24	20	24
2	136	51	30	52
3	93	22	40	246
4	743	38	50	142
5	60	53	60	53
6	70	72	70	94
7	91	2	80	2

How does it work

# core - loop through columns that end with _x and generate it's pair column _y
{c:c.replace("_x","_y") 
 for c in x.columns if "_x" in c}

# now we have all the pairs of a columns let's do what we want - fillna()
{c:x[c].fillna(x[c.replace("_x","_y")]) for c in x.columns if "_x" in c}

# this dictionary matches this function.... https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html
# so final part is call the function with **kwargs
x.assign(**{c:x[c].fillna(x[c.replace("_x","_y")]) 
            for c in x.columns if "_x" in c})

I'm breaking my head right now trying to understand what it all means, some of it I get but most of it I don't. Colud you please briefly explain what it does? — Tomer Poliakov, Jul 31 '21 at 11:51
I have added a "how does it work" section to answer. there is lot's of information on python **kwargs .... maybe this helps https://stackoverflow.com/questions/1769403/what-is-the-purpose-and-use-of-kwargs — Rob Raymond, Jul 31 '21 at 12:47

	col1_x	col2_x	col1_y	col2_y
0	15	93	10	93
1	20	24	20	24
2	136	51	30	52
3	93	22	40	246
4	743	38	50	142
5	60	53	60	53
6	70	72	70	94
7	91	2	80	2

	col1_x	col2_x	col1_y	col2_y
0	15	93	10	93
1	20	24	20	24
2	136	51	30	52
3	93	22	40	246
4	743	38	50	142
5	60	53	60	53
6	70	72	70	94
7	91	2	80	2

Pandas fillna multiple columns with values from corresponding columns without repeating for each

3 Answers3

How does it work

	col1_x	col2_x	col1_y	col2_y
0	15	93	10	93
1	20	24	20	24
2	136	51	30	52
3	93	22	40	246
4	743	38	50	142
5	60	53	60	53
6	70	72	70	94
7	91	2	80	2