Fill merged columns with 0 instead of NaN

Question

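I want to left-merge two dataframes, but the columns that come from the right-hand frame should contain 0 instead of NaN where there is no match. Only these "new" columns should be filled; existing columns such as some should keep their NaN. How can I do that?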

What I tried

df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left', fill = 0)
[OUT]
TypeError: merge() got an unexpected keyword argument 'fill'

import numpy as np
import pandas as pd

d = {'host_id': [1, 1, 2],
     'id': [10, 11, 20],
     'value': ["Hot Water,Cold Water,Kitchen,Coffee", 
               "Hot Water,Coffee,Something",
               "Hot Water,Coffee"]}
df = pd.DataFrame(data=d)
print(df)


d2 = {'host_id': [1, 1, 2, 3],
      'id': [10, 11, 20, 30],
      'some': ['test1', 'test2', 'test3', np.nan]}
df2 = pd.DataFrame(data=d2)
print(df2)

df_path = df.copy()
df_path.index = pd.MultiIndex.from_arrays(df_path[['host_id', 'id']].values.T, names=['host_id', 'id'])
df_path = df_path['value'].str.split(',', expand=True)
df_path = df_path.melt(ignore_index=False).dropna()
df_path.reset_index(inplace=True)

# One-hot encode every amenity and attach the indicator columns to the long frame
one_hot = pd.get_dummies(df_path['value'])
df_one = df_path.join(one_hot)

grouped_df_one = df_one.groupby(['id']).max()
grouped_df_one = grouped_df_one.drop(columns=['value', 'variable']).reset_index()

df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left')
df3

   host_id  id                                value
0        1  10  Hot Water,Cold Water,Kitchen,Coffee
1        1  11           Hot Water,Coffee,Something
2        2  20                     Hot Water,Coffee

   host_id  id   some
0        1  10  test1
1        1  11  test2
2        2  20  test3
3        3  30    NaN

What I got

   host_id  id   some  Coffee  Cold Water  Hot Water  Kitchen  Something
0        1  10  test1     1.0         1.0        1.0      1.0        0.0
1        1  11  test2     1.0         0.0        1.0      0.0        1.0
2        2  20  test3     1.0         0.0        1.0      0.0        0.0
3        3  30    NaN     NaN         NaN        NaN      NaN        NaN

What I want

   host_id  id   some  Coffee  Cold Water  Hot Water  Kitchen  Something
0        1  10  test1     1.0         1.0        1.0      1.0        0.0
1        1  11  test2     1.0         0.0        1.0      0.0        1.0
2        2  20  test3     1.0         0.0        1.0      0.0        0.0
3        3  30    NaN       0           0          0        0          0

score 4 · Accepted Answer · answered Nov 17 '21 at 13:30

You can fill specific columns using

df[list_cols] = df[list_cols].fillna(0)

where list_cols is e.g.

list_cols = ["Coffee", "Cold Water", "Hot Water", "Kitchen", "Something"]

See: Pandas fill multiple columns with 0 when null
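
Rather than typing the column names by hand, list_cols can also be derived from the right-hand frame of the merge. The following is only a minimal sketch: left and right are small made-up stand-ins for df2 and grouped_df_one, and the names keys and new_cols are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for the question's df2 and grouped_df_one
left = pd.DataFrame({"host_id": [1, 2, 3],
                     "id": [10, 20, 30],
                     "some": ["test1", "test3", np.nan]})
right = pd.DataFrame({"host_id": [1, 2],
                      "id": [10, 20],
                      "Coffee": [1, 1],
                      "Kitchen": [1, 0]})

keys = ["id", "host_id"]
merged = left.merge(right, on=keys, how="left")

# Columns contributed by the right-hand frame: everything except the join keys
new_cols = [c for c in right.columns if c not in keys]

# Fill only those columns; "some" keeps its NaN
merged[new_cols] = merged[new_cols].fillna(0)
print(merged)
```

This keeps the fill in sync with whatever columns get_dummies happens to produce, at the cost of also filling any NaN that was already present in the right-hand frame.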

emilk


score 1 · Answer 2 · answered Nov 17 '21 at 13:28

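Leave the column some out and apply fillna to the rest. The code below selects every column whose name is not exactly some, fills the NaN values with 0, and writes the result back in place:

df.update(df.filter(regex='^(?!some$)', axis=1).fillna(0))

print(df)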
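
If a regular expression feels heavy for this, the same selection can be made by name with Index.difference. This is just a sketch on a made-up frame shaped like the merged result in the question; the variable name cols is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame shaped like the merged result in the question
df3 = pd.DataFrame({"host_id": [1, 3],
                    "id": [10, 30],
                    "some": ["test1", np.nan],
                    "Coffee": [1.0, np.nan],
                    "Kitchen": [1.0, np.nan]})

# Every column label except "some"
cols = df3.columns.difference(["some"])

# Fill NaN in the selected columns only
df3[cols] = df3[cols].fillna(0)
print(df3)
```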

wwnde


score 1 · Answer 3 · answered Nov 17 '21 at 13:33

I think you can do something like this for every column where you want to replace NaN values:

columns = ["Hot Water", "Cold Water", "Kitchen", "Coffee", "Something"]
for column in columns:
    df3[column] = df3[column].replace(np.nan, 0)
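
The loop can probably be collapsed into a single call, because DataFrame.fillna accepts a dict that maps column names to fill values. A minimal sketch on a made-up frame (the data is illustrative, not the question's full result):

```python
import numpy as np
import pandas as pd

# Made-up frame shaped like the merged result in the question
df3 = pd.DataFrame({"host_id": [1, 3],
                    "id": [10, 30],
                    "some": ["test1", np.nan],
                    "Coffee": [1.0, np.nan],
                    "Kitchen": [1.0, np.nan]})

columns = ["Coffee", "Kitchen"]

# Columns missing from the dict (here "some") are left untouched
df3 = df3.fillna({column: 0 for column in columns})
print(df3)
```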

Adrian B


score 0 · Answer 4 · answered Nov 17 '21 at 13:23

NaN or null values can be filled with pandas.DataFrame.fillna().

If you only want the new cells filled, and you build the dataframes first and merge them as a final step, call fillna() on the intermediate frames before the merge or as part of the merge step, not on the final result.

df = df.fillna(0)
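
One possible way to touch only the cells created by the merge itself is the indicator option of merge, which marks rows that had no match on the right. The sketch below uses made-up frames; left, right, keys, new_cols and unmatched are names assumed here for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

# Illustrative frames; "right" deliberately holds a NaN of its own
left = pd.DataFrame({"host_id": [1, 2, 3],
                     "id": [10, 20, 30],
                     "some": ["test1", "test3", np.nan]})
right = pd.DataFrame({"host_id": [1, 2],
                      "id": [10, 20],
                      "Coffee": [1.0, np.nan]})

keys = ["id", "host_id"]
new_cols = [c for c in right.columns if c not in keys]

# indicator=True adds a "_merge" column with left_only / right_only / both
merged = left.merge(right, on=keys, how="left", indicator=True)

# Rows that found no partner in the right-hand frame
unmatched = merged["_merge"] == "left_only"

# Set the new columns to 0 for those rows only, then drop the helper column
merged.loc[unmatched, new_cols] = 0
merged = merged.drop(columns="_merge")
print(merged)
```

Unlike a plain fillna on the new columns, this leaves a NaN that already existed in the right-hand frame untouched, and it never touches some.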

edited Nov 17 '21 at 13:26

answered Nov 17 '21 at 13:23

Tõnis Piip


But `some` shouldn't be filled with `0` – Test Nov 17 '21 at 13:25
