I have a problem. I want to merge two dataframes, but instead of NaN
it should be filled with 0
. But only the "new" columns. How could I do that?
What I tried
df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left', fill = 0)
[OUT]
TypeError: merge() got an unexpected keyword argument 'fill'
d = {'host_id': [1, 1, 2],
'id': [10, 11, 20],
'value': ["Hot Water,Cold Water,Kitchen,Coffee",
"Hot Water,Coffee,Something",
"Hot Water,Coffee"]}
df = pd.DataFrame(data=d)
print(df)
d2 = {'host_id': [1, 1, 2, 3],
'id': [10, 11, 20, 30],
'some': ['test1', "test2", "test3", np.nan]}
df2 = pd.DataFrame(data=d2)
print(df2)
df_path = df.copy()
df_path.index = pd.MultiIndex.from_arrays(df_path[['host_id', 'id']].values.T, names=['host_id', 'id'])
df_path = df_path['value'].str.split(',', expand=True)
df_path = df_path.melt(ignore_index=False).dropna()
df_path.reset_index(inplace=True)
one_hot = pd.get_dummies(df_path['value'])
df_one = df_path.drop('value',axis = 1)
df_one = df_path.join(one_hot)
grouped_df_one = df_one.groupby(['id']).max()
grouped_df_one = grouped_df_one.drop(columns=['value', 'variable']).reset_index()
df3 = pd.merge(df2, grouped_df_one,on=['id', 'host_id'], how='left')
df3
host_id id value
0 1 10 Hot Water,Cold Water,Kitchen,Coffee
1 1 11 Hot Water,Coffee,Something
2 2 20 Hot Water,Coffee
host_id id some
0 1 10 test1
1 1 11 test2
2 2 20 test3
3 3 30 NaN
What I got
host_id id some Coffee Cold Water Hot Water Kitchen Something
0 1 10 test1 1.0 1.0 1.0 1.0 0.0
1 1 11 test2 1.0 0.0 1.0 0.0 1.0
2 2 20 test3 1.0 0.0 1.0 0.0 0.0
3 3 30 NaN NaN NaN NaN NaN NaN
What I want
host_id id some Coffee Cold Water Hot Water Kitchen Something
0 1 10 test1 1.0 1.0 1.0 1.0 0.0
1 1 11 test2 1.0 0.0 1.0 0.0 1.0
2 2 20 test3 1.0 0.0 1.0 0.0 0.0
3 3 30 NaN 0 0 0 0 0