0

I want to create a new_column with merging df and df2 (df=pd.merge(df,df2, how='left', on='Type') ), but only if df['Type']=='certain_value'

So if with a normal merge the new_column would be completely filled, I would like a bunch of NaNs and only keep values where a certain condition is met from another column.

Is there a way of doing this?

I've tried replacing values with NaN after merging but am struggling with that too:

np.where(df['Type']!='certain_value', np.NaN, df['new_column'])

which doesn't do what I would like it to.

lala345
  • 129
  • 6

1 Answers1

0

Assuming you mean you want to create a new column filled with a particular column from df2, this should work if there is no column name repetition apart from the join key. (If there is then they will be suffixed to distinguish the sources) If it's a numerical column the other rows will be implicitly set to Nan.

rows_of_interest = df['Type']=='certain_value'

df.loc[rows_of_interest, 'new_column'] = pd.merge(df[rows_of_interest],df2, how='left', on='Type')['new_column']
fibonachoceres
  • 727
  • 4
  • 14
  • Thank you, that's very close, but for some reason it only joins/keeps 150 values instead of 800. My code is midstream_asset = df['Industry Sector']=='Midstream' df.loc[midstream_asset, 'Asset'] = pd.merge(df[midstream_asset],asset_lookup, on='Project Type', how='left')['Asset'], and the column names in asset_lookup are the same as in the original df (Project Type & Asset). There are no spelling issues/spaces/capital letters etc – lala345 Dec 21 '21 at 12:22
  • You could check the values of midstream_asset.sum() and df2[df2['Industry Sector'] == 'Midstream']] just to be sure the data you're expecting is correctly being indexed – fibonachoceres Dec 21 '21 at 12:25