Mapping columns from one dataframe to another to create a new column with missing values

Question

My main df has a column named product_code I want to map it with another df1 with a product_code column in order to add product_type from df1 into df. My problem is that df.product_code has more values than df1.product_code.

df :

` product_code customer_code market_code order_date sales_qty
0 Prod001 Cus001 Mark001 2017-10-10 100
1 Prod013 Cus001 Mark001 2017-10-10 240
2 Prod016 Cus001 Mark001 2018-06-28 187
3 Prod020 Cus001 Mark001 2017-10-10 140
4 Prod020 Cus001 Mark001 2017-10-25 47
... ... ... ... ... ...
149996 Prod301 Cus034 Mark011 2017-12-22 1
149997 Prod301 Cus034 Mark011 2018-11-05 1
149998 Prod301 Cus034 Mark011 2019-09-05 1
149999 Prod302 Cus034 Mark011 2017-12-22 1
150000 Prod302 Cus034 Mark011 2019-09-05 1

df1 :

 product_code    product_type
0        Prod001     Own Brand\n
1        Prod002     Own Brand\n
2        Prod003     Own Brand\n
3        Prod004     Own Brand\n
4        Prod005     Own Brand\n
..           ...             ...
274      Prod275     Own Brand\n
275      Prod276     Own Brand\n
276      Prod277     Own Brand\n
277      Prod278  Distribution\n
278      Prod279  Distribution\n

So far I have been lucky and only had to merge columns with matching values using this formula :

df2 = (df.merge(df1, left_on='product_code', right_on='product_code') .reindex(columns=['product_code', 'customer_code', 'market_code', 'order_date','sales_qty', 'sales_amount', 'currency','markets_name','zone','customer_name','product_type']))

I know I will have NaN values in df2.product_type but I need to map it anyway. Any advice?

Are you sure there's no leading/trailing spaces for one of the columns? Try `print(df.columns)` and `print(df2.columns)` to double check that. — Jon Clements, Apr 02 '22 at 17:31

score 0 · Answer 1 · answered Apr 02 '22 at 17:25

I would suggest using a left merge. This will take all of the values from df and import the matching values from df1. Also to simplify your code a bit you can use the on parameter (Note: this only works if both dataframes have a column with the same name) instead of the left_on and right_on.

df2 = df.merge(df1, how='left', on='product_code')

Here is the full documentation for the pandas.DataFrame.merge() function it will give you all of the available parameters with examples: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html

Mapping columns from one dataframe to another to create a new column with missing values

1 Answers1