1

I have two data frames

The first frame is my IDs, some 'old code' matches to one 'Master ID'. Some OLD code are not matched to a Master ID.

ID Dataframe

MASTER ID  OLD CODE 

  MASTER1    1A
  MASTER1    1B
  MASTER2    2
  MASTER3    3
             4

Sales

OLD CODE  Salesvalues  
1A         10           
1B         15           
2           6           
3           8   
4           5

If I am doing a right join or an outer join, it returns more rows then my original sales table. How I can make a join on the first matching 'MASTER ID' match and keeping the same number of rows(no multiple duplicate rows). I would like if there is no match for the'old code' on 'master ID', that will returns NA.

Expected Merge dataframe

  OLD CODE  Salesvalues  MASTER ID (Join column) 
    1A         10           MASTER1
    1B         15           MASTER1
    2           6           MASTER2
    3           8           MASTER3
    4           5             NA
Simon GIS
  • 1,045
  • 2
  • 18
  • 37
  • Please see the update. If I do this it returns more rows then my sales dataframe and I want to add the NA match – Simon GIS Sep 05 '19 at 08:14

1 Answers1

1

See if this works for you.

Sales.merge(ID Dataframe,on='OLD_CODE',how ='outer')
moys
  • 7,747
  • 2
  • 11
  • 42
  • I have a data frame of sales of 4209900 rows and a dataframe of ID which do not have all Master ID matching. If I applied this code, it returns me 14145 rows. I cannot use a inner join because i want my 4209900 rows with some matching Master ID or NA – Simon GIS Sep 05 '19 at 08:10
  • calls = pd.merge(sales, ID Dataframe, on= "old code", how= "left"), return 4223111 rows instead of 4209900 – Simon GIS Sep 05 '19 at 09:03
  • can you use the merge exactly as i have put & check? With the data you have provided, i get the result you are expecting. – moys Sep 05 '19 at 09:21