I would like to join 2 dataframes that look like below using python's pandas:
customer_orders = pd.DataFrame({'customerID': [1, 2, 2, 1],
'customerName': ['John', 'Anna', 'Anna', 'John'],
'customerAge': [21, 45, 45, 21],
'orderID': [255, 256, 257, 258],
'paymentType': ['visa', 'bank', 'master', 'paypal']})
that creates:
customerAge customerID customerName orderID paymentType
0 21 1 John 255 visa
1 45 2 Anna 256 bank
2 45 2 Anna 257 master
3 21 1 John 258 paypal
and
order_products = pd.DataFrame({'orderID': [255, 255, 257, 258, 255, 257],
'price': [9.99, 23.40, 15.89, 3.99, 89.50, 23.40],
'productName': ['filter', 'cosmetic', 'shampoo', 'tissues', 'elecBrush', 'cosmetic']})
that creates:
orderID price productName
0 255 9.99 filter
1 255 23.40 cosmetic
2 257 15.89 shampoo
3 258 3.99 tissues
4 255 89.50 elecBrush
5 257 23.40 cosmetic
To something like below
Expected Output
customerAge customerID customerName orderID paymentType
21 1 John 255 visa 255 9.99 filter
21 1 John 255 visa 255 23.40 cosmetic
21 1 John 255 visa 255 89.50 elecBrush
45 2 Anna 256 bank null null null
45 2 Anna 257 master 257 15.89 shampoo
45 2 Anna 257 master 257 23.40 cosmetic
21 1 John 258 paypal 258 3.99 tissues
To my knowledge this is an SQL left join. But using
all = customer_orders.join(order_products, on="orderID", how='left', lsuffix='_left', rsuffix='_right')
does not give me what I want (too few lines and NaN instead of the values of the second table).
What am I missing?