2

I would like to join 2 dataframes that look like below using python's pandas:

customer_orders = pd.DataFrame({'customerID': [1, 2, 2, 1],
                    'customerName': ['John', 'Anna', 'Anna', 'John'],
                    'customerAge': [21, 45, 45, 21],
                    'orderID': [255, 256, 257, 258],
                    'paymentType': ['visa', 'bank', 'master', 'paypal']})

that creates:

 customerAge  customerID customerName  orderID paymentType
0           21           1         John      255        visa
1           45           2         Anna      256        bank
2           45           2         Anna      257      master
3           21           1         John      258      paypal

and

order_products = pd.DataFrame({'orderID': [255, 255, 257, 258, 255, 257],
                           'price': [9.99, 23.40, 15.89, 3.99, 89.50, 23.40],
                           'productName': ['filter', 'cosmetic', 'shampoo', 'tissues', 'elecBrush', 'cosmetic']})

that creates:

   orderID  price productName
0      255   9.99      filter
1      255  23.40    cosmetic
2      257  15.89     shampoo
3      258   3.99     tissues
4      255  89.50   elecBrush
5      257  23.40    cosmetic

To something like below
Expected Output

 customerAge  customerID customerName  orderID paymentType
           21           1         John      255        visa     255   9.99      filter
           21           1         John      255        visa     255  23.40    cosmetic
           21           1         John      255        visa     255  89.50   elecBrush
           45           2         Anna      256        bank     null  null         null
           45           2         Anna      257      master     257  15.89     shampoo
           45           2         Anna      257      master     257  23.40    cosmetic  
           21           1         John      258      paypal     258   3.99     tissues

To my knowledge this is an SQL left join. But using

all = customer_orders.join(order_products, on="orderID", how='left', lsuffix='_left', rsuffix='_right')

does not give me what I want (too few lines and NaN instead of the values of the second table).

What am I missing?

cs95
  • 379,657
  • 97
  • 704
  • 746
lordy
  • 610
  • 15
  • 30

2 Answers2

4

Left? No, this is an outer join.

customer_orders.merge(order_products, on="orderID", how='outer')

   customerAge  customerID customerName  orderID paymentType  price  \
0           21           1         John      255        visa   9.99   
1           21           1         John      255        visa  23.40   
2           21           1         John      255        visa  89.50   
3           45           2         Anna      256        bank    NaN   
4           45           2         Anna      257      master  15.89   
5           45           2         Anna      257      master  23.40   
6           21           1         John      258      paypal   3.99   

  productName  
0      filter  
1    cosmetic  
2   elecBrush  
3         NaN  
4     shampoo  
5    cosmetic  
6     tissues  
cs95
  • 379,657
  • 97
  • 704
  • 746
  • 3
    This works to my knowledge. May I request some feedback on the answer besides the downvote so I may improve? – cs95 Apr 20 '18 at 14:29
  • They still not left any reason ...sign – BENY Apr 20 '18 at 14:33
  • I did not downvote you! Your answer is correct and what I needed. My mistake was to use join instead of merge (index instead of column) see also: https://stackoverflow.com/questions/22676081/what-is-the-difference-between-join-and-merge-in-pandas – lordy Apr 20 '18 at 14:39
  • @user2952361 Thanks. It's probably someone who doesn't appreciate my content very much. Glad it helped. – cs95 Apr 20 '18 at 14:42
  • 1
    Really interesting that `pandas` has same functions like `SQL` . I am just glamorous about how complex is `pandas` and works perfecly with python for machine learning applications. +1 – Mihai Alexandru-Ionut Apr 20 '18 at 14:49
0

Try using merge

all = customer_orders.merge(order_products, on="orderID", how='left')
Tyler K
  • 328
  • 2
  • 7