0

I have 4 dataframes to which I apply inner join and left joins .

I am trying to rewrite the query in python.How can we do all the joins together at a time in pandas code.

SQL

 select hd.order_no,hd.order_id,hd.ship_country,User.LT_ID,User.DEST_REGION,Auto.m0_act_0400,Auto.m0_act_0500
    FROM
        header hd
        INNER JOIN key ky ON (hd.order_no = ky.order_no)
        LEFT JOIN User lt on (ky.lt_id = lt.lt_id)
        LEFT JOIN Auto a on (hd.order_id = a.order_id )

*

How can this be done in pandas easily.

Rahul rajan
  • 1,186
  • 4
  • 18
  • 32
pankaj
  • 420
  • 1
  • 8
  • 26
  • `INNER JOIN` is equivalent to `df.merge(df2, how='inner')`, `LEFT JOIN` is `how='left'`. You can chain the `merge`, i.e. `df1.merge(df2, on='...').merge(df3,)...` – Quang Hoang Jan 14 '20 at 02:28
  • @QuangHoang, I am also having similar issue, Could please post as a answer as this could be helpful – Rahul rajan Jan 14 '20 at 02:31
  • @QuangHoang, what if i want select all the columns across different dataframes,should indivually mention them or is there an easy way – Rahul rajan Jan 14 '20 at 02:42
  • in that case (e.g, `SELECT *`), you don't need to specify any columns. – Quang Hoang Jan 14 '20 at 02:43
  • @QuangHoang, how can store this merged result into a new dataframe – Rahul rajan Jan 14 '20 at 02:57
  • it is a new dataframe, you can assign the whole thing to a variable, e.g. `df = (header.merge...)` – Quang Hoang Jan 14 '20 at 02:59
  • @QuangHoang, I am getting columns duplicates like `COUNTRY_x` and `COUNTRY_y` for a coulmn `COUNTRY` when doing the merge.How can this removed – Rahul rajan Jan 14 '20 at 03:30
  • 1. you can figure where `COUNTRY` comes from, and pass it to `on`, e.g. `on=['LT_ID', 'COUNTRY']` or 2. pass `suffixes = ['', '_drop']` and then drop the extra columns later, e.g. by chaining `.drop('COUNTRY_drop', axis=1)` to the above. – Quang Hoang Jan 14 '20 at 03:32

1 Answers1

2

You can try this:

(header.merge(Key, 
              left_on='order_no', 
              right_on='ORDER_NO',      # pandas/python cares about cases     
              how='inner')              # how='inner' is default so can be skipped
     .merge(User, on='LT_ID', how='left')           # chaining merge
     .merge(Auto, on='order_id', how='left')
     [['order_no', 'order_id', 'ship_country',      # select the columns
       'LT_ID', 'DEST_REGION', 'm0_act_0400', 'm0_act_0500']]
)

For details about merge see this question.

Quang Hoang
  • 146,074
  • 10
  • 56
  • 74