1

I am experienced with SQL, but am new to Python.

I am attempting to use the join or pandas.merge functions to complete the following simple SQL Join:

SELECT
a.Patient_ID,
a.Physician,
b.Hospital
FROM DF1 a
INNER JOIN DF2 b
on a.Patient_ID=b.Patient_ID_Number

Here is as close as I've gotten:

import pandas as pd
output=pd.merge(DF1, DF2, how='inner', left_on='Patient_ID', right_on='Patient_ID_Number')

However, this produces the equivalent of the following SQL query:

SELECT
    *
    FROM DF1 a
    INNER JOIN DF2 b
    on a.Patient_ID=b.Patient_ID_Number

I'm not familiar with indexing or keys, so I am trying to implement a simple code translation only for now. If it's not integral but just a nice feature, I will learn it later.

Thanks!

Samsonite Manly
  • 569
  • 2
  • 7
  • 15

1 Answers1

2

You can specify the columns in the join statement:

 output=pd.merge(DF1[['Patient_ID','Physician']], DF2[['Hospital','Patient_ID_Number']], how='inner', left_on='Patient_ID', right_on='Patient_ID_Number')

You do have to carry over both columns that you're joining on in your statement

You can specify which columns to keep before your join statement

DF1=DF1[['Patient_ID','Physician']]
DF2=DF2[['Patient_ID_Number','Hospital']]
output=pd.merge(DF1, DF2, how='inner', left_on='Patient_ID', right_on='Patient_ID_Number')

Or only keep the columns after your join statement

output=output[['Patient_ID','Physician','Hospital']]