1

This question may be related to join or merge with overwrite in pandas

But I refer to sql join, not pandas join: I m looking for equivalent to sql to query tables by foreign keys:

Example: I want to select fields from table of project, whose project_id is a foreign key in the table of clients, whose client_id is a foreign key in the table of workingtimes, and so on.

I may have multiple tables, and want to query by foreign key. In my case:

import psycopg2 as pg
import pandas.io.sql as psql

# get connected to the database
connection = pg.connect("dbname=my_database user=postgres")

df = psql.read_sql("""SELECT * FROM table.bpartner client
JOIN table.project project ON client.bpartner_id = project.bpartner_id
JOIN table.project_usertime workingtime ON workingtime.project_id = project.project_id
""", connection)

I noticed this gives in output duplicated columns: as an example, there are two copies of bpartner_id.

I remove column duplicates like this:

# remove duplicated columsn > TODO check SQL Join || pandas merge / group_by to handle better
df = df.loc[:, ~df.columns.duplicated()]

Otherwise, I could also create three dataframes, one for each table:

df1  = psql.read_sql("SELECT * FROM table.project", connection)
df2  = psql.read_sql("SELECT * FROM table.bpartner", connection)
df3  = psql.read_sql("SELECT * FROM table.project_usertime", connection)

I wonder which can be the pandas query to now combine the three dataframes without column duplicates, equivalent to the sql join between multiple tables by foreign key - I mean where join queries expresses a "business relations" between tables (if and only if two objects are mutually linked).

user305883
  • 1,635
  • 2
  • 24
  • 48

0 Answers0