This question may be related to join or merge with overwrite in pandas
But I refer to sql join, not pandas join: I m looking for equivalent to sql to query tables by foreign keys:
Example: I want to select fields from table of project, whose project_id is a foreign key in the table of clients, whose client_id is a foreign key in the table of workingtimes, and so on.
I may have multiple tables, and want to query by foreign key. In my case:
import psycopg2 as pg
import pandas.io.sql as psql
# get connected to the database
connection = pg.connect("dbname=my_database user=postgres")
df = psql.read_sql("""SELECT * FROM table.bpartner client
JOIN table.project project ON client.bpartner_id = project.bpartner_id
JOIN table.project_usertime workingtime ON workingtime.project_id = project.project_id
""", connection)
I noticed this gives in output duplicated columns: as an example, there are two copies of bpartner_id
.
I remove column duplicates like this:
# remove duplicated columsn > TODO check SQL Join || pandas merge / group_by to handle better
df = df.loc[:, ~df.columns.duplicated()]
Otherwise, I could also create three dataframes, one for each table:
df1 = psql.read_sql("SELECT * FROM table.project", connection)
df2 = psql.read_sql("SELECT * FROM table.bpartner", connection)
df3 = psql.read_sql("SELECT * FROM table.project_usertime", connection)
I wonder which can be the pandas query to now combine the three dataframes without column duplicates, equivalent to the sql join between multiple tables by foreign key - I mean where join queries expresses a "business relations" between tables (if and only if two objects are mutually linked).