I have a worksheet with ~20,000 rows of data. What is the best approach to inserting those into a Postgres database?

The table into which I will insert the data contains many foreign keys, which means that I cannot simply insert it with an approach like the one described here: Bulk Insert A Pandas DataFrame Using SQLAlchemy. While my data contains values such as "Shoes", "Jacket", "Bag", etc., the table expects them as IDs referencing foreign tables.

I can easily transform this data into a DataFrame with xlwings, but I still have to figure out how to handle the foreign key transformations quickly.
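For context, a minimal sketch of how I read the worksheet into a DataFrame with xlwings (the file, sheet and range names below are placeholders, not my actual ones):

import pandas as pd
import xlwings as xw

# read the contiguous table starting at A1 into a DataFrame
wb = xw.Book('orders.xlsx')
df = wb.sheets['Sheet1'].range('A1').options(pd.DataFrame, expand='table', index=False).value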

E.g. Table 1:

product_id  country product
1           USA     Shoes
2           UK      Jacket
3           GER     Bag

E.g. Table 2:

user_id Name
1       John
2       Larry
3       Page

E.g. Table 3 (The one I'm uploading to):

order_id    user    product
1           3       2
2           2       2
3           1       1

And finally the table I have in excel:

user    bought
John    Shoes
Larry   Shoes
Page    Bag

Thanks!

Bishonen_PL

1 Answer

First load the foreign key values from the database, then map them onto the DataFrame columns. As an example:

from sqlalchemy import create_engine

engine = create_engine('conn string')

# each DataFrame column mapped to the lookup table that holds its values
col_tables = {'col1': 'table1', 'col2': 'table2'}

for col, table in col_tables.items():
    res = engine.execute("select name, id from {}".format(table))
    d = dict(res)               # {name: id}
    df[col] = df[col].map(d)    # replace names with their ids
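
After the mapping, the DataFrame can be bulk inserted with pandas' to_sql, for example (the target table name 'orders' and the options are just an illustration):

# bulk insert the mapped DataFrame into the target table
df.to_sql('orders', engine, if_exists='append', index=False, chunksize=1000)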
  • Thanks @K.Palyanichka. Indeed quite simple - is this the most efficient way to do it though? This involves querying as many tables as I have foreign keys and downloading all the data from them. It's probably my inexperience, but I thought there might be a more efficient way to go about this. – Bishonen_PL May 17 '18 at 15:13
  • If your df columns correspond to the database columns, you can load all FKs and then map them onto the df columns in a loop, without the col_tables dict. How to load all FKs is shown at https://stackoverflow.com/a/44908972 – Kostiantyn Palianychka May 17 '18 at 15:25
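
A sketch of how the foreign keys could be discovered automatically with SQLAlchemy's inspector instead of a hand-written col_tables dict (this assumes the target table is called 'orders' and that each referenced table has a human-readable name column; it is not the code from the linked answer):

from sqlalchemy import create_engine, inspect

engine = create_engine('conn string')
insp = inspect(engine)

# one lookup query per foreign key of the target table
for fk in insp.get_foreign_keys('orders'):
    col = fk['constrained_columns'][0]      # e.g. 'user' or 'product'
    ref_table = fk['referred_table']        # e.g. the users or products table
    ref_id = fk['referred_columns'][0]      # e.g. 'user_id' or 'product_id'
    res = engine.execute("select name, {} from {}".format(ref_id, ref_table))
    df[col] = df[col].map(dict(res))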