2

The process inserting data from the dataframe into the oracle database is work very slow (more than one hour)

Advise me please, how I can to improve the process of inserting data into the database?

count column = 50, raw count = > 250000

I have a lot of such files

Connecting to the database in this way

dialect = 'oracle'
sql_driver = 'cx_oracle'

to_engine: str = dialect + '+' + sql_driver + '://' + user + ':' + password + '@' + host + ':' + str(port) + '/?service_name=' + service_name

connection = create_engine(to_engine)

to_sql method look like this

 df.to_sql(table_name_in_db, con=connection, schema='', if_exists='append', index=False, chunksize = 1000,  dtype=None)

Thanks a lot!

Asli
  • 51
  • 4

1 Answers1

1

This trick help me

Speed up to_sql() when writing Pandas DataFrame to Oracle database using SqlAlchemy and cx_Oracle Conclusion: use the following trick in order to explicitly specify dtype for all DF columns of object dtype when saving DataFrames to Oracle DB. Otherwise it'll be saved as CLOB data type, which requires special treatment and makes it very slow

 dtyp = {c:types.VARCHAR(df[c].str.len().max())
        for c in df.columns[df.dtypes == 'object'].tolist()}

 df.to_sql(..., dtype=dtyp)here
Asli
  • 51
  • 4
  • Further down on that other SO question, `chunksize` was mentioned. You may also want to tune this. – Christopher Jones Mar 03 '22 at 22:12
  • Thanks, "The ideal chunksize depends on your table dimensions. A table with a lot of columns needs a smaller chunk-size than a table that has only 3." I dont know, this is true or not I find this method here https://towardsdatascience.com/dramatically-improve-your-database-inserts-with-a-simple-upgrade-6dfa672f1424 I had a lot of such files where are different numbers of columns and rows – Asli Mar 04 '22 at 05:29
  • You can run some tests to tune chunksize for your data and network. But if you want best performance, then use cx_Oracle directly without the Pandas & SQLAlchemy layers. See the cx_Oracle doc on [batch loading](https://cx-oracle.readthedocs.io/en/latest/user_guide/batch_statement.html#batchstmnt). – Christopher Jones Mar 05 '22 at 03:06