
Problem

It takes a long time to query our Oracle database with Python using cx_Oracle.

Method

I connect to the Oracle database using cx_Oracle and query it with a standard SQL statement like:

select * from t1;

Results are returned as a pandas dataframe.
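
For context, a minimal sketch of the kind of code in use (the connection details and table name are placeholders, and pandas.read_sql over a cx_Oracle connection is an assumption about the setup):

import cx_Oracle
import pandas as pd

# Hypothetical connection details -- replace with your own.
conn = cx_Oracle.connect("user", "password", "host:1521/service_name")

# Pull the whole table into a DataFrame; with no row limit this
# ships every row of t1 over the network to the client.
df = pd.read_sql("select * from t1", con=conn)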

Observations

  • Querying from Spyder and Jupyter Notebooks is equally slow.
  • Querying from an SQL client like DBeaver returns results in ~1/10th the time.

Caveat

I have not tested whether this holds true when I include row limits in the cx_Oracle queries.

MinneapolisCoder9
  • Your question is a bit generic. Without additional information (like what data you are querying and what code you are using to query it) I can't answer your question. I find the performance of cx_Oracle adequate in most cases, somewhat slower than the equivalent C program but with a lot less effort. Since you are using a dataframe that implies you are using something like Pandas. Try using cx_Oracle directly as well. – Anthony Tuininga Mar 30 '20 at 15:51
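
Following that comment, a minimal sketch of what "using cx_Oracle directly" (bypassing pandas) might look like, reusing the hypothetical conn from above:

cur = conn.cursor()
cur.execute("select * from t1")
rows = cur.fetchall()  # plain list of tuples, no DataFrame construction overhead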

3 Answers

Christopher Jones

@Christopher Jones and @Anthony Tuininga pointed me toward the next-level answer for why queries were returning so slowly. In my Caveat section above, I noted that I had no row limit such as

WHERE rownum < 10

in my queries. Including this did indeed reduce query times. My bad.
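
For example, a quick sketch of the row-limited fetch (reusing the hypothetical conn from the question):

# Limiting the result set makes the query return almost immediately,
# which confirms the bottleneck was the volume of data transferred.
df_sample = pd.read_sql("select * from t1 where rownum < 10", con=conn)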

Also, a command like

create table t1 as
select *
from table1

results in instant feedback, since nothing is being returned from the db to the client. It was just a matter of issuing a query that demanded a lot of data back. An SQL client has those row limits built in to prevent people like me from asking these questions! :)
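
To illustrate, a rough sketch (again with the hypothetical conn): the statement runs entirely on the server, so no result set ever travels to the client:

cur = conn.cursor()
# The table is built server-side; no rows are fetched by the client,
# so the call returns without any bulk data transfer.
cur.execute("create table t1 as select * from table1")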

MinneapolisCoder9

Addressing Slow Write Speeds

When writing to Oracle, all DataFrame columns that are string objects are by default converted to CLOBs (character large objects), which can hold up to 2,147,483,647 characters. This means that tables are very likely far larger than necessary.

When using df.to_sql, you typically use something like the following:

df.to_sql("tablename"
        connection_string,
        schema='myschema',
        if_exists='replace',
        index=False
        )

However, omitting the optional dtype parameter may result in CLOBs.

A Solution

To overcome this, you can manually provide the dtype mapping (where sa is SQLAlchemy, i.e. import sqlalchemy as sa):

df.to_sql("tablename"
        connection_string,
        schema='myschema',
        if_exists='replace',
        index=False,
        dtype={"RUN_DT": sa.types.DateTime(),
        "fname": sa.types.VARCHAR(length=50),
        "lname": sa.types.VARCHAR(length=250),
        "FROM_DT": sa.types.DateTime(),
        "TO_DT": sa.types.DateTime(),
        "UUID": sa.types.VARCHAR(length=50)}
        )

A Better, Generalized Solution

But what if you have many tables and many columns and you need a generalized solution?

@Parfait provided the perfect solution to this issue on SO here.
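
As a rough sketch of one such generalized approach (not necessarily @Parfait's exact code; build_dtype_map is a hypothetical helper, and sizing each VARCHAR to the longest observed value is an assumption), the dtype mapping can be built from the DataFrame itself:

import sqlalchemy as sa

def build_dtype_map(df, default_length=255):
    # Map every object (string) column to a VARCHAR sized to its longest
    # value so that nothing silently falls back to CLOB.
    dtype_map = {}
    for col in df.columns:
        if df[col].dtype == object:
            max_len = int(df[col].astype(str).str.len().max() or default_length)
            dtype_map[col] = sa.types.VARCHAR(length=max_len)
    return dtype_map

df.to_sql("tablename",
        connection_string,
        schema='myschema',
        if_exists='replace',
        index=False,
        dtype=build_dtype_map(df))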

MinneapolisCoder9