1

I'm trying to read data from an oracle db. I have to read on python the results of a simple select that returns a million of rows.

I use the fetchall() function, changing the arraysize property of the cursor.

select_qry = db_functions.read_sql_file('src/data/scripts/03_perimetro_select.sql')
dsn_tns = cx_Oracle.makedsn(ip, port, sid)
con = cx_Oracle.connect(user, pwd, dsn_tns)


start = time.time()

cur = con.cursor()
cur.arraysize = 1000
cur.execute('select * from bigtable where rownum < 10000')
res = cur.fetchall()
# print res  # uncomment to display the query results
elapsed = (time.time() - start)
print(elapsed, " seconds")
cur.close()
con.close()

If I remove the where condition where rownum < 10000 the python environment freezes and the fetchall() function never ends.

After some trials I found a limit for this precise select, it works till 50k lines, but it fails if I select 60k lines.

What is causing this problem? Do I have to find another way to fetch this amount of data or the problem is the ODBC connection? How can I test it?

Christopher Jones
  • 9,449
  • 3
  • 24
  • 48
paolof89
  • 1,319
  • 5
  • 17
  • 31

2 Answers2

2

Consider running in batches using Oracle's ROWNUM. To combine back into single object append to a growing list. Below assumes total row count for table is 1 mill. Adjust as needed:

table_row_count = 1000000
batch_size = 10000

# PREPARED STATEMENT
sql = """SELECT t.* FROM
            (SELECT *, ROWNUM AS row_num 
             FROM 
                (SELECT * FROM bigtable ORDER BY primary_id) sub_t
            ) AS t
         WHERE t.row_num BETWEEN :LOWER_BOUND AND :UPPER_BOUND;"""

data = []
for lower_bound in range(0, table_row_count, batch_size):
    # BIND PARAMS WITH BOUND LIMITS
    cursor.execute(sql, {'LOWER_BOUND': lower_bound, 
                         'UPPER_BOUND': lower_bound + batch_size - 1})

    for row in cur.fetchall():
       data.append(row)
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • 1
    without an `ORDER BY unique_key_col` and a nested view, the `ROWNUM` can't be consistent and not of any worth. Should be something like [this](https://stackoverflow.com/a/4552803/7998591) for it to work or better like [this](https://stackoverflow.com/a/30322134/7998591) (Oracle12c and above) – Kaushik Nayak Dec 06 '18 at 16:44
  • Interesting...thanks @KaushikNayak. I'm not an Oracle expert. – Parfait Dec 06 '18 at 17:24
  • I want to accept this answer but you need first to edit it with the suggestion of @KaushikNayak. – paolof89 Dec 13 '18 at 12:57
  • Whoops! I thought I did but forgot the nested subquery requirement. – Parfait Dec 13 '18 at 16:47
  • getting error: cx_Oracle.DatabaseError: ORA-00923: FROM keyword not found where expected – Pardeep Beniwal Feb 27 '20 at 07:04
  • @PardeepBeniwal, please ask a new question as likely it differs from this post. – Parfait Feb 27 '20 at 13:58
2

You are probably running out of memory on the computer running cx_Oracle. Don't use fetchall() because this will require cx_Oracle to hold all result in memory. Use something like this to fetch batches of records:

cursor = connection.cursor()
cursor.execute("select employee_id from employees")
res = cursor.fetchmany(numRows=3)
print(res)
res = cursor.fetchmany(numRows=3)
print(res)

Stick the fetchmany() calls in a loop, process each batch of rows in your app before fetching the next set of rows, and exit the loop when there is no more data.

What ever solution you use, tune cursor.arraysize to get best performance.

The already given suggestion to repeat the query and select subsets of rows is also worth considering. If you are using Oracle DB 12 there is a newer (easier) syntax like SELECT * FROM mytab ORDER BY id OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY.

PS cx_Oracle does not use ODBC.

Christopher Jones
  • 9,449
  • 3
  • 24
  • 48
  • I create a loop with your suggestion but it freeze after 12500 extracted rows, no matter the arraysize. Could be some limitation on the connection or on the db user? – paolof89 Dec 10 '18 at 09:24
  • Are you freeing the memory at the end of each loop iteration before the next iteration? – Christopher Jones Dec 11 '18 at 03:00
  • I tried the same script on a machine with less memory space and it works. The problem should be on the network or on the oracle client. First is a windows server with oracle12 (2014), the other is a windows 10 with oracle instantclient_18_3 – paolof89 Dec 11 '18 at 08:10
  • The problem was in some network security rule that didn't drop the connection but was limiting the bandwidth after a certain amount of bits. I discover it with a spool function with SQLPLUS – paolof89 Dec 13 '18 at 12:59
  • @paolof89, how did you track that' Can you please share further details or even the script? Thank you. – neverMind Aug 03 '21 at 16:20