
I have the following code and I'm trying to read a very big table (over 100M rows) on MariaDB. In theory, execute just sets up the cursor, and each row is fetched as I iterate over it, or at least that is what the docs say.

import pyodbc

cnxn  = pyodbc.connect('DRIVER=/usr/lib/libmaodbc.so;socket=/var/run/mysqld/mysqld.sock;Database=101m;User=root;Password=123;Option=3;')
cursor = cnxn.cursor()
cursor.execute("select * from vat")
for row in cursor:
  print(row)

I tried the following versions of the code, but with no results.

import pyodbc

cnxn  = pyodbc.connect('DRIVER=/usr/lib/libmaodbc.so;socket=/var/run/mysqld/mysqld.sock;Database=101m;User=root;Password=123;Option=3;')
with cnxn.cursor() as cursor:
  cursor.execute("select * from vat")
  for row in cursor:
    print(row)

import pyodbc

cnxn  = pyodbc.connect('DRIVER=/usr/lib/libmaodbc.so;Server=127.0.0.1;Database=101m;User=root;Password=123;Option=3;') # tcp instead of unix socket
with cnxn.cursor() as cursor:
  cursor.execute("select * from 101m") # another big table
  for row in cursor:
    print(row)

Update: even without the for loop, the execute itself takes a long time. What I'm trying to do is copy data from a MariaDB server to an SQLite database.

  • 1) `for row in cursor: print(row)` is probably the slowest way to do this, especially the `print` part. Look here [pyodbc cursor](https://github.com/mkleehammer/pyodbc/wiki/Cursor) for the `fetch*` methods on the cursor. 2) It would be helpful to know what you plan to do with the results? Add as update to question. – Adrian Klaver Jan 30 '23 at 19:51
  • Updated the question just as you suggested. And how is for loop slow in this situation, what'd you recommend? – enes dogan Jan 30 '23 at 20:07
  • Why are you using the MariaDB ODBC connector for PostgreSQL ? Also what do you mean slow ? And what's in the table ? 100M records is going be a fair bit of data to move around. – ljmc Jan 30 '23 at 20:23
  • I meant to say MariaDb, fixed the typo. – enes dogan Jan 30 '23 at 20:38
  • @enesdogan For reference, `list(range(100_000_000))` takes around 15s on my computer to execute, and that is in memory and only a single integer wide, it's going to be a lot longer pulling and pushing records from a database. – ljmc Jan 30 '23 at 20:58
  • @ljmc what'd you recommend that I use for this situation? And as I've said just the execute itself takes a long time even without the loop. – enes dogan Jan 31 '23 at 07:42
  • What is the average row size? Did you check the memory status of your machine when running your python script? – Georg Richter Jan 31 '23 at 07:49
  • a single row is about 1kB. But what I'm trying to do here is load rows to memory one by one so that I wouldn't use so much memory. – enes dogan Jan 31 '23 at 08:36
  • You're talking about moving 100GB, that is a lot of data, I would start by researching database migration, and maybe look at this [question](https://stackoverflow.com/q/5164033/17676984). – ljmc Jan 31 '23 at 09:28

1 Answer


According to your ODBC DSN, server and client are running on the same machine.

In your comment you mentioned that you want to move 100 GB with a single select. This will require a lot of memory:

  • 100 GB for client network buffer
  • 100 GB for client row buffer

Additionally, several gigabytes on the server side will be used to prepare and send the result (note that the server runs on the same machine). This will lead to memory pressure (swapping) and slow down your machine.

I would recommend fetching the data in portions and using the mariadb, pymysql, or MySQLdb module instead.

Example (without any exception handling):

import mariadb

conn = mariadb.connect(db="101m", user="root", password="123", host="localhost")

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM vat")
total = cursor.fetchone()[0]

# This value depends on your system configuration
step = 2048

for i in range(0, total, step):
    # bind both the offset and the page size as parameters
    cursor.execute("SELECT * FROM vat LIMIT ?, ?", (i, step))
    rows = cursor.fetchall()
    # process result
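A caveat with `LIMIT offset, n` paging: the server still has to scan past all skipped rows, so later pages get progressively slower on a 100M-row table. A keyset-pagination sketch, assuming the table has an indexed integer primary key (a hypothetical `id` column, not shown in the question). It is demonstrated here with the stdlib `sqlite3` module so it runs standalone; the same SQL pattern works with a `mariadb` cursor:

```python
import sqlite3

# In-memory stand-in for the MariaDB table; assumes an indexed integer PK `id`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vat (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO vat VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 11)])

step = 4
last_id = 0          # keyset: remember the last PK seen instead of an OFFSET
fetched = []
while True:
    rows = conn.execute(
        "SELECT id, val FROM vat WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, step),
    ).fetchall()
    if not rows:
        break
    fetched.extend(rows)
    last_id = rows[-1][0]  # continue after the last row of this page

print(len(fetched))  # 10
```

Each page is an index range scan starting at `last_id`, so the cost per page stays constant no matter how far into the table you are.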

Another solution would be to use a server-side cursor; however, this is only supported by the mariadb module:

from mariadb.constants import CURSOR

cursor = conn.cursor(cursor_type=CURSOR.READ_ONLY, prefetch_size=4096)
cursor.execute("SELECT * FROM vat")
for row in cursor:
    # process row
    print(row)
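For the SQLite side of the copy, batched `executemany` calls inside explicit transactions are far faster than one `INSERT` (and one implicit commit) per row. A minimal sketch of the writing side, using an in-memory SQLite database and fake rows in place of the MariaDB cursor; the same loop would consume rows coming from any of the cursors above (the table/column names are placeholders):

```python
import sqlite3

def copy_in_batches(src_rows, dest, batch_size=1000):
    """Insert an iterable of (id, val) rows into dest in batches."""
    dest.execute("CREATE TABLE IF NOT EXISTS vat (id INTEGER, val TEXT)")
    batch = []
    for row in src_rows:
        batch.append(row)
        if len(batch) >= batch_size:
            dest.executemany("INSERT INTO vat VALUES (?, ?)", batch)
            dest.commit()           # one commit per batch, not per row
            batch.clear()
    if batch:                       # flush the final partial batch
        dest.executemany("INSERT INTO vat VALUES (?, ?)", batch)
        dest.commit()

dest = sqlite3.connect(":memory:")
fake_rows = ((i, f"row{i}") for i in range(2500))  # stand-in for the source cursor
copy_in_batches(fake_rows, dest, batch_size=1000)
print(dest.execute("SELECT COUNT(*) FROM vat").fetchone()[0])  # 2500
```

Because `src_rows` is consumed lazily, only one batch is held in memory at a time, which matches the goal of not loading the whole 100 GB result at once.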
Georg Richter