
I am trying to speed up retrieving large amounts of data from SQL Server.

Here is my system info.

OS: Linux
Language: python3
SQL library: pymssql

Sample code:

msg= "SELECT TAG_CODE, UI_X, UI_Y, INSERT_TIME FROM dbo.R_TAG_HIST WHERE "
        msg+= "CREATE_TIME > '" + self.start_time + \
            "' AND CREATE_TIME < '" + self.end_time + "' "
        msg += "AND TAG_CODE IN " + \
            str(self.tag_ids).replace("[", "(").replace("]", ")")
        msg+= " ORDER BY TAG_CODE"

def receive_all_tables(self, msg):
    try:
        # check connection
        try:
            hasattr(self.conn, '_conn')
        except:
            self.connect_db()

        with self.conn.cursor() as cursor:
            cursor.execute(msg)             
            tables = cursor.fetchall()      
            self.conn.commit()                  
        return tables

    except Exception as e:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        print("fail to receive query.", exc_type, exc_obj, exc_tb.tb_lineno, e)

def result_iterator(self, cursor, arraysize=1000):
    # iterator using fetchmany; consumes less memory than fetchall
    while True:
        results = cursor.fetchmany(arraysize)
        if not results:
            break
        for result in results:
            yield result

def receive_all_tables_by_iterator(self, msg):
    try:
        # check connection
        try:
            hasattr(self.conn, '_conn')
        except:
            self.connect_db()

        tables = []
        with self.conn.cursor() as cursor:
            cursor.execute(msg)
            for result in self.result_iterator(cursor):
                tables.append(result)
            # self.conn.commit()
        return tables

    except Exception as e:
        exc_type, exc_obj, exc_tb = sys.exc_info()
        print("fail to receive query.", exc_type, exc_obj, exc_tb.tb_lineno, e)

The sample data has about 30k rows.

elapsed time using fetchall(): 6.272587060928345 sec
elapsed time using the fetchmany() iterator: 6.012945890426636 sec
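
(For reference, a minimal sketch of how the timings above can be taken, recording the time before and after the fetch call with time.perf_counter(); db stands for the object holding the methods shown earlier:)

import time

start = time.perf_counter()
rows = db.receive_all_tables(msg)   # or db.receive_all_tables_by_iterator(msg)
print("elapsed:", time.perf_counter() - start, "sec,", len(rows), "rows")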

I want to reduce the time it takes to get the data.
Is there another good way to receive large amounts of data from the DB?

Help me please :)

djowlrid
  • Can you share your sql query? It's not in the code sample you posted. Are you doing "select * from table"? – Emrah Diril Jan 25 '21 at 06:05
  • I added a sample query. – djowlrid Jan 25 '21 at 06:41
  • Unrelated: you should use parameterised queries. Not for performance reasons but for security reasons. – mhawke Jan 25 '21 at 06:42
  • Where is the time elapsed measurement code? How do you know whether the query is slow to execute, the data transfer is lengthy, or both? Is the database server running on the local machine, or is there a network involved? – mhawke Jan 25 '21 at 06:47
  • Also. `hasattr(self.conn, '_conn')` doesn't do what you think - it returns a `bool`, no exception is raised. – mhawke Jan 25 '21 at 06:52
  • Please understand that I cannot upload the full code due to my company's regulations. I checked the elapsed time by recording the time before and after the execute function, and found that the fetch takes a lot of time. – djowlrid Jan 25 '21 at 06:58
  • @djowlrid: fair enough. If you run the query with your database's CLI you could get an idea whether the query is slow on the server. – mhawke Jan 25 '21 at 07:02
  • Oh, thank you! I will ask the DB manager and we will check it together! – djowlrid Jan 25 '21 at 07:53
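
(Following up on mhawke's point about hasattr() above: it only returns a bool and never raises, so the try/except in the question can never trigger connect_db(). A minimal sketch of a check that actually reconnects, still relying on pymssql's internal _conn attribute as the original code does:)

# hasattr()/getattr() do not raise here, so test the value instead of catching an exception
if getattr(self, "conn", None) is None or getattr(self.conn, "_conn", None) is None:
    self.connect_db()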

2 Answers


You can try multiprocessing or Spark for faster queries on large data (a rough sketch follows the links below):

  1. Fastest way to read huge MySQL table in python
  2. https://spark.apache.org/docs/1.5.2/sql-programming-guide.html
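
A rough multiprocessing sketch along those lines, splitting the CREATE_TIME window into chunks and fetching each chunk in its own process (connection settings are placeholders, the TAG_CODE filter and ORDER BY are omitted for brevity, and this only pays off if the server and network can serve the chunks in parallel):

import pymssql
from multiprocessing import Pool

def fetch_chunk(time_range):
    # each worker opens its own connection; pymssql connections cannot be shared across processes
    start, end = time_range
    conn = pymssql.connect(server="HOST", user="USER", password="PASS", database="DB")
    try:
        with conn.cursor() as cursor:
            cursor.execute(
                "SELECT TAG_CODE, UI_X, UI_Y, INSERT_TIME FROM dbo.R_TAG_HIST "
                "WHERE CREATE_TIME > %s AND CREATE_TIME < %s",
                (start, end),
            )
            return cursor.fetchall()
    finally:
        conn.close()

def fetch_parallel(time_ranges, workers=4):
    # time_ranges: list of (start, end) pairs covering the full window
    with Pool(workers) as pool:
        chunks = pool.map(fetch_chunk, time_ranges)
    return [row for chunk in chunks for row in chunk]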
autitya

One thing you can do to speed up your query is to add an index on the CREATE_TIME and TAG_CODE columns if they don't already have one.
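
For example (the index name and INCLUDE list below are just a guess based on the columns in the question's query; conn is an open pymssql connection):

# hypothetical covering index for the WHERE filter; adjust name/columns to your schema
create_index_sql = """
CREATE NONCLUSTERED INDEX IX_R_TAG_HIST_CreateTime_TagCode
    ON dbo.R_TAG_HIST (CREATE_TIME, TAG_CODE)
    INCLUDE (UI_X, UI_Y, INSERT_TIME);
"""
with conn.cursor() as cursor:
    cursor.execute(create_index_sql)
conn.commit()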

Emrah Diril
  • You mean that when I create a database, I should add index information, right? Let me check. Thank you!!! ^^ – djowlrid Jan 25 '21 at 08:01