
I have a very simple MySQL query, as follows:

db = getDB()
cursor = db.cursor()
cursor.execute('select * from users')
results = cursor.fetchall()
for row in results:
    process(row)

Suppose the users table has 1 billion records and the process method takes 10 ms per record. The code above finishes fetching all of the data to the client side before it starts calling process, which wastes time. Should I run the query and the processing in parallel?

So I'd like to change fetchall() to fetchmany() and start a new thread that processes each retrieved batch while the cursor fetches the next one.
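Something like the following is what I have in mind -- a minimal sketch, assuming getDB() and process() from the snippet above; the batch size and queue bound are arbitrary starting points:

    import threading
    import queue  # named 'Queue' on Python 2

    def fetch_and_process(batch_size=1000):
        # A bounded queue hands batches from the fetching loop to a worker
        # thread, so fetching the next batch overlaps with processing the
        # current one.
        q = queue.Queue(maxsize=4)

        def worker():
            while True:
                batch = q.get()
                if batch is None:  # sentinel: fetching is done
                    break
                for row in batch:
                    process(row)

        t = threading.Thread(target=worker)
        t.start()

        db = getDB()
        cursor = db.cursor()
        cursor.execute('select * from users')
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                break
            q.put(batch)  # blocks if the worker falls behind

        q.put(None)
        t.join()

One caveat I'm aware of: with MySQLdb's default buffered cursor, execute() already transfers the whole result set to the client, and fetchmany() only changes how it is iterated, so an unbuffered (server-side) cursor would be needed to actually stream.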

Jack
  • Make changes in the select query. – Kenly Nov 20 '15 at 18:23
  • 1
    What kind of processing are you doing? SQL is a powerful language, you can do a lot of things with SQL. When SQL is not enough, MySQL allows you to write UDFs in C which can be made extremely fast. Today there is hardly ever a need to fetch a billion records into the client and do post-processing - you try to avoid that at all costs. – Sasha Pachev Nov 20 '15 at 18:35
  • http://stackoverflow.com/questions/1808150/how-to-efficiently-use-mysqldb-sscursor – David Ehrmann Nov 20 '15 at 18:51
  • Usually, you'd have a producer-consumer design for processing like that, and that scales very nicely with multiple worker threads, but that's not easy with Python because of the GIL. Threads buy you a lot more in Python when you're IO-bound, not CPU-bound. – David Ehrmann Nov 20 '15 at 18:52
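Pulling the comments together: the linked question covers MySQLdb's SSCursor, which streams rows from the server instead of buffering the whole result set in client memory, and the producer-consumer pattern distributes rows across workers. A minimal sketch combining the two follows; connect_args, num_workers, and the queue bound are illustrative, process() is the question's own function, and, per the GIL caveat above, multiple worker threads only pay off if process() is I/O-bound:

    import threading
    import queue  # named 'Queue' on Python 2

    import MySQLdb
    from MySQLdb.cursors import SSCursor

    def stream_and_process(connect_args, num_workers=4):
        # Bounded queue: the producer (the DB read loop) blocks when the
        # consumers fall behind, keeping memory use flat.
        q = queue.Queue(maxsize=1000)

        def worker():
            while True:
                row = q.get()
                if row is None:  # sentinel: one per worker
                    break
                process(row)

        workers = [threading.Thread(target=worker) for _ in range(num_workers)]
        for t in workers:
            t.start()

        # SSCursor streams rows from the server as they are consumed.
        db = MySQLdb.connect(cursorclass=SSCursor, **connect_args)
        cursor = db.cursor()
        cursor.execute('select * from users')
        for row in cursor:  # rows arrive lazily, not buffered up front
            q.put(row)

        for _ in workers:
            q.put(None)
        for t in workers:
            t.join()
        db.close()

If process() is CPU-bound, the same structure works with multiprocessing in place of threading, at the cost of pickling rows across process boundaries.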

0 Answers