
I am very new to multithreading, in Python or any other language, and I am trying to use it to improve the speed of my program.

Basically, I have many large datasets, and my memory can only fit two of them at the same time. The solution I have in mind is to read the first dataset, then load the second dataset in a background thread while processing the first. That way I can save the time spent waiting for the second dataset to load. Does this work?
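The overlap idea described above can be sketched like this; `load_dataset(path)` and `process(dataset)` are hypothetical placeholders for the actual loading and processing code, passed in as callables:

```python
# Minimal sketch of pipelined loading: while the current dataset is
# being processed, a background thread loads the next one, so at most
# two datasets are in memory at any time.
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(paths, load_dataset, process):
    """load_dataset(path) -> dataset (I/O-bound), process(dataset) -> None."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_dataset, paths[0])   # start first load
        for next_path in paths[1:]:
            current = future.result()                  # wait for load to finish
            future = pool.submit(load_dataset, next_path)  # kick off next load
            process(current)                           # overlaps with the load
        process(future.result())                       # last dataset
```

Note that this only helps if `load_dataset` is genuinely I/O-bound (reading from disk or network releases the GIL), so the load and the CPU work can actually run concurrently.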

DiveIntoML
  • You may want to look at [this post](http://stackoverflow.com/questions/990102/python-global-interpreter-lock-gil-workaround-on-multi-core-systems-using-task), and also look into Python's global interpreter lock (GIL) and how it limits threading in applications. From the above link, this snippet seems relevant: _'But that this can seriously backfire on multi-core systems and you end up with IO intensive threads being heavily blocked by CPU intensive threads, the expense of context switching, the ctrl-C problem[*] and so on.'_ – PrestonH Jan 17 '17 at 19:28
  • Are the files stored locally? Or are you pulling from a server? – Navidad20 Jan 17 '17 at 19:30
  • Run benchmarks. If a second concurrent query significantly slows down the first query, you can save time with your approach. – Dávid Horváth Jan 17 '17 at 19:30
  • 1
    You should use non-blocking I/O and an event loop instead of multithreading. Or maybe two processes that alternate loading and processing data. (If using Python 3.4 or higher is an option, you can use asyncio.) – Sven Marnach Jan 17 '17 at 19:30
  • There are some good PyCon videos - search for `async`, `await`, and `concurrency`. It should be fairly easy to set up a simple test to see if your process will benefit. – wwii Jan 17 '17 at 19:38
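As a sketch of the two-process alternative mentioned in the comments (again with hypothetical `load_dataset`/`process` callables): because the GIL prevents CPU-bound Python code in one thread from running alongside another thread's Python code, running the processing in a separate process avoids that contention. Note the function passed as `process` must be picklable (e.g. defined at module level):

```python
# Rough sketch: the main process does the I/O-bound loading while a
# worker process does the CPU-bound processing, sidestepping the GIL.
from concurrent.futures import ProcessPoolExecutor

def run_with_processes(paths, load_dataset, process):
    """Load in the main process; run process(data) in a worker process."""
    results = []
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = None
        for path in paths:
            data = load_dataset(path)            # I/O in the main process
            if future is not None:
                results.append(future.result())  # collect previous result
            future = pool.submit(process, data)  # CPU work in the worker
        results.append(future.result())          # last dataset
    return results
```

Benchmarking both variants against a plain sequential loop, as suggested above, is the only reliable way to see which wins for your workload.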

0 Answers