You can get some speedup, depending on the number and size of your files. See this answer to a similar question: "Efficient file reading in python with need to split on '\n'".
Essentially, you can read multiple files in parallel with multithreading, multiprocessing, or otherwise (e.g. an iterator), and you may get some speedup. The easiest thing to do is to use a library like pathos (yes, I'm the author), which provides multiprocessing, multithreading, and other options behind a single common API, so you can code it once and then switch between backends until you find the one that is fastest for your case.
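To make the "code it once, swap the backend" idea concrete for file reading, here is a minimal sketch. It deliberately uses only the standard library (`concurrent.futures.ThreadPoolExecutor`) as a stand-in for a pathos pool, and the names `read_lines`, `read_all`, and `make_sample_files`, as well as the sample file contents, are hypothetical and exist only for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_lines(path):
    """Read one file and split its contents on newlines."""
    with open(path) as fh:
        return fh.read().split("\n")


def read_all(paths, mapper=map):
    """Apply read_lines to every path using any map-like callable."""
    return list(mapper(read_lines, paths))


def make_sample_files(directory, count=4):
    """Create small made-up input files so the sketch can run anywhere."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i}.txt")
        with open(path, "w") as fh:
            fh.write(f"file {i} line a\nfile {i} line b")
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = make_sample_files(tmp)
    # Backend 1: the built-in serial map.
    serial = read_all(paths)
    # Backend 2: a thread-backed map with the same call signature.
    with ThreadPoolExecutor(max_workers=4) as executor:
        threaded = read_all(paths, executor.map)
```

Because `read_all` only needs something that behaves like `map`, the same function should also accept the `map` method of another pool type, which is the property that makes it cheap to try several backends.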
There are a lot of options for different types of maps (on the pool object), as you can see here: Python multiprocessing - tracking the process of pool.map operation.
While the following isn't the most imaginative of examples, it shows a doubly-nested map (equivalent to a doubly-nested for loop), and how easy it is to change the backends and other options on it.
>>> import pathos
>>> p = pathos.pools.ProcessPool()
>>> t = pathos.pools.ThreadPool()
>>> s = pathos.pools.SerialPool()
>>>
>>> f = lambda x,y: x+y
>>> # two blocking maps, threads and processes
>>> t.map(p.map, [f]*5, [range(i,i+5) for i in range(5)], [range(i,i+5) for i in range(5)])
[[0, 2, 4, 6, 8], [2, 4, 6, 8, 10], [4, 6, 8, 10, 12], [6, 8, 10, 12, 14], [8, 10, 12, 14, 16]]
>>> # two blocking maps, threads and serial (i.e. python's map)
>>> t.map(s.map, [f]*5, [range(i,i+5) for i in range(5)], [range(i,i+5) for i in range(5)])
[[0, 2, 4, 6, 8], [2, 4, 6, 8, 10], [4, 6, 8, 10, 12], [6, 8, 10, 12, 14], [8, 10, 12, 14, 16]]
>>> # an unordered iterative and a blocking map, threads and serial
>>> t.uimap(s.map, [f]*5, [range(i,i+5) for i in range(5)], [range(i,i+5) for i in range(5)])
<multiprocess.pool.IMapUnorderedIterator object at 0x103dcaf50>
>>> list(_)
[[0, 2, 4, 6, 8], [2, 4, 6, 8, 10], [4, 6, 8, 10, 12], [6, 8, 10, 12, 14], [8, 10, 12, 14, 16]]
>>>
I have found that, generally, unordered iterative maps (uimap) are the fastest, but then you must not care about the order in which results come back, since they may be returned out of order. As for measuring speed, surround the calls above with time.time or similar.
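As a rough illustration of that timing approach, the sketch below wraps a serial map, a blocking pool map, and an unordered iterative map in a small timer. It assumes the standard library's `multiprocessing.pool.ThreadPool` (whose `imap_unordered` plays the role of `uimap` here) rather than pathos itself, and uses `time.perf_counter` as the "or similar" clock. `slow_square` and `timed` are hypothetical helpers, with a short sleep standing in for file I/O.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    """Hypothetical stand-in for I/O-bound work such as reading a file."""
    time.sleep(0.01)
    return x * x


def timed(label, run):
    """Call run(), print the elapsed wall-clock time, and return its result."""
    start = time.perf_counter()
    result = run()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result


items = list(range(20))
with ThreadPool(4) as pool:
    serial = timed("serial map", lambda: list(map(slow_square, items)))
    ordered = timed("blocking map", lambda: pool.map(slow_square, items))
    unordered = timed(
        "unordered imap",
        lambda: list(pool.imap_unordered(slow_square, items)),
    )
```

The unordered variant returns the same values but possibly in a different order, so sort or key the results if order matters. The timings depend heavily on the workload and machine, so treat any single run as indicative only.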
Get pathos here: https://github.com/uqfoundation