
I'm having trouble finding an answer to this question, and it may be due to poor phrasing.

I have a small Python program that extracts data from a large log file and displays it in a particular format. Nothing fancy: it just reads, parses, and prints.

It takes about a minute to do this.

Now, I want to run this across 300 files. If I put my code inside a loop that iterates over the 300 files, running the same piece of code on each one in turn, it will take 300 minutes to complete. I would rather it didn't take that long.

I have 8 virtual processors on this machine, and it can handle extra load while this program is running. Can I spread the workload over these vCPUs to reduce the total runtime? If so, what is the ideal way to implement it?

It's not code I'm after; it's the theory behind it.

Thanks

Philkav
  • I think your question is answered here: http://stackoverflow.com/questions/203912/does-python-support-multiprocessor-multicore-programming – Johnny Oct 03 '13 at 12:47

1 Answer


Don't make parallelism your first priority. Your first priority should be making the single-thread performance as fast as possible. I rely on random-pause stack sampling for that: pause the running program several times and look at where the stack is each time. From your short description, it sounds like there might be juicy opportunities for speedup in the I/O and in the parsing.
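(As a rough in-process illustration of the sampling idea, for when manually pausing under a debugger is inconvenient, the sketch below starts a background thread that snapshots the main thread's stack at random moments. `process_log_file` is a hypothetical stand-in for your existing read/parse/print routine.)

```python
import random
import sys
import threading
import time
import traceback

def sample_stacks(main_thread_id, samples=10):
    """Print a handful of stack snapshots of the main thread at random moments."""
    for _ in range(samples):
        time.sleep(random.uniform(0.5, 2.0))           # wait a random interval
        frame = sys._current_frames()[main_thread_id]  # main thread's current frame
        print("---- stack sample ----", file=sys.stderr)
        traceback.print_stack(frame, file=sys.stderr)

sampler = threading.Thread(
    target=sample_stacks, args=(threading.main_thread().ident,), daemon=True
)
sampler.start()
# process_log_file("big.log")  # hypothetical: your existing one-file routine goes here
```

Whichever lines keep showing up across samples are where the time is going.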

After you do that, if the program is CPU-bound (which I doubt; it should be spending most of its time in I/O), then parallelism might help.
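If the samples do show the time going into CPU-heavy parsing rather than I/O, the usual theory is process-level parallelism: hand each file to one of a pool of worker processes, roughly one per vCPU, so Python's GIL is not a constraint. A minimal sketch, assuming a hypothetical `process_log_file()` doing the per-file work and hypothetical file paths:

```python
from multiprocessing import Pool

def process_log_file(path):
    """Hypothetical per-file worker: read, parse, return something printable."""
    with open(path) as f:
        return path, sum(1 for line in f if "ERROR" in line)  # placeholder parse

if __name__ == "__main__":
    files = [f"logs/server{i:03d}.log" for i in range(300)]  # hypothetical paths
    with Pool(processes=8) as pool:                          # one worker per vCPU
        for path, count in pool.imap_unordered(process_log_file, files):
            print(path, count)
```

With 8 workers and about a minute per file, the 300 files would take on the order of 40 minutes instead of 300, provided the work really is CPU-bound and the disk can keep up.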

Mike Dunlavey
  • Thanks for your answer. I used your method (numerous random stack samples) and 90% of the time, they were at the same point. I need to work on making this code more efficient before I even get started on parallelism. – Philkav Oct 04 '13 at 09:36
  • @Philkav: Yes. That's the kind of thing I often see. If you need help, particularly with parsing, feel free to ask. – Mike Dunlavey Oct 04 '13 at 12:52