
I'm having trouble finding an answer to this question, and it may be due to poor phrasing.

I have a small Python program that extracts data from a large log file and displays it in a particular format. Nothing fancy: it just reads, parses, and prints.

It takes about a minute to do this.

Now, I want to run this across 300 files. If I put my code inside a loop that iterates over the 300 files, running the same piece of code on each one in turn, it will take 300 minutes to complete. I would rather it didn't take that long.

I have 8 virtual processors on this machine, and it can handle extra load while this program is running. Can I spread the workload over these vCPUs to reduce the total runtime? If so, what is the ideal way to implement it?

It's not code I'm after; it's the theory behind it.

Thanks

Philkav
  • I think your question is answered here: http://stackoverflow.com/questions/203912/does-python-support-multiprocessor-multicore-programming – Johnny Oct 03 '13 at 12:47

1 Answer


Don't make parallelism your first priority. Your first priority should be making the single-thread performance as fast as possible. I rely on random-pause stack sampling for that: pause the running program several times and look at where the stack is each time. From your short description, it sounds like there might be juicy opportunities for speedup in the I/O and in the parsing.
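(As a rough in-process illustration of the sampling idea, for when manually pausing under a debugger is inconvenient, the sketch below starts a background thread that snapshots the main thread's stack at random moments. `process_log_file` is a hypothetical stand-in for your existing read/parse/print routine.)

```python
import random
import sys
import threading
import time
import traceback

def sample_stacks(main_thread_id, samples=10):
    """Print a handful of stack snapshots of the main thread at random moments."""
    for _ in range(samples):
        time.sleep(random.uniform(0.5, 2.0))           # wait a random interval
        frame = sys._current_frames()[main_thread_id]  # main thread's current frame
        print("---- stack sample ----", file=sys.stderr)
        traceback.print_stack(frame, file=sys.stderr)

sampler = threading.Thread(
    target=sample_stacks, args=(threading.main_thread().ident,), daemon=True
)
sampler.start()
# process_log_file("big.log")  # hypothetical: your existing one-file routine goes here
```

Whichever lines keep showing up across samples are where the time is going.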

After you do that, if the program is CPU-bound (which I doubt; it should be spending most of its time in I/O), then parallelism might help.
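If the samples do show the time going into CPU-heavy parsing rather than I/O, the usual theory is process-level parallelism: hand each file to one of a pool of worker processes, roughly one per vCPU, so Python's GIL is not a constraint. A minimal sketch, assuming a hypothetical `process_log_file()` doing the per-file work and hypothetical file paths:

```python
from multiprocessing import Pool

def process_log_file(path):
    """Hypothetical per-file worker: read, parse, return something printable."""
    with open(path) as f:
        return path, sum(1 for line in f if "ERROR" in line)  # placeholder parse

if __name__ == "__main__":
    files = [f"logs/server{i:03d}.log" for i in range(300)]  # hypothetical paths
    with Pool(processes=8) as pool:                          # one worker per vCPU
        for path, count in pool.imap_unordered(process_log_file, files):
            print(path, count)
```

With 8 workers and about a minute per file, the 300 files would take on the order of 40 minutes instead of 300, provided the work really is CPU-bound and the disk can keep up.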

Mike Dunlavey
  • Thanks for your answer. I used your method (numerous random stack samples) and 90% of the time, they were at the same point. I need to work on making this code more efficient before I even get started on parallelism. – Philkav Oct 04 '13 at 09:36
  • @Philkav: Yes. That's the kind of thing I often see. If you need help, particularly with parsing, feel free to ask. – Mike Dunlavey Oct 04 '13 at 12:52