I have a Python script that processes a large number of files.

For each file, the script goes line by line, searching for specific regex patterns. If a pattern is found, the line is copied into the log file.

As sample input, I'm passing it a folder with 42 small files and 3 large files (~1500 lines each).

The script processes the first two large files very quickly; it needs only a few seconds for them. But when it reaches the third large file, it slows down and keeps getting slower.

In the middle of the third large file, it needs a whole second per line, and it keeps slowing down. If I don't stop it, the whole run takes an hour!

I added debugging code that prints out the line numbers. That's how I noticed that it keeps getting slower and slower rather than getting stuck somewhere.

I have 20 years of experience with C and many other languages, but I'm a Python beginner. What steps can I take to troubleshoot this script?

Ada Lovelace

1 Answer

If your code is a script, you can run cProfile on it, as shown in this answer:

python -m cProfile myscript.py

I do not know if this gives you the granularity you want. If not, have a look at The Python Profilers in the official documentation.
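If the whole-script report is too coarse, one option is to profile a single call from inside the script. The following is only a minimal sketch: `process_lines` is a hypothetical stand-in for your per-file loop, and `profile_call` is a helper name I made up for illustration. Only `cProfile.Profile`, `runcall`, and `pstats.Stats` come from the standard library.

```python
import cProfile
import io
import pstats
import re


def process_lines(lines, pattern):
    """Hypothetical stand-in for the script's per-file loop."""
    return [line for line in lines if pattern.search(line)]


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


sample = ["ERROR disk full", "ok", "ERROR timeout"]
matches, report = profile_call(process_lines, sample, re.compile(r"ERROR"))
print(report)
```

From the command line, `python -m cProfile -s cumulative myscript.py` sorts the full report the same way, which is usually enough to see which function the time is going into.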

As for the actual reason your code runs slowly, I suspect either catastrophic backtracking in one of your regexes, or that you open and append to your log file every time a pattern matches, a.k.a. Shlemiel the Painter's algorithm.
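To make both suspects concrete, here is a rough sketch. I have not seen your code, so everything in it is an assumption: the patterns `RISKY` and `SAFE` are textbook examples rather than your regexes, and `scan_files` with its parameters is a hypothetical shape for your main loop.

```python
import re

# Suspect 1: nested quantifiers such as (a+)+ can backtrack exponentially
# when the input almost matches. SAFE accepts the same strings without nesting.
RISKY = re.compile(r"^(a+)+$")
SAFE = re.compile(r"^a+$")


def scan_files(paths, patterns, log_path):
    """Suspect 2: open the log once, not once per match.

    Copies every line that matches any pattern into the log and
    returns the number of lines copied.
    """
    compiled = [re.compile(p) for p in patterns]  # compile once, up front
    hits = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for path in paths:
            with open(path, encoding="utf-8", errors="replace") as src:
                for line in src:
                    if any(rx.search(line) for rx in compiled):
                        log.write(line)
                        hits += 1
    return hits
```

Both patterns agree on strings that match. The difference shows up on a near miss: a run of a few dozen `a` characters followed by a single `b` can keep `RISKY` busy for a very long time, while `SAFE` rejects it immediately. If one of your patterns has a repeated group inside another repetition, that is the first place I would look.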

Jonas
  • Reference to Shlemiel was very helpful. In certain rare circumstances, my script started concatenating lines together to no end... Fixed! – Ada Lovelace May 07 '15 at 14:34
  • Was this the cause of your slowdown? And were the profiling tools helpful / granular enough? – Jonas May 08 '15 at 06:41
  • I never got around to checking the profiling. When I read your answer, the first thing I did was check the reference to Shlemiel. When I read the story, I said to myself: "I must be doing Shlemiel's algorithm somewhere". That was followed by the "Duh" moment when I found the bug. – Ada Lovelace May 08 '15 at 21:44