
I am working on a basic crawler that crawls 5 websites concurrently using threads, creating a new thread for each site. When I run the program from the shell, the output log indicates that all 5 threads run as expected. But when I run the same program under supervisord, the log shows that only 2 threads run every time! The log shows that all 5 threads have started, but only the same two of them actually execute and the rest get stuck. I cannot understand why the behaviour differs between running from a shell and running under supervisor. Is there something I am not taking into account?

Here is the code which creates the threads:

for sid in entries:
    url = entries[sid]
    # one crawler thread per site
    threading.Thread(target=self.crawl_loop, args=(sid, url)).start()

UPDATE: As suggested by tdelaney in the comments, I set the working directory in the supervisord configuration and now all the threads run as expected. But I still don't understand why setting the working directory to the crawler file's directory fixes the issue. Perhaps someone who knows how supervisor manages processes can explain?
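
For reference, the change amounts to adding a directory= line to the program section. A minimal sketch (program name and paths here are illustrative, not my actual config):

[program:crawler]
command=/usr/bin/python crawler.py
; without directory=, the process inherits supervisord's own working
; directory, so any relative paths (entries file, log file) resolve
; somewhere other than where they do when launched from the shell
directory=/opt/crawler
autostart=true
autorestart=true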

conquester
  • Where do the entries come from and where does the log go to? You have a different user name and current working directory when using supervisord and you are potentially using different files than you think. – tdelaney Sep 10 '16 at 02:45

3 Answers


AFAIK Python threads can't run truly in parallel because of the Global Interpreter Lock; threading only gives you the appearance of simultaneous execution. Your code will still use only 1 core.

https://wiki.python.org/moin/GlobalInterpreterLock

https://en.wikibooks.org/wiki/Python_Programming/Threading

Therefore it is possible that your threads are started but never actually run in parallel.

I think you should use multiprocessing instead:

https://docs.python.org/2/library/multiprocessing.html
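
If you do want to try it, the thread-spawning loop translates roughly like this. This is only a sketch: the entries dict is made up and crawl_loop is a stub standing in for the real crawler.

from multiprocessing import Process

def crawl_loop(sid, url):
    # stand-in for the real crawling logic
    print(sid, url)

if __name__ == '__main__':
    entries = {1: 'http://example.com', 2: 'http://example.org'}  # hypothetical
    processes = [Process(target=crawl_loop, args=(sid, url))
                 for sid, url in entries.items()]
    for p in processes:
        p.start()   # one OS process per site, no GIL contention
    for p in processes:
        p.join()    # wait for every crawler to finish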

Evren Yurtesen
  • I did try using multiprocessing, with the same results. I still cannot understand why all the threads run when the program is started from the shell as 'python crawler.py', but when I add it as a job in supervisord only the same 2 threads ever run. – conquester Sep 10 '16 at 02:52
  • Perhaps it would help if you told us how exactly you log the number of threads you are running? In other words, how do you determine how many threads are running? – Evren Yurtesen Sep 10 '16 at 02:54
  • The threads are predetermined. 5 threads for 5 websites. Also I have updated the question to reflect new developments. – conquester Sep 10 '16 at 03:04

I was having the same silent failure, but then realised that I was setting daemon to True, which was causing problems under supervisor.

https://docs.python.org/2/library/threading.html#threading.Thread.daemon

So the answer is: daemon = True works when running the script yourself, but set it to False when running under supervisor.
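
The gist, as a sketch (crawl_loop here is just a stub for your own worker, and the entries dict is made up):

import threading
import time

def crawl_loop(sid, url):
    # stand-in worker
    time.sleep(1)
    print(sid, url)

entries = {1: 'http://example.com'}  # hypothetical
for sid, url in entries.items():
    t = threading.Thread(target=crawl_loop, args=(sid, url))
    t.daemon = False  # non-daemon: the process stays alive until the thread finishes
    t.start()
# with daemon=True, the main thread can exit right after spawning and take
# the workers down with it, which looks like threads silently stalling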

Chris Barry

Just to say, I was just experiencing a very similar problem.

In my case, I was working on a low-powered machine (a Raspberry Pi), with threads dedicated to listening to a serial device (an Arduino Nano on /dev/ttyUSB0). The code worked perfectly from the command line, but the serial-reading thread stalled under supervisor.

After a bit of hacking around (and trying all of the options here), I tried running python in unbuffered mode and managed to solve the issue! I got the idea from https://stackoverflow.com/a/17961520/741316.

In essence, I simply invoked python with the -u flag.
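
Under supervisord that just means adding -u to the command line. Something like this (program name and path are illustrative):

[program:serial_reader]
; -u disables stdin/stdout/stderr buffering (PYTHONUNBUFFERED=1 is equivalent)
command=/usr/bin/python -u /home/pi/reader.py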

pelson