
I wrote a function, collect_data, that uses 4 threads to collect data from 4 websites (say, websites a to d) and simultaneously update the data in a MySQL table. The table has 4 fields, each storing the data from one website. No lock is used when the 4 threads update the table. The 4 threads collect data and update MySQL every 10 seconds.

def collect_data(site_list=['a', 'b', 'c', 'd']):
    for site in site_list:
        InfoCollectingThread(site).start()

Unfortunately, after a period of running (say, 3 hours), no new records appear in MySQL. It also seems that some threads die earlier, because their corresponding fields stop being updated sooner.

My question is: what problem lies in my design, and is there a solution? Merry Xmas.

NPE
Deep_fox
  • To be honest, there are so many possibilities that we'd be taking stabs in the dark. – NPE Dec 25 '14 at 08:03
  • You are a poet, but I'd like to seek possible reasons. – Deep_fox Dec 25 '14 at 08:12
  • Possible reasons: many. First, test it with ONE thread. If that doesn't work, then you have eliminated multi-thread issues, and it is MUCH easier to debug. If it does work, then you need to look at your threading. For example, is the MySQL module thread-safe? If you don't know, better assume it is not and use some locks. – cdarke Dec 25 '14 at 08:17
  • By the way, I notice you are using a list as a default; this might not have the effect you want. Python does not create a new list each time you call the function; it reuses the same list object on every call that relies on the default. – cdarke Dec 25 '14 at 08:19
  • @cdarke Very good suggestions, I'll try that and merry Xmas. – Deep_fox Dec 25 '14 at 08:27
  • @cdarke Now I can locate the problem. I think it's in my threading, because I no longer write data to the SQL database and the threads still get blocked. Any suggestions for analyzing the problem further? Thanks – Deep_fox Dec 25 '14 at 13:16
  • At least you have a hard block; it's worse when the symptoms are intermittent. Your approach of eliminating items such as the database is a good one, so continue with that. You need to find which line of code the threads are executing when they are blocked. Which blocking calls do you have? If you don't think you have any, consider things like communications and IPC. – cdarke Dec 26 '14 at 08:21
  • @cdarke Thanks for your helpful suggestions. I have now abandoned threading and switched to APScheduler, and it works fine. Thanks – Deep_fox Dec 26 '14 at 12:25
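cdarke's advice to assume the database module is not thread-safe and "use some locks" can be sketched as follows. This is a minimal illustration, not the asker's code: it uses the standard-library `sqlite3` module with an in-memory database as a stand-in for MySQL, and the table `site_data`, its columns `a` to `d`, and the helper names `update_field` and `worker` are hypothetical. One shared connection is guarded by a single `threading.Lock`, so only one thread touches it at a time.

```python
import sqlite3
import threading

SITES = ("a", "b", "c", "d")

# Stand-in for the MySQL connection: one connection shared by all threads.
# check_same_thread=False allows cross-thread use; the lock serialises access.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE site_data (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT, d TEXT)")
conn.execute("INSERT INTO site_data (id) VALUES (1)")
conn.commit()

db_lock = threading.Lock()


def update_field(site, value):
    """Write one site's value into its own column while holding the lock."""
    if site not in SITES:
        # Column names cannot be bound as SQL parameters, so validate them first.
        raise ValueError("unknown site: %r" % (site,))
    with db_lock:
        conn.execute("UPDATE site_data SET %s = ? WHERE id = 1" % site, (value,))
        conn.commit()


def worker(site, rounds):
    """Hypothetical collector loop: writes 'site-0' .. 'site-(rounds-1)'."""
    for i in range(rounds):
        update_field(site, "%s-%d" % (site, i))


threads = [threading.Thread(target=worker, args=(s, 50)) for s in SITES]
for t in threads:
    t.start()
for t in threads:
    t.join()

row = conn.execute("SELECT a, b, c, d FROM site_data WHERE id = 1").fetchone()
print(row)  # ('a-49', 'b-49', 'c-49', 'd-49')
```

To follow the "test it with ONE thread" advice, call `worker('a', 50)` directly in the main thread before starting any threads. With a real MySQL driver, the usual alternative to a shared locked connection is one connection per thread; check your driver's module-level `threadsafety` attribute (defined by the DB-API, PEP 249) to see what it allows.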
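The default-argument behaviour cdarke describes is easy to demonstrate. The two function names below are hypothetical; the usual idiom is to default to `None` (or to an immutable tuple) and build a fresh list inside the function.

```python
def append_site_shared(site, site_list=[]):
    # The default list is created once, when the function is defined,
    # so every call that relies on the default mutates the same object.
    site_list.append(site)
    return site_list


def append_site_fresh(site, site_list=None):
    # Idiomatic fix: use None as a sentinel and create a new list per call.
    if site_list is None:
        site_list = []
    site_list.append(site)
    return site_list


print(append_site_shared("a"))  # ['a']
print(append_site_shared("b"))  # ['a', 'b']  (the default list was reused)
print(append_site_fresh("a"))   # ['a']
print(append_site_fresh("b"))   # ['b']
```

Note that `collect_data` in the question only reads `site_list` and never mutates it, so the shared default is harmless there; it only bites when the function modifies the list. Writing the default as a tuple, `('a', 'b', 'c', 'd')`, removes the risk entirely.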
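On cdarke's question about blocking calls: a frequent cause of a collector thread hanging forever is a network read with no timeout, where the call simply never returns if a website stops responding. A minimal sketch, assuming the collector does blocking socket I/O (the question does not say which HTTP library is used); `read_with_timeout` is a hypothetical helper, and `socket.socketpair()` stands in for a connection to an unresponsive site. With a timeout set, a silent hang becomes an exception the thread can handle and log.

```python
import socket


def read_with_timeout(sock, timeout_s):
    """Return received bytes, or None if nothing arrives within timeout_s seconds."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(1024)
    except socket.timeout:
        # Without settimeout() this recv() would block the thread indefinitely.
        return None


# A connected pair of sockets; the "server" end never sends anything,
# which mimics a website that accepted the connection but stopped responding.
client, silent_server = socket.socketpair()
try:
    result = read_with_timeout(client, 0.2)
    print(result)  # None: the read gave up after 0.2 s instead of hanging
finally:
    client.close()
    silent_server.close()
```

If the collector uses `urllib.request.urlopen`, it accepts a `timeout=` argument; `socket.setdefaulttimeout()` sets a default for sockets created afterwards.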

1 Answer


There are too many possible reasons and too little information for us to make informed guesses. What I can offer you are some suggestions for how to troubleshoot this:

  • Add debug output to the threads' event loops, to get a better idea of what they're doing.
  • Add exception handling (for example, a try-finally section) around all thread functions, also with debug output. This way, if a thread dies, you'll know.
  • Add a signal handler that prints the stack traces of all active threads, and use it to inspect the state of the program after a malfunction. You can find some useful code here.
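The three suggestions above can be combined in one small sketch. Everything here is illustrative: `logged_worker`, `flaky_collect` and `dump_all_stacks` are hypothetical names, the question's 10-second interval is shortened to 0.05 s for the demo, and the signal hook uses `SIGUSR1`, which exists only on Unix-like systems. `sys._current_frames()` is a CPython implementation detail, but it is the usual way to reach every thread's current frame.

```python
import logging
import signal
import sys
import threading
import time
import traceback

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(threadName)s %(levelname)s %(message)s")
log = logging.getLogger("collector")


def logged_worker(site, collect_once, interval_s, stop_event):
    """Run collect_once(site) every interval_s seconds, logging each step."""
    log.debug("thread for site %s started", site)
    try:
        while not stop_event.is_set():
            log.debug("collecting from %s", site)
            collect_once(site)
            stop_event.wait(interval_s)  # sleeps, but wakes early if asked to stop
    except Exception:
        # log.exception records the full traceback; re-raise so the thread
        # still dies, but now its death is visible in the log.
        log.exception("thread for site %s crashed", site)
        raise
    finally:
        log.debug("thread for site %s exiting", site)


def dump_all_stacks(signum=None, frame=None):
    """Return (and log) the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, thread_frame in sys._current_frames().items():
        chunks.append("--- %s (%s) ---\n" % (names.get(ident, "?"), ident))
        chunks.append("".join(traceback.format_stack(thread_frame)))
    report = "".join(chunks)
    log.warning("stack dump:\n%s", report)
    return report


# On Unix, `kill -USR1 <pid>` then logs where each thread currently is.
# signal.signal() may only be called from the main thread.
if hasattr(signal, "SIGUSR1") and threading.current_thread() is threading.main_thread():
    signal.signal(signal.SIGUSR1, dump_all_stacks)


def flaky_collect(site):
    # Hypothetical stand-in for real collection work: site "b" always fails.
    if site == "b":
        raise RuntimeError("simulated failure for site b")


stop = threading.Event()
workers = [threading.Thread(target=logged_worker, name="site-" + s,
                            args=(s, flaky_collect, 0.05, stop))
           for s in ("a", "b")]
for w in workers:
    w.start()
time.sleep(0.2)
report = dump_all_stacks()  # the same output the signal would produce
stop.set()
for w in workers:
    w.join()
```

On Unix, the standard library can also produce the dump for you: `faulthandler.register(signal.SIGUSR1, all_threads=True)` writes every thread's traceback to stderr when the signal arrives.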
NPE