I am trying to do the following.
I have 8 cores.
I run 8 processes as follows, where core_aa (and so on) is the name of a file whose URLs are loaded into a queue:
python threaded_crawl.py core_aa --max_async_count=20 --use_headers --verbose > /tmp/core_aa.out
python threaded_crawl.py core_ab --max_async_count=20 --use_headers --verbose > /tmp/core_ab.out
python threaded_crawl.py core_ac --max_async_count=20 --use_headers --verbose > /tmp/core_ac.out
python threaded_crawl.py core_ad --max_async_count=20 --use_headers --verbose > /tmp/core_ad.out
python threaded_crawl.py core_ae --max_async_count=20 --use_headers --verbose > /tmp/core_ae.out
python threaded_crawl.py core_af --max_async_count=20 --use_headers --verbose > /tmp/core_af.out
python threaded_crawl.py core_ag --max_async_count=20 --use_headers --verbose > /tmp/core_ag.out
python threaded_crawl.py core_ah --max_async_count=20 --use_headers --verbose > /tmp/core_ah.out
Each process is a threaded app that runs 20 threads whose job is to fetch URLs. If I have e.g. 60K URLs and I run one process, the job completes with all threads alive until the queue is empty.
If I run more than one process, I notice that the threads start to die off slowly, e.g. one death per 1000. The idea is to split the 60K URLs handled by one process across 8 processes, so the total number of threads is 20*8.
The processes share no data.
So given that the job works with one process, why would running multiple processes kill threads?
How can I fix this?
class ThreadClass(threading.Thread):
    def __init__(self,parms={},proxy_list=[],user_agent_list=[],use_cookies=True,fn=None,verbose=False):
        threading.Thread.__init__(self)
    def run(self):
        while page_queue.qsize()>0:
            # FETCH URLS....

for page in xrange(THREAD_LIMIT):
    tc = ThreadClass(parms=parms,proxy_list=proxy_list,user_agent_list=user_agent_list,use_cookies=use_cookies,fn=fn,verbose=verbose)
    tc.start()
    while threading.activeCount()>=THREAD_LIMIT:
        time.sleep(1)
while threading.activeCount()>1:
    time.sleep(1)
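The polling loops on threading.activeCount() only report how many threads are alive in the process, not which ones have exited. A minimal sketch of an alternative, assuming Python 3 (the xrange in the post suggests Python 2) and a hypothetical helper name start_and_join, is to keep a reference to each thread and join it:

```python
import threading


def start_and_join(make_thread, thread_limit):
    """Start thread_limit threads and block until every one has exited.

    make_thread is a hypothetical zero-argument factory, for example
    lambda: ThreadClass(parms=parms, ...) in the code from the post.
    """
    threads = [make_thread() for _ in range(thread_limit)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # returns once this particular thread has finished
    return threads
```

After start_and_join returns, a non-zero page_queue.qsize() would indicate that the workers exited while work was still queued.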
I have no idea how to debug this, and there is no error. Given that I have the following condition,
while threading.activeCount()>1:
    time.sleep(1)
once the threads are all dead, the code continues even though there are items left in the queue, when the threads should run until the queue is empty.
I am very confused.
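One possible explanation, which nothing in the post confirms, is that an uncaught exception inside the fetch code ends run() for that one thread. Python prints such a traceback to stderr, and a plain > redirect captures only stdout, so the death can go unnoticed in the .out files. The sketch below assumes Python 3 (in Python 2 the module is named Queue) and uses hypothetical names: LoggingWorker, fake_fetch, and the results and failures lists. It logs each failure and keeps the worker alive, and it uses get_nowait() so that a thread never blocks on a queue another thread has just emptied.

```python
import logging
import queue
import threading


def fake_fetch(url):
    """Hypothetical stand-in for the real fetch; fails on URLs containing 'bad'."""
    if "bad" in url:
        raise IOError("simulated failure for %s" % url)
    return url


class LoggingWorker(threading.Thread):
    def __init__(self, page_queue, fetch_fn, results, failures):
        threading.Thread.__init__(self)
        self.page_queue = page_queue
        self.fetch_fn = fetch_fn
        self.results = results
        self.failures = failures

    def run(self):
        while True:
            try:
                # Take an item without blocking; Empty means the work is done.
                url = self.page_queue.get_nowait()
            except queue.Empty:
                return
            try:
                self.results.append(self.fetch_fn(url))
            except Exception:
                # Log the full traceback and keep this thread running.
                logging.exception("fetch failed for %s", url)
                self.failures.append(url)
```

If the failures list fills up only when several processes run at once, the logged tracebacks should show what the extra load is triggering.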
Once the active count