
I'm trying to perform actions with Python's `requests` library. Here is my code:

import threading
import requests
import resource
import time
import sys

# Maximum open file limit, used by the thread limiter below.
maxOpenFileLimit = resource.getrlimit(resource.RLIMIT_NOFILE)[0] # For example, it shows 50.

# Will use one session for every Thread.
requestSessions = requests.Session()
# Make the requests pool bigger to prevent [Errno -3] when sockets get stuck in CLOSE_WAIT.
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit+100))
requestSessions.mount('http://', adapter)
requestSessions.mount('https://', adapter)

def threadAction(a1, a2):
    global number
    time.sleep(1) # My actions with Requests for each thread.
    print number = number + 1

number = 0 # Count of complete actions

ThreadActions = [] # Action tasks.
for i in range(50): # I have 50 websites I need to do in parallel threads.
    a1 = i
    for n in range(10): # Every website I need to do in 10 threads
        a2 = n
        ThreadActions.append(threading.Thread(target=threadAction, args=(a1,a2)))


for item in ThreadActions:
    # But I can't do more than 50 Threads at once, because of maxOpenFileLimit.
    while True:
        # Thread limiter, analogue of BoundedSemaphore.
        if threading.activeCount() < maxOpenFileLimit:
            item.start()
            break
        else:
            continue

for item in ThreadActions:
    item.join()

But the thing is that after I get 50 threads up, the thread limiter starts waiting for some thread to finish its work. And here is the problem: after the script gets to the limiter, `lsof -i | grep python | wc -l` shows far fewer than 50 active connections, while before the limiter it showed all <= 50 of them. Why is this happening? Or should I close the session instead of reusing one `requests.Session()`, to keep it from using already opened sockets?

passwd
  • Your thread limiter goes into a tight loop and eats up most of your processing time. Try something like `sleep(.1)` to slow it down. Better yet, use a Queue limited to 50 requests and have your threads read those (see the first sketch after these comments). – tdelaney Oct 01 '16 at 16:05
  • On increasing the limits in the OS for your user, look for [ulimit](http://stackoverflow.com/questions/6774724/why-python-has-limit-for-count-of-file-handles) and [fs.file-max](https://cs.uwaterloo.ca/~brecht/servers/openfiles.html). After doing that, for increasing the limit from inside Python, look for [setrlimit](https://coderwall.com/p/ptq7rw/increase-open-files-limit-and-drop-privileges-in-python) (see the second sketch below). Of course, make sure that you are not running busy-while-loops needlessly and have properly multiplexed your code. – blackpen Oct 01 '16 at 16:36
  • Yes, I understand, and in the real script I use BoundedSemaphore. But why does `lsof -i|grep python|wc -l` show a much lower number right after the script hits the limit? – passwd Oct 02 '16 at 10:26
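
For reference, here is a minimal sketch of the bounded-Queue approach tdelaney suggests (Python 2 syntax to match the question; the 50-worker count and `(a1, a2)` task tuples are carried over from the question, and the `None` sentinel shutdown is an assumption):

import threading
import time
import Queue  # "queue" in Python 3

taskQueue = Queue.Queue(maxsize=50)  # put() blocks while 50 tasks are already pending

def worker():
    while True:
        task = taskQueue.get()
        if task is None:  # sentinel: no more work for this worker
            break
        a1, a2 = task
        time.sleep(1)  # placeholder for the real requests work
        taskQueue.task_done()

workers = [threading.Thread(target=worker) for _ in range(50)]
for w in workers:
    w.start()

for i in range(50):
    for n in range(10):
        taskQueue.put((i, n))  # blocks once the queue is full

for _ in workers:
    taskQueue.put(None)  # one sentinel per worker
for w in workers:
    w.join()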
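
And a minimal sketch of raising the soft open-file limit from inside Python with `setrlimit`, per blackpen's links (the 4096 target is an arbitrary example; the soft limit cannot exceed the hard limit):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# Raise the soft limit as far as the hard limit allows.
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4096, hard), hard))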

1 Answer


Your limiter is a tight loop that takes up most of your processing time. Use a thread pool to limit the number of workers instead.

import multiprocessing.pool
import requests
import resource
import time

maxOpenFileLimit = resource.getrlimit(resource.RLIMIT_NOFILE)[0]

# Will use one session for every Thread.
requestSessions = requests.Session()
# Make the requests pool bigger to prevent [Errno -3] when sockets get stuck in CLOSE_WAIT.
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit+100))
requestSessions.mount('http://', adapter)
requestSessions.mount('https://', adapter)

def threadAction(a1, a2):
    global number
    time.sleep(1) # My actions with Requests for each thread.
    print number = number + 1 # DEBUG: this is a syntax error, and even a plain
                              # `number = number + 1` wouldn't be thread safe

number = 0 # Count of complete actions

pool = multiprocessing.pool.ThreadPool(50)

ThreadActions = [] # Action tasks.
for i in range(50): # I have 50 websites I need to do in parallel threads.
    a1 = i
    for n in range(10): # Every website I need to do in 10 threads
        a2 = n
        ThreadActions.append((a1,a2))

pool.map(lambda task: threadAction(*task), ThreadActions, chunksize=1)
pool.close()
pool.join()
tdelaney
  • Does multiprocessing work faster than threading? How does it affect the processor load? – passwd Oct 02 '16 at 12:06
  • It's a trade-off ... and different on Windows than Linux. With multiprocessing, the data needs to be serialized between parent and child (and on Windows typically more context needs to be serialized, because the child doesn't get a clone of the parent's memory space), but then you don't worry about single-threading through the GIL. Higher CPU and/or lower data overhead make multiprocessing good. But if you are mostly I/O bound, thread pools are fine (see the sketch below). – tdelaney Oct 02 '16 at 17:38
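
To make that trade-off concrete, here is a minimal sketch (not from the thread) showing that `Pool` and `ThreadPool` share the same `map` API, so switching between processes and threads is a one-line change; `fetch` and its dummy body are placeholders:

from multiprocessing.pool import ThreadPool  # threads: fine for I/O-bound work
# from multiprocessing import Pool           # processes: sidestep the GIL for CPU-bound work

def fetch(task):
    a1, a2 = task
    return a1 * a2  # placeholder for the real per-site work

pool = ThreadPool(50)  # or Pool(50) with the import swapped
results = pool.map(fetch, [(i, n) for i in range(50) for n in range(10)])
pool.close()
pool.join()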