
I am trying to use a global list that can be appended to when a thread/process finishes a task. My main thread can read from this list, but my function cannot append to it. Basically, I'm making requests to find working proxies, saving the good ones to the list, and then printing the list at the end. I have cut out as much code as possible.

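import os
import requests
from multiprocessing import Pool

goodProxyList = ["test"]


def testProxy(x):
    global goodProxyList
    pid = os.getpid()  # assumed: pid was defined in code cut from the question
    # Assumed: route HTTP and HTTPS through the proxy under test (cut from the question)
    proxies = {"http": x, "https": x}
    try:
        test = requests.get('http://someurl.com/', proxies=proxies, timeout=10)
        if test.status_code == 200:
            goodProxyList.append(x)
        else:
            print("Something went wrong! :/" + " From PID: " + str(pid))
    except requests.RequestException:
        print("SOMETHING WENT VERY WRONG" + " From PID: " + str(pid))


if __name__ == '__main__':
    ## Setup stuff happens (builds proxyList)
    p = Pool(2)
    p.map(testProxy, proxyList)
    for i in goodProxyList:
        print(i)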

Even if I change goodProxyList.append(x) to goodProxyList.append("Anything"), the last 2 lines still only output "test". What am I doing wrong?

EDIT:

I have found the answer with help from brianpck. As he says, processes behave differently from threads. After switching to a thread pool, it now works perfectly:

import concurrent.futures

# p=Pool(2)
# p.map(testProxy, proxyList)
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(testProxy, proxyList)
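A minimal offline sketch of why the thread-pool version works: all threads live in one process and share its module globals, so an append made in a worker is visible to the main thread afterwards. `check_proxy` and its port-8080 rule are hypothetical stand-ins for the `requests.get(...)` check above, and the lock is an optional extra, not something the original code used.

```python
import concurrent.futures
import threading

good_proxies = []                    # shared by every thread in this one process
good_proxies_lock = threading.Lock()


def check_proxy(proxy):
    # Hypothetical offline stand-in for the requests.get(...) check in the
    # question: here a proxy counts as "good" if its port is 8080.
    if proxy.endswith(":8080"):
        # Threads share module globals, so the main thread will see this append.
        # list.append is already thread-safe in CPython; the lock just makes
        # the intent explicit and protects any future multi-step update.
        with good_proxies_lock:
            good_proxies.append(proxy)


if __name__ == '__main__':
    candidates = ["a:8080", "b:3128", "c:8080"]
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # Consuming the results re-raises any exception from a worker here.
        list(executor.map(check_proxy, candidates))
    print(sorted(good_proxies))  # ['a:8080', 'c:8080']
```

Threads work here because the network check is I/O-bound; the GIL is released while waiting on sockets, so a thread pool gives real concurrency for this kind of task.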
  • What exactly do you mean, "cannot append to it"? This should work fine (and in fact it is not even necessary to declare `global goodProxyList` explicitly) – brianpck Nov 11 '16 at 16:06
  • Sorry, I should have said that the output of the last 2 lines is still just "test" and nothing else. – user3406647 Nov 11 '16 at 16:28
  • @brianpck to append to a variable/list outside function scope you have to declare `global goodProxyList`. – r0xette Nov 11 '16 at 16:28
  • @r0xette Not for a `list`: Try `x=[]`, `def f(): x.append(1)`, `f()`: `x == [1]` outside function scope – brianpck Nov 11 '16 at 16:33
  • What happens with `Pool()`? – dawg Nov 11 '16 at 16:36
  • @dawg It's creating 2 processes/threads and then running testProxy() for each element of proxyList on the different processes – user3406647 Nov 11 '16 at 16:44
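The disagreement in the comments above comes down to mutating versus rebinding. A short sketch (function names are hypothetical, for illustration only): calling a method such as `append` on a global list needs no `global` declaration, but assigning to the name inside a function makes it local unless `global` is declared.

```python
x = []


def append_without_global():
    # Mutates the list object; there is no assignment to the name x,
    # so x is looked up in the global scope.
    x.append(1)


def rebind_without_global():
    # The assignment makes x local to this function, so reading it on the
    # right-hand side before it is assigned raises UnboundLocalError.
    x = x + [2]


def rebind_with_global():
    global x
    # With the global declaration, the assignment rebinds the module-level name.
    x = x + [3]


append_without_global()
print(x)  # [1]
try:
    rebind_without_global()
except UnboundLocalError:
    print("rebinding without global failed")
rebind_with_global()
print(x)  # [1, 3]
```

So `global goodProxyList` in the question is harmless but unnecessary; it is not the reason the list stays `["test"]`.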

1 Answer

The issue here is with Pool, not with global.

When you append to a list (a mutable object) in function scope, the list is mutated in the global scope as well. (In fact, you don't even have to use the global keyword here: as long as the function only mutates the list and never assigns to the name, a name it doesn't find in its own scope is looked up in the global scope.) Note one small "gotcha" in the code below: in Python 3, map returns a lazy iterator, so add_to_x does not run until the iterator is consumed:

x = []

def add_to_x(i):
    x.append(i)

if __name__ == '__main__':
    y = map(add_to_x, [1, 2])
    print(x) # still []
    list(y)
    print(x) # now [1, 2]

The following simple example with Pool does not work though:

from multiprocessing import Pool

x = []

def add_to_x(i):
    x.append(i)

if __name__ == '__main__':
    with Pool(2) as p:
        p.map(add_to_x, [1, 2])  # Pool.map blocks and returns a list
    print(x) # prints [] !

Why? The answer to the question "Python multiprocessing global variable updates not returned to parent" is illuminating; here is the relevant part:

When you use multiprocessing to open a second process, an entirely new instance of Python, with its own global state, is created. That global state is not shared, so changes made by child processes to global variables will be invisible to the parent process.

You could deal with this in many ways. One way would be to change testProxy into a function such as is_good_proxy that returns a boolean. You could then do the appending in the parent process, using the results that map returns.
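A minimal sketch of the return-value approach, assuming the network check is swapped for an offline stand-in: the name `is_good_proxy` comes from the answer, but its port-8080 rule and the `find_good_proxies` helper are hypothetical. Workers only return booleans; `Pool.map` sends those back to the parent in input order, and the list is built in the parent, where it is actually visible.

```python
from multiprocessing import Pool


def is_good_proxy(proxy):
    # Hypothetical stand-in for the real check in the question
    # (requests.get(..., proxies=..., timeout=10) returning status 200).
    # Here a proxy is "good" if its port is 8080, so the sketch runs offline.
    return proxy.endswith(":8080")


def find_good_proxies(proxy_list, processes=2):
    with Pool(processes) as pool:
        # Each worker returns a bool; results come back in input order.
        results = pool.map(is_good_proxy, proxy_list)
    # The filtering happens here, in the parent process, so nothing is lost.
    return [proxy for proxy, ok in zip(proxy_list, results) if ok]


if __name__ == '__main__':
    candidates = ["10.0.0.1:8080", "10.0.0.2:3128", "10.0.0.3:8080"]
    print(find_good_proxies(candidates))  # ['10.0.0.1:8080', '10.0.0.3:8080']
```

Returning results instead of mutating shared state also keeps the worker function easy to test on its own, without any pool at all.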

– brianpck
  • Ahh, very interesting. In case you can't tell, it's my first time using any kind of multithreading/multiprocessing. Could this be resolved by using a pool of threads instead of separate processes (and is that even possible)? – user3406647 Nov 11 '16 at 16:56
  • I'm not sure: I haven't had occasion to use `multiprocessing` myself much either. It certainly doesn't seem correct, though, to use `map` if your end goal is appending to a list you already have. – brianpck Nov 11 '16 at 16:59
  • I think I have solved the problem. It seems processes have their own global state but threads do not? I changed to a thread pool by replacing `p=Pool(2)` / `p.map(testProxy, proxyList)` with `with concurrent.futures.ThreadPoolExecutor() as executor: executor.map(testProxy, proxyList)`, and it now works. – user3406647 Nov 11 '16 at 17:04
  • @user3406647 thank you. – Eftekhari Jan 27 '22 at 20:29