
I know there are tons of questions like this one; I tried to read them all. I'm using the multiprocessing library to parse web pages with Python Selenium. I have 3 lists to pass to a function that processes them. First I define the function, then I initialize the browser instance, and finally I start the 3 processes.

import multiprocessing as mp
from selenium import webdriver
# ... other imports

def parsing_pages(list_with_pages_to_parse):
    global browser
    # do stuff

if __name__ == '__main__':
    browser = webdriver.Chrome(..., options=...)
    browser.get(...)

    lists_with_pages_to_parse = [ [...], [...], [...] ]

    pool = mp.Pool(3)
    pool.map(parsing_pages, lists_with_pages_to_parse)
    pool.close()
    pool.join()

The error:

NameError: name 'browser' is not defined

Traceback (most recent call last):
  File "c:\Users\39338\Desktop\program.py", line 323, in <module>
    pool.map(parsing_pages, lists_with_pages_to_parse)
  File "C:\Users\39338\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\39338\AppData\Local\Programs\Python\Python310\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
NameError: name 'browser' is not defined

I used global so that browser could be used inside the function. I thought the problem was that the function is defined before browser is created, but when I move the function below the main block, I get an error saying the function cannot be found when it is called.

Riccardo Lamera

2 Answers


First things first: always try to avoid the global keyword. It makes code fragile and harder to follow as it grows longer and more complex.

Anyway, your code says browser is not defined because, in the process that runs the function, there is no global variable named browser defined outside of the function scope.

Remove the global keyword. You don't need it if the function gets its browser some other way, for example by creating it inside the function instead of relying on a module-level variable.
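A minimal sketch of that idea, assuming each call should create and close its own browser. `make_browser` and `FakeBrowser` are hypothetical stand-ins for `webdriver.Chrome(..., options=...)` so the example runs without Selenium, and the page names are placeholders:

```python
import multiprocessing as mp


class FakeBrowser:
    """Hypothetical stand-in for webdriver.Chrome so the sketch runs without Selenium."""

    def get(self, url):
        self.current_url = url  # a real driver would load the page here

    def quit(self):
        self.closed = True  # a real driver would shut Chrome down here


def make_browser():
    # Hypothetical factory: in the real program this would be
    # webdriver.Chrome(..., options=...) plus any setup such as browser.get(...).
    return FakeBrowser()


def parsing_pages(list_with_pages_to_parse):
    # No `global`: each call creates, uses, and closes its own browser,
    # so it works no matter which process runs it.
    browser = make_browser()
    try:
        visited = []
        for url in list_with_pages_to_parse:
            browser.get(url)
            visited.append(browser.current_url)  # "do stuff" would go here
        return visited
    finally:
        browser.quit()


if __name__ == "__main__":
    lists_with_pages_to_parse = [["a1", "a2"], ["b1"], ["c1", "c2", "c3"]]
    with mp.Pool(3) as pool:
        results = pool.map(parsing_pages, lists_with_pages_to_parse)
    print(results)  # [['a1', 'a2'], ['b1'], ['c1', 'c2', 'c3']]
```

Creating one browser per call is reasonable here because each call handles a whole list; with many small tasks, the startup cost of a real Chrome driver on every call would add up.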

Don't forget to check out these resources:

NameError: global name 'browser' is not defined

https://python-forum.io/thread-12073.html

https://githubhot.com/repo/Matrix07ksa/Brute_Force/issues/24

https://github.com/MasonStooksbury/Free-Games/issues/41

Emin
  • I solved it by initializing the browser outside the if statement, keeping `global browser`, though. I'll rewrite it to avoid using it. – Riccardo Lamera Apr 14 '22 at 12:11

If this code runs with __name__ != '__main__' (for example, when the file is imported from another file or re-imported in a child process), the if block is skipped and browser is never initialized. Example:

# Example 1: run this file directly
def f():
    global browser
    browser

if __name__ == '__main__':
    browser = None

# Calling f will not raise an error
f()

# Example 2: run this file directly
def f():
    global browser
    browser

if __name__ != '__main__':
    browser = None

# Calling f will now raise NameError
f()

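I think what's happening is that the pool runs parsing_pages() in other processes, where __name__ != '__main__'. On Windows, multiprocessing starts workers with the "spawn" method: each worker re-imports your file under the name '__mp_main__', so the code under if __name__ == '__main__': never runs there, and browser is never created in the workers.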
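The failure can be reproduced without Selenium. This is a sketch, assuming the file is run directly as a script; `uses_global` and `run_in_spawned_worker` are hypothetical names, and the string assigned to `browser` stands in for the real driver. The "spawn" start method is forced so the behaviour matches Windows on any OS:

```python
import multiprocessing as mp


def uses_global(_):
    # Reads the module-level name `browser`, like parsing_pages in the question.
    return browser  # noqa: F821 (only assigned inside the __main__ block)


def run_in_spawned_worker():
    """Return what happens when uses_global runs in a spawned pool worker."""
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        try:
            return pool.map(uses_global, [None])
        except NameError as exc:
            # The worker's NameError is sent back and re-raised in the parent.
            return f"NameError: {exc}"


if __name__ == "__main__":
    browser = "a browser"  # assigned only in the parent process
    print(uses_global(None))  # a browser
    # The worker re-imports this file as '__mp_main__', skips this block,
    # and therefore never sees `browser`.
    print(run_in_spawned_worker())  # NameError: name 'browser' is not defined
```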


You need to do one of the following:

  • Pass browser into your function as an argument
  • Initialize browser outside of the if statement
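A sketch of the second option, assuming a hypothetical `make_browser()` that stands in for `webdriver.Chrome(...)` and records which process created it. Note that a function that only reads a module-level name does not need `global` at all:

```python
import multiprocessing as mp
import os


def make_browser():
    # Hypothetical stand-in for webdriver.Chrome(..., options=...).
    return {"owner_pid": os.getpid()}


# Module level, outside the if statement: this line runs in the parent AND
# again in every spawned worker when it re-imports this file.
browser = make_browser()


def parsing_pages(list_with_pages_to_parse):
    # Reading a module-level name needs no `global` statement;
    # `global` is only required when the function assigns to the name.
    return browser["owner_pid"], len(list_with_pages_to_parse)


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # mimic the Windows default
    with ctx.Pool(3) as pool:
        results = pool.map(parsing_pages, [["a"], ["b", "c"], ["d"]])
    # Each result carries the pid of the worker that built its browser,
    # which differs from the parent's pid.
    print(os.getpid(), results)
```

Because the module-level line runs once per process, three workers mean three extra browsers in addition to the parent's, which is the caveat raised in the comments.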

You should add print(__name__) to check what it equals. In the worker processes it will probably print something other than __main__ (in a spawned pool worker, typically __mp_main__).
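A small sketch of that check, assuming the file is run directly (e.g. `python program.py`); `report_name` is a hypothetical helper, and "spawn" is forced to mimic Windows:

```python
import multiprocessing as mp


def report_name(_):
    # Returns __name__ of the module this function lives in,
    # as seen by whichever process runs it.
    return __name__


if __name__ == "__main__":
    print("parent:", __name__)  # parent: __main__
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        # Expected when run as a script: ['__mp_main__', '__mp_main__']
        print("workers:", pool.map(report_name, range(2)))
```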


Edit after the problem was solved:

__name__ equals '__main__' only in the process where the file is run directly as a script, i.e. when you run it by itself. When the file is imported from another file, or re-imported by the worker processes of a multiprocessing pool, __name__ has a different value. Because parsing_pages ran inside pool workers, __name__ == '__main__' was false there, so the conditional never allowed browser to be initialized in those processes.
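To see the "imported from another file" case in isolation, the sketch below writes a hypothetical module `helper_mod.py` to a temporary directory and imports it; its guarded block is skipped because its `__name__` is the module name rather than `'__main__'`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a small hypothetical module, helper_mod.py, to a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "helper_mod.py").write_text(
    "ran_main_block = False\n"
    "if __name__ == '__main__':\n"
    "    ran_main_block = True\n"
)

# Make the directory importable and import the module by name.
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
helper_mod = importlib.import_module("helper_mod")

print(__name__)                   # __main__ when this file is run directly
print(helper_mod.__name__)        # helper_mod
print(helper_mod.ran_main_block)  # False: the guarded block was skipped on import
```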

This is discussed in much more detail below:

A video for easy digestion (in Python 2, but that's fine)

Python Tutorial: if __name__ == '__main__' (Youtube | 8:42)

Most detailed articles (Stack Overflow)

What does if __name__ == "__main__": do?

Purpose of 'if __name__ == "__main__":'

And if you're interested:

What's the point of a main function and/or __name__ == "__main__" check in Python?

Freddy Mcloughlan
  • Initializing the browser outside the if statement did the trick. As for `print(__name__)`, it prints `__mp_main__`. I put it just below `global browser`. – Riccardo Lamera Apr 14 '22 at 12:07
  • Could you please explain why it doesn't work inside the if statement? – Riccardo Lamera Apr 14 '22 at 12:12
  • Awesome. Just beware that `browser` is initialized again in every process that re-imports this file (each spawned pool worker does), so every worker ends up with its own browser. If that is a problem, I recommend opening a new question about the best way to handle multiple processes with a single `browser` (if one doesn't already exist). – Freddy Mcloughlan Apr 14 '22 at 12:12
  • @RiccardoLamera Added – Freddy Mcloughlan Apr 14 '22 at 12:28
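For the multiple-processes concern in the comments, one common approach is the `initializer` argument of `multiprocessing.Pool`, which runs a function once in each worker when it starts, so every worker builds exactly one browser and reuses it for all its tasks. This is a sketch, not a drop-in fix: `FakeBrowser`, `init_worker`, and `_browser` are hypothetical names; it still uses `global`, but only inside the initializer; and it does not close the browsers when the pool shuts down, which a real Selenium driver would need:

```python
import multiprocessing as mp
import os

# Per-process slot; the initializer fills it once in each worker.
_browser = None


class FakeBrowser:
    """Hypothetical stand-in for webdriver.Chrome."""

    def __init__(self):
        self.pid = os.getpid()  # remember which process created this browser

    def get(self, url):
        # A real driver would load the page; here we return (url, owner pid).
        return url, self.pid


def init_worker():
    # Runs once per worker process when the pool starts it.
    global _browser
    _browser = FakeBrowser()  # real code: webdriver.Chrome(..., options=...)


def parsing_pages(list_with_pages_to_parse):
    # Reuses the worker's single browser for every page in every task.
    return [_browser.get(url) for url in list_with_pages_to_parse]


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # mimic the Windows default
    with ctx.Pool(3, initializer=init_worker) as pool:
        results = pool.map(parsing_pages, [["a"], ["b"], ["c"]])
    print(results)
```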