
I am iterating over a list of links for screen scraping. The pages rely on JavaScript, so I use Selenium. I have defined a function to get the source of each page.

  1. Should I instantiate the WebDriver inside that function, which will happen once per loop?

  2. Or should I instantiate outside the function and pass the WebDriver in?

  3. Or assign the WebDriver to a variable that will be visible from inside the function, without explicitly passing it?

Louis
user2014160
  • Your original question was not appropriate for SO. You were asking about libraries in general. What you should or can do varies from library to library and depends on what you are trying to do. In its original form, your question was both too broad and opinion-based, which would have resulted in closure. – Louis Feb 14 '15 at 11:47

1 Answer

Each instantiation of WebDriver launches a new browser, which is a very costly operation, so option 1 is not what you want to do.

I would also not do option 3 because it is not good coding practice to depend on global variables when it can easily be avoided.

This leaves you option 2: instantiate WebDriver once and pass the instance to your function(s).
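
A minimal sketch of option 2, assuming Chrome and a matching driver are available on your machine. The names `get_source`, `scrape_all` and `main` are only illustrative and do not come from your code. The helpers use nothing beyond `driver.get()` and `driver.page_source`, so the browser is launched once and reused for every URL:

```python
def get_source(url, driver):
    """Load url in the already-running browser and return the rendered HTML."""
    driver.get(url)
    return driver.page_source


def scrape_all(urls, driver):
    """Fetch every URL with the same driver instance (no new browser per page)."""
    return [get_source(url, driver) for url in urls]


def main(urls):
    # Imported here so the helpers above do not depend on Selenium directly.
    from selenium import webdriver

    driver = webdriver.Chrome()  # launched once, before the loop
    try:
        return scrape_all(urls, driver)
    finally:
        driver.quit()  # close the browser even if a page fails to load
```

Calling `main(list_of_links)` would then return one page source per link. Wrapping the loop in `try`/`finally` is a design choice rather than a requirement: it makes sure the single browser is closed once the scraping is done.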

Louis
  • Thanks for the edit Louis. So option 2 would look something like this: ``` def func(url,d): d.get(url) src = d.page_source # do stuff return(result) driver = webdriver.Chrome() for i in list: func(i, driver) ``` – user2014160 Feb 14 '15 at 11:47
  • Python code is hard to read in a comment, but it looks okay to me. If I read it correctly, you instantiate the driver before the loop, and the loop then calls `func`, passing the driver as the second argument. If so, what you suggest is correct. – Louis Feb 14 '15 at 11:55
  • Yes, that's the idea. Thanks for the direction on this. – user2014160 Feb 14 '15 at 11:59
  • By the way, using `page_source` may not work the way you want. One of [my answers](http://stackoverflow.com/a/27636548/1906307) deals with the issue. – Louis Feb 14 '15 at 11:59