I'm using the following code to change the user-agent string, but I'm wondering whether or not this will change the user-agent string for each and every browser.get request?

ua_strings = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.1 Safari/605.1.15',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    ...
]

def parse(self, response):
    profile = webdriver.FirefoxProfile()
    profile.set_preference('general.useragent.override', random.choice(ua_strings))
    options = Options()
    options.add_argument('-headless')
    browser = webdriver.Firefox(profile, firefox_options=options)
    browser.get(self.start_urls[0])

    hrefs = WebDriverWait(browser, 60).until(
        EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="discoverableCard"]/a'))
    )

    pages = []

    for href in hrefs:
        pages.append(href.get_attribute('href'))

    for page in pages:
        browser.get(page)

        """ scrape page """

    browser.close()

Or will I have to browser.close() and then create new instances of browser in order to use new user-agent strings for each request?

    for page in pages:
        browser = webdriver.Firefox(profile, firefox_options=options)
        browser.get(page)

        """ scrape page """

        browser.close()
oldboy

1 Answer

Since random.choice() is called only once, when the profile is created, the user-agent string stays the same for all browser.get() requests. To get a new random user-agent for each request, you can create a set_preferences() function that builds a fresh browser, and call it on every iteration of the loop.

def set_preferences():
    # Pick a new random user-agent each time a browser is created
    user_agent_string = random.choice(ua_strings)

    # Print the user-agent chosen on each loop iteration
    print(user_agent_string)
    profile = webdriver.FirefoxProfile()
    profile.set_preference('general.useragent.override', user_agent_string)
    options = Options()
    options.add_argument('-headless')
    browser = webdriver.Firefox(profile, firefox_options=options)
    return browser

Then your loop can look something like this:

for page in pages:
    browser = set_preferences()
    browser.get(page)

    """ scrape page """

    browser.close()
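
The difference between choosing once and choosing on every iteration can be seen without Selenium at all. The following is only a minimal sketch: `choose_once` and `choose_per_page` are hypothetical helper names, and the two user-agent strings are copied from the question's list.

```python
import random

# Two entries copied from the question's ua_strings list
ua_strings = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
]


def choose_once(pages):
    """Hypothetical helper: one random.choice() call, reused for every page."""
    user_agent = random.choice(ua_strings)
    return [(page, user_agent) for page in pages]


def choose_per_page(pages):
    """Hypothetical helper: a separate random.choice() call for each page."""
    return [(page, random.choice(ua_strings)) for page in pages]
```

`choose_once` mirrors the original `parse` method, where every page ends up paired with the same string. `choose_per_page` mirrors the loop above, where each page gets its own draw (which may still repeat by chance).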

Hope this helps!

Erisan Olasheni
  • Okay, that's what I figured. Can you provide any documentation or other info that verifies this? If you can provide that information, I'll mark this as the answer. I'm just being precautious since your points are so low. Or how can I print out the user-agent string for each request so that I can verify this myself?! Sorry, I'm a super n00b when it comes to Selenium. – oldboy Jul 11 '18 at 03:50
  • @Anthony, I modified the code to display the random user-agent on each loop! – Erisan Olasheni Jul 11 '18 at 04:04
  • ahhh right no shit. god im such a noob. im just gon run my script and if it pans out we r good to go!! – oldboy Jul 11 '18 at 04:04
  • ive read that its not necessarily best practice to be opening and closing instances of the webdriver so often. are there any other (better) ways to do this? – oldboy Jul 11 '18 at 04:24
  • hm... now that i think about it. that way of printing doesn't necessarily verify that the UA string has been changed in the headers. is there any way to print the actual headers that are sent with each `get` request?!?! – oldboy Jul 11 '18 at 16:59
  • well... it is a straight forward code, if you read it, you will understand it... If any reason should affect it from not changing to the random ua_strings, then that should be issue with the web driver itself, not your code. The above code is a Pythonic way to ensure the random user-agent strings are being used on every loop. Thanks. – Erisan Olasheni Jul 11 '18 at 18:18
  • yaya i know the script is choosing a random ua string on every iteration, but i want to test whether or not and ensure the driver is actually swapping them. any idea how i can do this? – oldboy Jul 11 '18 at 20:03
  • also, if youre knowledgeable when it comes to Scrapy and Selenium, would you mind taking a look at my other question?? [Using and Randomizing Proxies](https://stackoverflow.com/questions/51276893/using-and-randomizing-proxies) – oldboy Jul 11 '18 at 20:07
  • u can do it just using javascript (i.e. `return navigator.userAgent`), which didnt even cross my mind for some reason. it's working :) – oldboy Jul 13 '18 at 04:37
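
The JavaScript check mentioned in the last comment could be wrapped up roughly as follows. This is a sketch only: `reported_user_agent` and `user_agent_matches` are hypothetical helper names, and `browser` is assumed to be any Selenium WebDriver instance (only its `execute_script` method is used). Note that this reads what the browser reports to page scripts, not the raw request headers.

```python
def reported_user_agent(browser):
    """Return the user-agent string the browser exposes to page scripts."""
    return browser.execute_script("return navigator.userAgent")


def user_agent_matches(browser, expected):
    """Compare the reported user-agent with the one that was configured."""
    actual = reported_user_agent(browser)
    if actual != expected:
        # Show both values so a failed override is easy to spot in the logs
        print(f"user-agent mismatch: expected {expected!r}, got {actual!r}")
    return actual == expected
```

To use it, the browser-creating function would need to return the chosen string along with the browser, so that `user_agent_matches(browser, user_agent_string)` can be called after each `browser.get(page)`.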