
I'm trying to extract a keyword/string from a website's source code using this Python 2.7 script:

from selenium import webdriver

keyword = ['googleadservices']

driver = webdriver.Chrome(executable_path=r'C:\Users\Jacob\PycharmProjects\Testing\chromedriver_win32\chromedriver.exe')
driver.get('https://www.vacatures.nl/')

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

for searchstring in keyword:
    if searchstring.lower() in str(source_code).lower():
        print (searchstring, 'found')
    else:
        print (searchstring, 'not found')

The browser does open when the script runs, but I'm not able to extract the desired keywords from its source code. Any help?

jakeT888

2 Answers


I observed that googleadservices is NOT present in the web page source code.

There is NO issue with the code.

I tried with GoogleAnalyticsObject, and it is found.

from selenium import webdriver

keyword = ['googleadservices', 'GoogleAnalyticsObject']

driver = webdriver.Chrome()
driver.get('https://www.vacatures.nl/')

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

for searchstring in keyword:
    if searchstring.lower() in str(source_code).lower():
        print (searchstring, 'found')
    else:
        print (searchstring, 'not found')

Instead of using //* to find the source code

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

Use the following code:

source_code = driver.page_source
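
Put together, the check could look roughly like the sketch below. The helper names find_keywords and check_page are made up for illustration, and webdriver.Chrome() with no arguments assumes Selenium can locate a ChromeDriver on its own (for example via PATH). The string check is kept separate from the browser code so it can be tried without opening a browser.

```python
def find_keywords(source_code, keywords):
    """Return {keyword: True/False} using a case-insensitive substring check."""
    haystack = source_code.lower()
    return {kw: kw.lower() in haystack for kw in keywords}


def check_page(url, keywords):
    """Hypothetical helper: load url in Chrome and check its page source."""
    # Imported here so find_keywords stays usable without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes ChromeDriver can be located
    try:
        driver.get(url)
        return find_keywords(driver.page_source, keywords)
    finally:
        driver.quit()  # always close the browser, even if the page load fails
```

Calling check_page('https://www.vacatures.nl/', ['googleadservices', 'GoogleAnalyticsObject']) would then return a dict that maps each keyword to whether it was found.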
Naveen Kumar R B

As others have said, the issue isn't your code but simply that googleadservices isn't present in the source code.

What I want to add is that your code is a bit over-engineered, since all it seems to do is report whether a certain string is present in the source code.

You can achieve that more easily with a more specific XPath such as //script[contains(text(),'googletagmanager')]: use find_element_by_xpath and catch the possible NoSuchElementException. That might save you time, and you don't need the for loop.
There are other possibilities as well, such as using ExpectedConditions, or calling find_elements_by_xpath and checking whether the length of the returned list is greater than 0.
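
A rough sketch of the list-based variant, under a few assumptions: has_inline_script_containing is a made-up helper name, driver is an already created Selenium WebDriver, and the locator is passed as the plain string "xpath" (the value of Selenium's By.XPATH) through the generic find_elements(by, value) call rather than find_elements_by_xpath, which newer Selenium releases no longer provide.

```python
def has_inline_script_containing(driver, text):
    """Return True if any <script> element's text contains `text`.

    Hypothetical helper; `driver` is assumed to be a Selenium WebDriver.
    """
    if "'" in text:
        # The XPath literal below is single-quoted, so reject such input
        # instead of building a broken expression.
        raise ValueError("text must not contain a single quote")
    xpath = "//script[contains(text(),'{0}')]".format(text)
    # "xpath" is the same value as selenium's By.XPATH; find_elements
    # returns an empty list (no exception) when nothing matches.
    return len(driver.find_elements("xpath", xpath)) > 0
```

Keep in mind that this matches only the inline text of script elements and is case-sensitive, so it is not equivalent to a lowercase substring check over the whole page source; a keyword that appears only in a script's src attribute would not be matched.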

Robert G