How to extract the text from Google features in Python?

Question

By "Google features" I mean, for example, what happens when you type "I'm feeling curious" into Google: the first result is a random fact, followed by the regular results. I'm trying to extract the random fact's text in Python. I tried the requests and bs4 libraries, but the random fact doesn't appear in the page that requests downloads.

Is there some other way to extract the text?

Oleksandr Makarenko · Accepted Answer · 2018-07-12T07:30:40.133


The text can be extracted through the UI with Selenium WebDriver and Python. However, class-based selectors won't be stable, because the class names change on every page load, so you have to fall back on a positional XPath. For example, the XPath for the question's text is //*[@id="rso"]/div/div/div/div/div/div/div/div/div[1]/div.
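
To illustrate why anchoring on id="rso" and the nesting beats anchoring on class names, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree on a hypothetical, simplified page. The markup, the random-looking class names, and the helpers `extract` and `extract_by_class` are invented for illustration. Real Google HTML is not well-formed XML, so in practice you would run the XPath through Selenium (as below) or lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: class names are random (as on Google's result page),
# but the nesting under id="rso" is the same on every load.
PAGE_A = """
<html><body>
  <div id="rso">
    <div class="xK3bq">
      <div class="aJ9s"><div>Sample question?</div></div>
      <div class="Pq2w"><div>Sample answer.</div></div>
    </div>
  </div>
</body></html>
""".strip()

# The "next page load": same structure, different class names.
PAGE_B = PAGE_A.replace("xK3bq", "m0Zt1").replace("aJ9s", "Ru7e").replace("Pq2w", "Lo4c")

# ElementTree supports a subset of XPath, including [@attr='value'] and [position].
QUESTION_PATH = ".//*[@id='rso']/div/div[1]/div"
ANSWER_PATH = ".//*[@id='rso']/div/div[2]/div"


def extract(page):
    """Return (question, answer) using the structural path anchored at id="rso"."""
    root = ET.fromstring(page)
    return root.find(QUESTION_PATH).text, root.find(ANSWER_PATH).text


def extract_by_class(page, cls):
    """Return the text under a div with the given class, or None if that class is gone."""
    root = ET.fromstring(page)
    node = root.find(f".//div[@class='{cls}']/div")
    return None if node is None else node.text


if __name__ == "__main__":
    print(extract(PAGE_A))
    print(extract(PAGE_B))
    print(extract_by_class(PAGE_B, "aJ9s"))
```

The structural path returns the same text for both pages, while the selector written against PAGE_A's class name finds nothing in PAGE_B. The trade-off is that a positional path breaks as soon as Google changes the nesting itself.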

Still, it's possible. Look at the example below:

from time import sleep

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Positional XPaths of the "I'm feeling curious" card (fragile: they break when Google's layout changes)
QUESTION_XPATH = '//*[@id="rso"]/div/div/div/div/div/div/div/div/div[1]/div'
ANSWER_XPATH = '//*[@id="rso"]/div/div/div/div/div/div/div/div/div[2]/div'
URL_XPATH = '//*[@id="rso"]/div/div/div/div/div/div/div/div/div/p/a'

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")     # run Chrome without a visible window
chrome_options.add_argument("--disable-gpu")  # was "temporarily needed if running on Windows" for headless Chrome
# Block the "show notifications" permission prompt
chrome_options.add_experimental_option("prefs", {"profile.default_content_setting_values.notifications": 2})
browser = webdriver.Chrome(options=chrome_options)  # `chrome_options=` keyword was removed in Selenium 4

try:
    browser.get("https://www.google.com")
    # The search box had id "lst-ib" in 2018; its name attribute "q" has been more stable
    search_box = browser.find_element(By.NAME, "q")
    search_box.send_keys("I'm feeling curious")
    search_box.submit()

    wait = WebDriverWait(browser, 5)
    question = wait.until(EC.presence_of_element_located((By.XPATH, QUESTION_XPATH)))
    answer = wait.until(EC.presence_of_element_located((By.XPATH, ANSWER_XPATH)))

    # The answer element can exist before its text is filled in: retry up to 3 times, 1 s apart
    retries = 3
    while not answer.text and retries:
        sleep(1)
        retries -= 1
        answer = browser.find_element(By.XPATH, ANSWER_XPATH)

    url = wait.until(EC.presence_of_element_located((By.XPATH, URL_XPATH))).get_attribute('href')

    print('Question: {}\nAnswer: {}\nUrl: {}'.format(question.text, answer.text, url))
finally:
    browser.quit()  # close Chrome and chromedriver even if a step above fails

You can run this code after installing Selenium (pip install selenium) and a chromedriver that matches your Chrome version.
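
The while loop in the answer is meant to re-read the answer element a bounded number of times until its text appears. As a sketch, that idea can be pulled into a small, testable helper. `poll_until` and its parameters are hypothetical names, not part of Selenium; `sleep` is injectable only so the helper can be tested without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T],
               is_ready: Callable[[T], bool],
               attempts: int = 3,
               delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() until is_ready(value) is true or the retries run out.

    Returns the last fetched value either way, so the caller can still
    print whatever it got after the final retry.
    """
    value = fetch()
    while not is_ready(value) and attempts > 0:
        sleep(delay)   # give the page time to fill in the element
        attempts -= 1
        value = fetch()
    return value
```

With Selenium you would pass something like `lambda: browser.find_element(By.XPATH, '//*[@id="rso"]/div/div/div/div/div/div/div/div/div[2]/div')` as `fetch` and `lambda el: bool(el.text)` as `is_ready`. Selenium's own `WebDriverWait.until` also accepts a callable that receives the driver and returns as soon as that callable's value is truthy, e.g. `wait.until(lambda d: d.find_element(By.XPATH, xpath).text)`; unlike the helper, it raises `TimeoutException` instead of returning an empty value.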

answered Jul 11 '18 at 08:33 by Oleksandr Makarenko, edited Jul 12 '18 at 07:30


I don't see any results for Answer, but Question and URL work just fine. Is there a way to do this process invisibly? If not, how can I close the opened Chrome window afterwards? – Lolman Jul 11 '18 at 15:09
I have added a wait for the question, because it doesn't load immediately. Please check. Yes, this process can be made invisible; one moment. – Oleksandr Makarenko Jul 11 '18 at 15:15
I have added the `headless` option to the browser settings. Now the browser won't launch its user interface. Please try running the code. – Oleksandr Makarenko Jul 11 '18 at 15:25
I noticed that chromedriver.exe opens a console window every time it runs. Is there a way to run it without a visible console, too? – Lolman Jul 11 '18 at 18:00
The console doesn't open on my Mac. Are you running on Windows? I have added a new Chrome option, `chrome_options.add_argument("--disable-gpu")`, which is 'temporarily needed if running on Windows.' It may help you. – Oleksandr Makarenko Jul 12 '18 at 07:36
Yes, I am running Windows 10. The argument you suggested did not help. I get this message when I run it, with or without `chrome_options.add_argument("--disable-gpu")`: [message](https://puu.sh/AV7VM/15ed7a4df0.png) – Lolman Jul 12 '18 at 12:42
Are you using the latest `chromedriver.exe` and Chrome browser? I don't know of any other reason for that error to appear. – Oleksandr Makarenko Jul 12 '18 at 13:03
After reading [this](https://stackoverflow.com/questions/50143413/errorgpu-process-transport-factory-cc1007-lost-ui-shared-context-while-ini), I came to the conclusion that I just need to wait for a new chromedriver.exe version, where this might be fixed. I currently have the latest versions of Chrome and chromedriver.exe. – Lolman Jul 12 '18 at 13:46
Yes, or run the code on another OS, for example Linux (Ubuntu). – Oleksandr Makarenko Jul 12 '18 at 14:57
