Unable to fetch article results from Google search

Question

I'm trying to read the content of this link with BeautifulSoup and then fetch the article dates present in span.f.

import requests
import json
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36'}
from selenium import webdriver
link="https://www.google.com/search?q=replican+party+announced&ie=utf-8&oe=utf-8&client=firefox-b"
browser=webdriver.Firefox()
browser.get(link)
s=requests.get(link)
soup5 =BeautifulSoup(s.content,'html.parser')

Now I want to fetch all the article dates present in <span class="f">Apr 27, 2018 - </span> along with their corresponding link URLs, but this code isn't fetching anything for me:

for i in soup5.find_all("div",{"class":"g"}):
    print (i.find_all("span",{"class":"f"}))

Zilong Li · Answer 1 · 2018-05-23T08:39:19.223

You don't need Selenium for this task. Use BeautifulSoup's .select() method as below:

import requests
from bs4 import BeautifulSoup
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36'}

link = "https://www.google.com/search?q=replican+party+announced&ie=utf-8&oe=utf-8&client=firefox-b"

r = requests.get(link, headers=headers, timeout=4)

encoding = r.encoding if 'charset' in r.headers.get('content-type','').lower() else None

soup = BeautifulSoup(r.content, 'html.parser', from_encoding=encoding)

for d in soup.select("div.s > div"):
    # Only print results that carry a date
    date = d.select("span.st > span.f")
    if date:
        # Use a separate name so the search URL stored in `link` is not overwritten
        cite = d.select("div.f > cite")
        print(date[0].text)
        print(cite[0].text)

Output:

2018. 4. 27. - 
https://www.cnn.com/2017/11/10/politics/house.../index.html
2018. 3. 19. - 
thehill.com/.../379087-former-gop-lawmaker-announces-hes-leav...
2018. 4. 11. - 
https://www.nytimes.com/2018/04/11/us/.../paul-ryan-speaker.htm...
2017. 10. 24. - 
https://www.theguardian.com/.../jeff-flake-retire-republican-senat...
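
A possible explanation for why the question's code found nothing, offered as a guess rather than a confirmed diagnosis: it defines headers but calls requests.get(link) without them, so the request goes out with the default python-requests User-Agent, and Google may return different markup for that client. The sketch below builds both requests without sending them (via Session.prepare_request) to show the difference in the outgoing header.

```python
import requests

# Same User-Agent string and search URL as in the answer above.
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36'}
link = "https://www.google.com/search?q=replican+party+announced&ie=utf-8&oe=utf-8&client=firefox-b"

# Prepare the requests without sending them, so nothing touches the network.
session = requests.Session()
without_headers = session.prepare_request(requests.Request('GET', link))
with_headers = session.prepare_request(requests.Request('GET', link, headers=headers))

# The first one identifies itself as python-requests, the second as a browser.
print(without_headers.headers['User-Agent'])
print(with_headers.headers['User-Agent'])
```

Passing headers=headers to requests.get(), as the answer does, has the same effect on the request that is actually sent.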

Thanks, that was awesome! But I'd also like to print the corresponding "link URL" in front of each of these dates. Please suggest how to do that too. — vinita, May 23 '18 at 08:27

undetected Selenium · Accepted Answer · 2018-05-23T08:57:39.623

Since you are already using Selenium, you don't need requests: pass the browser's page_source to BeautifulSoup, call find_all(), and print the dates as follows:

Code Block :

from bs4 import BeautifulSoup as soup
from selenium import webdriver
link="https://www.google.com/search?q=replican+party+announced&ie=utf-8&oe=utf-8&client=firefox-b"
# Path to geckodriver on the answerer's machine; adjust it for your setup.
# Note: executable_path is deprecated in Selenium 4 in favor of a Service object.
browser = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
browser.get(link)
# Parse the HTML as rendered by the browser
soup5 = soup(browser.page_source,'html.parser')
print("Dates are as follows : ")
for i in soup5.find_all("span",{"class":"f"}):
    print (i.text)
print("Link URLs are as follows : ")
for i in soup5.find_all("cite",{"class":"iUh30"}):
    print (i.text)

Console Output :

Dates are as follows : 
Mar 19, 2018 - 
Apr 27, 2018 - 
Feb 1, 2018 - 
Apr 17, 2018 - 
Jan 9, 2018 - 
Link URLs are as follows : 
thehill.com/.../379087-former-gop-lawmaker-announces-hes-leaving-gop-tears-into-tr...
https://edition.cnn.com/2017/11/10/politics/house-retirement-tracker/index.html
https://en.wikipedia.org/wiki/Republican_Party_presidential_candidates,_2016
https://www.cbsnews.com/.../joe-scarborough-announces-hes-leaving...

Update

In case you want to print the dates and link URLs side by side, you can use:

Code Block :

from bs4 import BeautifulSoup as soup
from selenium import webdriver
link="https://www.google.com/search?q=replican+party+announced&ie=utf-8&oe=utf-8&client=firefox-b"
browser = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
browser.get(link)
soup5 = soup(browser.page_source,'html.parser')
for i,j in zip(soup5.find_all("span",{"class":"f"}), soup5.find_all("cite",{"class":"iUh30"})):
    print(i.text, j.text)

Console Output :

Mar 19, 2018 -  thehill.com/.../379087-former-gop-lawmaker-announces-hes-leaving-gop-tears-into-tr...
Apr 27, 2018 -  https://edition.cnn.com/2017/11/10/politics/house-retirement-tracker/index.html
Feb 1, 2018 -  https://en.wikipedia.org/wiki/Republican_Party_presidential_candidates,_2016
Apr 17, 2018 -  https://www.cbsnews.com/.../joe-scarborough-announces-hes-leaving...
Jan 9, 2018 -  www.travisgop.com/2018_precinct_conventions
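
One caveat with zip() over two page-wide find_all() lists: if any result lacks a date, every later date is paired with the wrong link. A minimal sketch of a per-result alternative follows. The HTML is a hypothetical, simplified stand-in for the 2018 result markup (the URLs and snippets are made up), not Google's real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three results, the second one has no date.
html = """
<div class="g"><cite class="iUh30">a.example/one</cite>
  <span class="st"><span class="f">Mar 19, 2018 - </span>First snippet.</span></div>
<div class="g"><cite class="iUh30">b.example/two</cite>
  <span class="st">Second snippet, undated.</span></div>
<div class="g"><cite class="iUh30">c.example/three</cite>
  <span class="st"><span class="f">Apr 27, 2018 - </span>Third snippet.</span></div>
"""
soup5 = BeautifulSoup(html, 'html.parser')

# Page-wide zip: the second date ends up next to the second (undated) link.
zipped = [(d.text, c.text) for d, c in
          zip(soup5.find_all("span", {"class": "f"}),
              soup5.find_all("cite", {"class": "iUh30"}))]

# Per-result lookup: the date and the link are taken from the same div.g.
paired = []
for result in soup5.find_all("div", {"class": "g"}):
    date = result.find("span", {"class": "f"})
    cite = result.find("cite", {"class": "iUh30"})
    if date is not None and cite is not None:
        paired.append((date.text.strip(), cite.text))

for date, url in paired:
    print(date, url)
```

With Selenium, the same loop should work on soup(browser.page_source, 'html.parser'), assuming the class names used in this answer still match the page.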

Thanks, but I'd also like to print the corresponding "link URL" in front of each of these dates. Any idea how to do it? — vinita, May 23 '18 at 08:31
@vinita Check out my updated answer and let me know the result. — undetected Selenium, May 23 '18 at 08:42
Thanks :-) Just one last request. I tried to print the link text by updating your code as follows (please tell me if it's fine, or if there is a better alternative). I'm using split on "-" to remove the date:
for i,j,k in zip(soup5.find_all("span",{"class":"f"}), soup5.find_all("cite",{"class":"iUh30"}), soup5.find_all("span",{"class":"st"})):
    print(i.text, j.text, k.text.split("-")[1:])
— vinita, May 23 '18 at 09:08
@vinita I'm afraid I can't quite understand your requirement, as in `print the link text` and `split on "-" to remove the date`. Can you raise a new question for your new requirement, please? — undetected Selenium, May 23 '18 at 09:21
The link text for the first Google search result is "Former GOP Rep. Charles Djou (Hawaii) announced he is leaving the Republican Party." I'm splitting it on "-" to remove the date "Mar 19, 2018". I don't want this date because I've already printed it via your previous code. — vinita, May 23 '18 at 09:24
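
Regarding the last comment: split("-")[1:] returns a list and cuts at every hyphen, so a snippet containing a hyphenated word gets broken up. Below is a small sketch of one alternative, which removes the date text (from span.f) as a prefix of the snippet text (from span.st). The helper name strip_date_prefix and the second sample snippet are hypothetical.

```python
def strip_date_prefix(snippet_text, date_text):
    """Return snippet_text without the leading date_text, if it is present."""
    if date_text and snippet_text.startswith(date_text):
        return snippet_text[len(date_text):].strip()
    return snippet_text.strip()

# First snippet, as quoted in the comment above.
date_text = "Mar 19, 2018 - "
snippet = date_text + "Former GOP Rep. Charles Djou (Hawaii) announced he is leaving the Republican Party."
print(strip_date_prefix(snippet, date_text))

# Hypothetical snippet with a hyphenated word, to show where split("-") goes wrong.
hyphenated = "Jan 9, 2018 - A long-time member resigned."
print(hyphenated.split("-")[1:])                        # list, cut inside "long-time"
print(strip_date_prefix(hyphenated, "Jan 9, 2018 - "))  # one intact string
```

Inside the loop this would be called as strip_date_prefix(k.text, i.text), assuming each span.st really starts with its span.f text.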
