
I am trying to scrape Google News using the following code:

from bs4 import BeautifulSoup
import requests
import time
from random import randint


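def scrape_news_summaries(s):
    time.sleep(randint(0, 2))  # pause between requests so Google is less likely to throttle us
    # Let requests URL-encode the query instead of concatenating it into the URL;
    # tbm=nws selects the News tab
    r = requests.get("https://www.google.co.uk/search",
                     params={"q": s, "tbm": "nws"})
    content = r.text
    news_summaries = []
    soup = BeautifulSoup(content, "html.parser")
    # Summary snippets were in <div class="st"> elements in Google's markup at the time
    st_divs = soup.find_all("div", {"class": "st"})
    for st_div in st_divs:
        news_summaries.append(st_div.text)
    return news_summaries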


l = scrape_news_summaries("T-Notes")
#l = scrape_news_summaries("""T-Notes""")
for n in l:
    print(n)

This code was working before, but now it returns nothing and I can't figure out why. Is it possible that Google has banned me, even though I only ran the code three or four times? (I also tried Bing News, but got empty results there too.)

Thanks.

ylnor
  • Answer to the related question about scraping Google News by using `requests`: https://stackoverflow.com/a/15552114/1291371 – ilyazub Apr 03 '20 at 16:13

1 Answer


I tried running the code and it works fine on my computer.

Try printing the status code of the response and check whether it is anything other than 200 (OK):

from bs4 import BeautifulSoup
import requests
import time
from random import randint


def scrape_news_summaries(s):
    time.sleep(randint(0, 2))  # pause between requests so Google is less likely to throttle us
    # Let requests URL-encode the query; tbm=nws selects the News tab
    r = requests.get("https://www.google.co.uk/search",
                     params={"q": s, "tbm": "nws"})
    print(r.status_code)  # Print the status code; 200 means OK
    content = r.text
    news_summaries = []
    soup = BeautifulSoup(content, "html.parser")
    st_divs = soup.find_all("div", {"class": "st"})
    for st_div in st_divs:
        news_summaries.append(st_div.text)
    return news_summaries


l = scrape_news_summaries("T-Notes")
#l = scrape_news_summaries("""T-Notes""")
for n in l:
    print(n)
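A bare status code can be hard to interpret. The sketch below, a hypothetical `diagnose` helper that uses only the standard library, turns a status code and the final URL after redirects (`r.url` in requests) into a short verdict. It assumes that 403, 429 and 503 usually mean throttling or blocking, and that Google redirects blocked clients to a CAPTCHA page whose path starts with `/sorry`; both are assumptions, not documented behavior.

```python
from http import HTTPStatus
from urllib.parse import urlparse

# Assumption: these codes usually indicate throttling or blocking, not a bad query.
SUSPICIOUS = {HTTPStatus.FORBIDDEN,
              HTTPStatus.TOO_MANY_REQUESTS,
              HTTPStatus.SERVICE_UNAVAILABLE}


def diagnose(status_code: int, final_url: str) -> str:
    """Return a human-readable verdict such as '429 Too Many Requests: ...'."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:  # not a standard HTTP status code
        phrase = "Unknown status"
    label = f"{status_code} {phrase}"
    # Assumption: blocked clients end up on a CAPTCHA page under /sorry
    if status_code in SUSPICIOUS or urlparse(final_url).path.startswith("/sorry"):
        return f"{label}: probably rate-limited or blocked"
    if status_code == HTTPStatus.OK:
        return f"{label}: request succeeded; if no results, check the HTML selectors"
    return f"{label}: unexpected response"


print(diagnose(429, "https://www.google.com/sorry/index?continue=x"))
```

Inside `scrape_news_summaries` you could call `print(diagnose(r.status_code, r.url))` instead of printing the raw number. A 200 with an empty result list points at the parsing step rather than a ban, since the page markup can change over time.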

See https://www.scrapehero.com/how-to-prevent-getting-blacklisted-while-scraping/ for a list of status codes that indicate you have been banned.
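Once you can recognise a throttling response, one common way to handle it is to retry with exponential backoff. This is a minimal sketch, not a guarantee against bans: `fetch_with_backoff` and its `fetch` callable are hypothetical names, and treating only 429 and 503 as retryable is an assumption.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: 429 and 503 are temporary throttling worth retrying;
# other non-200 codes (e.g. 403, 404) are reported immediately.
RETRYABLE = {429, 503}


def fetch_with_backoff(fetch: Callable[[], Tuple[int, str]],
                       max_retries: int = 4,
                       base_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until it returns HTTP 200, backing off exponentially.

    fetch is a zero-argument callable returning (status_code, body).
    """
    status = None
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status not in RETRYABLE:
            break  # retrying will not fix e.g. a 404
        if attempt < max_retries:
            # Wait base_delay * 2**attempt seconds (2, 4, 8, ...) plus up to 1 s of jitter
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise RuntimeError(f"giving up after HTTP {status}")


# Usage with requests (makes a network call, so it is not run here):
#   def fetch():
#       r = requests.get("https://www.google.co.uk/search",
#                        params={"q": "T-Notes", "tbm": "nws"})
#       return r.status_code, r.text
#   html = fetch_with_backoff(fetch)
```

Passing `sleep` in as a parameter lets you test the retry logic without actually waiting; in normal use the default `time.sleep` applies.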

Andreas
  • Thanks. It runs again on my computer now, so I guess I was banned for a while. I'll catch the non-200 responses to avoid further issues... – ylnor Sep 06 '16 at 17:44
  • How would I search for a query such as ("T-Notes" OR "Notes") AND ("Albania" OR "Romania")? – amc Apr 09 '17 at 00:20
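One way to approach the last comment: write the whole Boolean expression as a single query string using Google's search operators and let a URL encoder escape it. This sketch assumes that Google treats space-separated groups as AND, that `OR` must be uppercase, and that parentheses group terms; `build_news_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_news_search_url(query: str) -> str:
    """Return a Google News search URL with the query safely URL-encoded."""
    # urlencode escapes quotes and parentheses and turns spaces into '+'
    return "https://www.google.co.uk/search?" + urlencode({"q": query, "tbm": "nws"})


# Assumption: AND is implicit between the two groups, so it is left out.
query = '("T-Notes" OR "Notes") ("Albania" OR "Romania")'
print(build_news_search_url(query))
```

Passing the same string as `params={"q": query, "tbm": "nws"}` to `requests.get` should produce the same encoding, since requests URL-encodes `params` for you.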