Retrieve a list of URLs from Google and print them to a text file in Python

Question

OK, so I've got some code I can't get to work. I'm trying to pull the first 10 URLs from Google for any given search. I can't get the results to print to the file, and I've had errors such as:

AttributeError: 'NoneType' object has no attribute

from bs4 import BeautifulSoup
import requests
import os

jobid = 'test_01'
town = 'town'
KEIF = "/home/dream/scripts/Keif/"

# One output directory per job id
DIR = os.path.join(KEIF, "output", jobid)
if not os.path.exists(DIR):
    os.makedirs(DIR)

# Iterate over every trade in the word list (one per line)
with open(os.path.join(KEIF, "wordlists", "trades.txt")) as f:
    for job in f:
        job = job.strip()  # drop the trailing "\n"
        if not job:
            continue

        # Create Google query, e.g. "plumber in town"
        query = job + " in " + town

        # Create new text file based on query
        jobdir = job.replace(" ", "_")
        textpath = os.path.join(DIR, jobdir + ".txt")

        # Get results: requests URL-encodes params; omitting "start" asks for the first page
        page = requests.get("https://www.google.co.uk/search",
                            params={"q": query, "num": 10})
        soup = BeautifulSoup(page.text, "html.parser")

        with open(textpath, "a") as textOut:
            # soup.find() returns None if the page has no <cite> tag,
            # and reading .text from None raises the AttributeError
            print(soup.find('cite').text, file=textOut)

I am getting the

AttributeError: 'NoneType' object has no attribute

on the last line, at soup.find('cite').text.
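`soup.find()` returns `None` when the parsed HTML contains no matching tag, and reading `.text` from `None` is exactly what raises this `AttributeError`. A minimal sketch of a guarded lookup, assuming Python 3 and BeautifulSoup 4 (`bs4`); `first_cite_text` is a hypothetical helper name, not part of the original code:

```python
from bs4 import BeautifulSoup


def first_cite_text(html):
    """Return the text of the first <cite> tag, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    cite = soup.find("cite")  # None when the page contains no <cite> tag
    if cite is None:
        return None
    return cite.get_text(strip=True)


# A page without <cite> (for example a CAPTCHA page) no longer crashes:
print(first_cite_text("<html><body><p>No results here</p></body></html>"))  # prints: None
print(first_cite_text("<div><cite>www.example.com</cite></div>"))  # prints: www.example.com
```

The caller can then decide what to do with `None` (log it, save the page, skip the query) instead of crashing.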

Can someone please help me before I tear my hair out?
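The question's code builds the query by hand (`%20`, then `'+'.join`), strips `"/n "` instead of `"\n"`, and passes `start=10`, which appears to act as a result offset and so would skip the first page. A minimal sketch of the URL-building and file-writing steps using only the standard library, assuming Python 3; `build_search_url`, `output_path` and `write_urls` are hypothetical helper names, and nothing here contacts Google:

```python
import os
from urllib.parse import urlencode

SEARCH_BASE = "https://www.google.co.uk/search"  # endpoint taken from the question


def build_search_url(job, town, num=10):
    """Build a search URL for '<job> in <town>'; urlencode handles the escaping."""
    query = "{} in {}".format(job.strip(), town)
    # No "start" parameter, so the first page of results is requested.
    return SEARCH_BASE + "?" + urlencode({"q": query, "num": num})


def output_path(out_dir, job):
    """Mirror the question's naming: newline dropped, spaces become underscores."""
    return os.path.join(out_dir, job.strip().replace(" ", "_") + ".txt")


def write_urls(path, urls):
    """Append one URL per line to the text file at `path`."""
    with open(path, "a", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")


print(build_search_url("gas engineer\n", "town"))
# prints: https://www.google.co.uk/search?q=gas+engineer+in+town&num=10
```

Letting `urlencode` (or the `params=` argument of `requests.get`) do the escaping avoids mixing `%20` and `+` by hand.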

Please provide more info on your error. Where does your error occur? — RagingRoosevelt, Dec 13 '17 at 20:14
I am getting the AttributeError: 'NoneType' object has no attribute on the last line, at soup.find('cite').text. — Nemo, Dec 13 '17 at 20:26
Google uses JavaScript to display data, but BeautifulSoup doesn't run JavaScript. Google can send a page that doesn't use JavaScript, and then the elements are in different tags. Turn off JavaScript in your web browser and open Google to see what BeautifulSoup gets. — furas, Dec 13 '17 at 20:35
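Following that point, a parser can avoid depending on a single tag. A minimal sketch, assuming Python 3 and BeautifulSoup 4: it collects `<cite>` texts (which may be a shortened display form of the address) and, when there are none, falls back to `<a href>` links. The `/url?q=` unwrapping is an assumption about how Google's no-JavaScript page wraps result links, so check it against the HTML you actually receive; `extract_urls` is a hypothetical name:

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup


def extract_urls(html, limit=10):
    """Collect up to `limit` result URLs from a search page's HTML.

    Tries <cite> tags first; if there are none, falls back to <a href> links.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = [t for t in (c.get_text(strip=True) for c in soup.find_all("cite")) if t]
    if not urls:
        for a in soup.find_all("a", href=True):
            href = a["href"]
            # ASSUMPTION: redirect links of the form /url?q=<target>&... wrap the real URL
            if href.startswith("/url?"):
                href = parse_qs(urlparse(href).query).get("q", [""])[0]
            if href.startswith(("http://", "https://")):
                urls.append(href)
    # Drop duplicates while keeping the original order
    return list(dict.fromkeys(urls))[:limit]


sample = ('<a href="/url?q=https://example.com/a&amp;sa=U">A</a>'
          '<a href="/search?q=next">Next</a>'
          '<a href="https://example.org/b">B</a>')
print(extract_urls(sample))  # prints: ['https://example.com/a', 'https://example.org/b']
```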
BTW: Google can also recognize that you are running a script and then send a CAPTCHA or a warning message instead of the expected data. You can save `page.text` to an `.html` file and open it in a web browser to see what you actually get. — furas, Dec 13 '17 at 20:39
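A minimal sketch of that inspection step, assuming Python 3; `save_for_inspection` and its default file name are hypothetical. It writes the HTML as UTF-8 and returns a `file://` URI that `webbrowser.open()` can open:

```python
from pathlib import Path


def save_for_inspection(html, path="google_result.html"):
    """Write downloaded HTML to disk and return a file:// URI for a browser."""
    target = Path(path).resolve()  # as_uri() needs an absolute path
    target.write_text(html, encoding="utf-8")
    return target.as_uri()


# Usage with the question's `page` object (needs network access):
#   uri = save_for_inspection(page.text)
#   import webbrowser
#   webbrowser.open(uri)
```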
Is there any way around this at all? Maybe it is against their terms of service? — Nemo, Dec 13 '17 at 20:56
Scraping is always against the terms of service :) First check what you get in `page.text`. If you get a warning, you can use [Selenium](http://selenium-python.readthedocs.io/), which lets you control a web browser so your requests look more like a real human's. — furas, Dec 13 '17 at 21:12
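Before reaching for Selenium, the "check what you get" step can be automated with a rough heuristic. This is a sketch only: the marker phrases and the HTTP 429 check are assumptions about what a block page looks like, not documented Google behavior; `looks_blocked` and `BLOCK_MARKERS` are hypothetical names:

```python
# ASSUMPTION: phrases that suggest a CAPTCHA or block page rather than results
BLOCK_MARKERS = ("captcha", "unusual traffic")


def looks_blocked(status_code, html):
    """Heuristic: True if the response looks like a block/CAPTCHA page."""
    if status_code == 429:  # HTTP 429 Too Many Requests
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# With the question's code: looks_blocked(page.status_code, page.text)
```

If this returns `True`, saving the page and opening it in a browser is the quickest way to confirm what Google sent.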
