I have a very basic Python script that reads search queries from a text file and returns the first URL from Google for each one. I get an error when the Google result contains a non-ASCII character (such as the é in montréal).

Ideally, I'd like to keep every character that is pulled, regardless of language.

import requests                   
from bs4 import BeautifulSoup

with open("searches.txt") as input:  # look at each line in our input file
    content = input.readlines()
content = [x.strip() for x in content]  # and strip of newline characters

print '---'  # some formatting so it looks nice in terminal and our output file
header = '<Query>, <Link>' + '\n' + '---------------' + '\n' 
output = open("links.txt", "w")  # open file we want to write to                                 
output.write(header)                                            

for x in content:  # for each line in our input file
    print x
    query = x  # search google for that query
    goog_search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + query
    r = requests.get(goog_search)                                                                                                           
    soup = BeautifulSoup(r.text, "html.parser")  # parse so we just get the link
    link = soup.find('cite').text
    formatted = query + ', ' + link + '\n'  # more output formatting
    print query + ', ' + link
    output.write(formatted)

output.close()
print '---'

The error I'm receiving: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 53: ordinal not in range(128)

  • Is there a specific reason you're using Python 2.7 and not Python 3? – L3viathan Mar 03 '17 at 18:10
  • See a similar question here: http://stackoverflow.com/questions/19833440/unicodeencodeerror-ascii-codec-cant-encode-character-u-xe9-in-position-7. Basically, when you open a file, open it with an explicit utf-8 encoding, and do the same when you write. – Jose Haro Peralta Mar 03 '17 at 20:32
  • @L3viathan I'm very new to Python and my buddy just suggested I start with 2.7 – j.kaplan Mar 04 '17 at 19:54
  • @j.kaplan I suggest otherwise, especially when you're doing things with text. Python 3 comes with Unicode strings by default, so in most cases you won't have to worry about encodings anymore. – L3viathan Mar 04 '17 at 21:16
  • @L3viathan I did not know that, thanks! That sounds like it would solve my issue right away. Do you know what the script would look like in Python 3? Like I said, I'm very new to Python – j.kaplan Mar 06 '17 at 03:29
  • Exactly the same, except for the lines with print: print is now a function, e.g. `print(query + ', ' + link)`. – L3viathan Mar 06 '17 at 05:51
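
The two suggestions in the comments (switch to Python 3, and open the output file with an explicit utf-8 encoding) could be combined roughly as below. This is only a minimal sketch under some assumptions: it targets Python 3, the helper names `build_search_url` and `write_results` are made up for illustration, the search URL is cut down to just the `q` parameter, and the `requests`/`BeautifulSoup` steps are left out so that the encoding handling stands on its own.

```python
import io
import os
import tempfile
from urllib.parse import quote_plus


def build_search_url(query):
    # Hypothetical helper: percent-encode the query so non-ASCII
    # characters such as "é" are safe to put in the URL.
    base = "https://www.google.co.uk/search?q="  # shortened for the sketch
    return base + quote_plus(query)


def write_results(path, rows):
    # Hypothetical helper: open the output with an explicit encoding
    # instead of relying on the platform default.
    with io.open(path, "w", encoding="utf-8") as out:
        out.write("<Query>, <Link>\n---------------\n")
        for query, link in rows:
            out.write(query + ", " + link + "\n")


if __name__ == "__main__":
    # Usage example with made-up data, written to a temporary directory.
    demo_path = os.path.join(tempfile.mkdtemp(), "links.txt")
    write_results(demo_path, [("montréal", "www.example.com/montréal")])
    print(build_search_url("montréal"))
```

`io.open` with an `encoding` argument is also available in Python 2.7, so the same idea should carry over there, provided the text being written is `unicode` rather than a byte string.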
