I'm trying to save scraped weather data into a MySQL database using Python, but I get an error because of the degree symbol. Does anyone know how to get this to work?

My code is:

import urllib2
import MySQLdb

from BeautifulSoup import BeautifulSoup
db = MySQLdb.connect("127.0.0.1","root","","weathersystem")
cursor = db.cursor()
sql = "TRUNCATE TABLE AMSTERDAM "
cursor.execute(sql)
db.commit()
db.close
soup = BeautifulSoup(urllib2.urlopen('http://weather.uk.msn.com/tenday.aspx?wealocations=wc:NLXX0002').read())

for row in soup('div', {'class': 'weadetailed'})[0]('li'):
    tds = row('div')
    print tds[2].text, tds[3].text, (tds[6].span.text), tds[7].span.text, tds[8].text, tds[9].text
    cursor = db.cursor()
    sql = "INSERT INTO AMSTERDAM(DAY, DATE, HIGH, LOW, WIND, HUMIDITY) VALUES (%s,%s,%s,%s,%s,%s)"
    results = (str(tds[2].text), str(tds[3].text), str(tds[6].span.text),
           str(tds[7].span.text), str(tds[8].text), str(tds[9].text))
    cursor.execute(sql, results)
    db.commit()
    db.rollback()
    db.close()

And then I am given this error:

Today 14 Feb 9° 5° Wind 18 mph SW Humidity 74%
Traceback (most recent call last):
  File "C:/Users/owner/PycharmProjects/WeatherStation/Innovation Scraper.py", line 18, in <module>
    results = (str(tds[2].text), str(tds[3].text), str(tds[6].span.text),
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 1: ordinal not in range(128)

Thomas

2 Answers

The traceback indicates that either BeautifulSoup or the Python installation is complaining. Take a look at the Beautiful Soup documentation:

If you're getting errors that say: "'ascii' codec can't encode character 'x' in position y: ordinal not in range(128)", the problem is probably with your Python installation rather than with Beautiful Soup. Try printing out the non-ASCII characters without running them through Beautiful Soup and you should have the same problem. For instance, try running these three lines of code like this:

>>> latin1word = 'Sacr\xe9 bleu!'
>>> unicodeword = unicode(latin1word, 'latin-1')
>>> print unicodeword
Sacré bleu!

(Note that this should be in the interactive interpreter, not in the script. In the script, you'll still get that error if you stick it at the bottom.)

If that works (i.e. you see the last line returned), the problem is in BeautifulSoup and yes, you should upgrade to bs4. If that print line spits out a traceback, the problem is in your installation of Python. Instructions to work around it can be found at that link from above.
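For what it's worth, the error message itself can be reproduced without Beautiful Soup or MySQL. The following is a minimal sketch, assuming the scraped cell holds the Unicode string u"9\xb0" as the printed output suggests. In Python 2, calling str() on a unicode object encodes it with the default encoding (normally ASCII), which behaves like the explicit .encode("ascii") used here. The helper name ascii_error is hypothetical and only serves the illustration.

```python
# Hypothetical stand-in for tds[6].span.text; not taken from the live page.
cell_text = u"9\xb0"


def ascii_error(text):
    """Return None if text is pure ASCII, else the UnicodeEncodeError message."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# Plain ASCII text encodes without complaint.
print(ascii_error(u"Humidity 74%"))

# The degree sign (code point 0xB0, i.e. 176) is outside the ASCII range 0-127.
print(ascii_error(cell_text))
```

If the second call reports "ordinal not in range(128)", the failure comes from the ASCII conversion rather than from the scraper or the database driver.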

On another note, MySQLdb uses, by default, a latin1 character set. Unless you include the kwarg charset='utf8', you won't be able to insert that Unicode data into a table:

db = MySQLdb.connect("127.0.0.1","root","","weathersystem", charset="utf8")
pswaminathan
  • I added the kwarg but for some reason it still gave me the same error. – Thomas Feb 14 '14 at 01:50
  • Whoops, I misread the traceback. Try `latin1word = 'Sacr\xe9 bleu!'` `unicodeword = unicode(latin1word, 'latin-1')` `print unicodeword` in your interpreter. If that gives you another `UnicodeError`, the problem is in your installation of Python. If that is fine, it's a problem in the installation of BeautifulSoup. Can I ask why you're on BS3 instead of BS4? – pswaminathan Feb 14 '14 at 01:55
  • If your Python installation also printed the proper unicode characters, then the problem is with BeautifulSoup; you should upgrade to bs4. While you might not be able to follow the tutorial you're seeing to a T, you should still be able to use the same basic patterns. – pswaminathan Feb 14 '14 at 02:18
  • I installed BS4, but when I put from BS4 import BeautifulSoup it says no module named bs4. – Thomas Feb 14 '14 at 02:22
  • I reinstalled Beautiful Soup and all packages were installed successfully. Is there any way I could get this working? I have tried to scrape the website for the temperature number only, but because the number and the degree symbol are together it won't separate the two. – Thomas Feb 14 '14 at 02:44
  • You should look at string slicing (http://stackoverflow.com/questions/663171/is-there-a-way-to-substring-a-string-in-python), and alternatively, the str.replace method (http://docs.python.org/2/library/stdtypes.html#string-methods). For instance, `temperature = u"70\u00B0"` `print temperature[:-1]` – pswaminathan Feb 14 '14 at 02:53
  • Thank you for your help. I had a look at slicing and learnt a useful tip there. However, it wouldn't work on my scraper: if I only took [:1] it would not show double-digit temperatures, and if I took [:2], single-digit temperatures would still show the degree symbol. I was, however, able to use .encode('utf8') to take the degree symbol away when saving into the database. – Thomas Feb 14 '14 at 03:21
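A note on the slicing discussed in the comments: the slice [:-1] drops the last character whatever the length of the string, unlike [:1] or [:2], which keep a fixed number of leading characters. Below is a minimal sketch, assuming each temperature cell is a number followed by a single degree sign. The function name parse_temperature and the sample values are hypothetical, and rstrip is used instead of a slice so that a value without a degree sign is left intact.

```python
DEGREE = u"\xb0"  # the degree sign, the same character as u"\u00B0"


def parse_temperature(text):
    """Strip whitespace and any trailing degree sign, then return an int."""
    return int(text.strip().rstrip(DEGREE))


# Hypothetical sample cells: single digit, double digit, and negative.
for sample in (u"9\xb0", u"14\xb0", u"-3\xb0"):
    print(parse_temperature(sample))

# The slice from the comment works for any number of digits as well.
temperature = u"70\u00B0"
print(temperature[:-1])
```

Storing the value as an integer would also sidestep the encoding question for the HIGH and LOW columns, provided the table schema allows it.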
I was able to do it by adding .encode('utf8').

For example:

results = (str(tds[2].text), str(tds[3].text), str(tds[6].span.text.encode('utf8')),
       str(tds[7].span.text.encode('utf8')), str(tds[8].text), str(tds[9].text))
Thomas
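As a rough illustration of what .encode('utf8') does in the answer above: it does not remove the degree sign, it turns the Unicode text into UTF-8 bytes, in which the degree sign occupies two bytes. The sketch below assumes a row shaped like the printed output in the question. The helper name encode_row is hypothetical, and it encodes every cell rather than only the two temperature cells.

```python
def encode_row(cells, encoding="utf8"):
    """Encode every Unicode cell of a row to bytes in the given encoding."""
    return tuple(cell.encode(encoding) for cell in cells)


# Hypothetical row modelled on the output shown in the question.
row = (u"Today", u"14 Feb", u"9\xb0", u"5\xb0", u"Wind 18 mph SW", u"Humidity 74%")
encoded = encode_row(row)

# The degree sign is still present, now as the two bytes 0xC2 0xB0.
print(repr(encoded[2]))

# Decoding with the same encoding recovers the original text.
decoded = tuple(cell.decode("utf8") for cell in encoded)
print(decoded == row)
```

In Python 2 these byte strings pass through str() unchanged, which is presumably why the original error goes away. How the stored value displays afterwards likely depends on the character set of the connection and the table.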