1

I have a python program wherein I access a url and extract data. I then input this data into a mysql table. The mysql table has columns pid ,position,club, points,s,availability, rating,name . I have no issues with the python program ( I hope) but the database apparently does not seem to accept names with UTF alphabets ex: Jääskeläinen . How do I make the database to accept these names? I tried using the answer given here. But the program still gives me the following error:

Traceback (most recent call last):
  File "C:\Users\GAMER\Desktop\Padai\Fall 13\ADB\player_extract.py", line 49, in <module>
    sql += "('{0}', '{1}', '{2}', '{3}', '{4}','{5}','{6}','{7}')".format(count,position,club, points,s,availability, rating,name)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
excepted Goalkeepers Jääskeläinen West Ham 67 £5.5

My python code is this"

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from urllib import urlopen
from pprint import pprint
import MySQLdb
import traceback
import re
#fetch players list from site

url = "http://fantasy.premierleague.com/player-list/"
html = urlopen(url).read()


soup = BeautifulSoup(html)
h2s = soup.select("h2")
tables = soup.select("table")

first = True
title =""
players = []
for i,table in enumerate(tables):
    if first:
         title =  h2s[int(i/2)].text
    for tr in table.select("tr"):
        player = (title,)
        for td in tr.select("td"):
            player = player + (td.text,)
        if len(player) > 1:
            players.append(player)

    first = not first

##SQL connectivity and data entry

db = MySQLdb.connect(host="localhost", user="root", passwd="hassan28", db = "adbpro")

cur = db.cursor()

try:
    count = 1
    for i in players:
        position, name, club, points, price = i
        s = price[1:]
        name = name.replace("'"," ")
        rating = 4
        availability = 1
        sql = "INSERT INTO players (pid,position,club,points,price,availability,rating,name) VALUES "
        try:
            sql += "('{0}', '{1}', '{2}', '{3}', '{4}','{5}','{6}','{7}')".format(count,position,club, points,s,availability, rating,name)
            cur.execute(sql)
            count +=1
        except UnicodeError:
            traceback.print_exc()
            print "excepted", position, name, club, points, price
            continue
        #print sql

    db.commit()

except:
    print sql
    traceback.print_exc()
    db.rollback()
    cur.execute("SELECT * FROM PLAYERS")


print "done"

Any help will be greatly appreciated.

Community
  • 1
  • 1
beginner
  • 99
  • 1
  • 14

2 Answers2

0

Seems like a duplicate of this question. Just for others, the solution is "When you connect() to your database, pass the charset='utf8' parameter."

Community
  • 1
  • 1
Brandon
  • 668
  • 8
  • 22
0

This is not a database problem; you are trying to interpolate Unicode values into a byte string, triggering an implicit encoding.

Don't use string formatting here, use SQL parameters instead:

sql = "INSERT INTO players (pid,position,club,points,price,availability,rating,name) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
params = (count, position, club, points, s, availability, rating, name)
cur.execute(sql, params)

Here the %s tell MySQLdb where to expect SQL parameters, and you pass in the parameters as a separate list to cursor.execute().

Do remember to tell the database connection that you want to use UTF-8 for Unicode values:

db = MySQLdb.connect(host="localhost", user="root", passwd="hassan28", 
                     db="adbpro", charset='utf8')
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343