_mysql_exceptions.Warning: Incorrect string value: '\xE7\xB9\x81\xE9\xAB\x94...' for column 'html' at row 1


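import random
import urllib2

# `agents` is a list of User-Agent strings defined elsewhere (not shown here).

def getSource(theurl, moved=0):
    # Optionally follow a redirect first and use the final URL
    if moved == 1:
        theurl = urllib2.urlopen(theurl).geturl()
    urlReq = urllib2.Request(theurl)
    urlReq.add_header('User-Agent', random.choice(agents))
    urlResponse = urllib2.urlopen(urlReq)
    # read() returns raw bytes (a Python 2 str) in whatever encoding the page uses
    htmlSource = urlResponse.read()
    return htmlSource

htmlSource = getSource(source_url)
new_u = Url(source_url=source_url, source_url_short=source_url_short,
            source_url_hash=source_url_hash, html=htmlSource)
new_u.save()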

Why is this happening? I am basically downloading the HTML of a page and then saving it to a database using Django.

It only happens sometimes; other times it works fine.

Edit: It seems like I have to set the database to UTF-8. What is the command to do that?

TIMEX
  • Please post the code where you execute the query. – Bobby Nov 27 '09 at 12:25
  • Bobby, the query is new_u.save(). It's a Django query. – TIMEX Nov 27 '09 at 12:25
  • @alex: Ohhh... I've never worked with that system. My best guess is that you don't escape the HTML string, and it's trying to insert 'faulty' values. In PHP the equivalent function is mysql_real_escape_string. – Bobby Nov 27 '09 at 12:27

2 Answers


You need to make sure the string has a proper encoding. The string you pass to Django is probably not UTF-8 encoded, and therefore some characters can't be resolved.
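A quick way to check what you actually received is to test whether the downloaded bytes decode as UTF-8. The Python 3 sketch below does that; the helper names `is_valid_utf8` and `fits_in_charset` are made up for illustration, and Python's `latin-1` codec is only a rough stand-in for a MySQL `latin1` column. Note that the bytes quoted in the warning (`\xE7\xB9\x81\xE9\xAB\x94`) do decode as UTF-8, to 繁體, so for that particular page the limiting factor may be the column's character set (see the other answer) rather than the page encoding.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def fits_in_charset(text: str, codec: str) -> bool:
    """Return True if every character of `text` can be encoded with a Python codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# The bytes quoted in the MySQL warning from the question.
warning_bytes = b"\xe7\xb9\x81\xe9\xab\x94"
text = warning_bytes.decode("utf-8")        # "\u7e41\u9ad4", i.e. 繁體

print(is_valid_utf8(warning_bytes))         # True: the bytes are valid UTF-8
print(fits_in_charset(text, "latin-1"))     # False: Latin-1 has no code points for them
```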

Some helpful advice on how to find the encoding of the requested page can be found here: urllib2 read to Unicode
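Along the lines of that linked question, here is a minimal Python 3 sketch (urllib2 became `urllib.request` in Python 3) that takes the charset from the `Content-Type` header and decodes the raw bytes to a Unicode string before it is handed to Django. The function names and the fallback to UTF-8 are assumptions; a page may also declare its charset in a `<meta>` tag, which this sketch does not inspect.

```python
import codecs
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type header, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # None if there is no charset parameter


def decode_html(raw: bytes, content_type: Optional[str], default: str = "utf-8") -> str:
    """Decode page bytes using the declared charset, falling back to `default`."""
    charset = charset_from_content_type(content_type) or default
    try:
        codecs.lookup(charset)  # reject charset names Python does not know
    except LookupError:
        charset = default
    # errors="replace" keeps going on malformed bytes instead of raising
    return raw.decode(charset, errors="replace")
```

With `urllib.request` you might call it as `decode_html(resp.read(), resp.headers.get("Content-Type"))` inside a `with urllib.request.urlopen(url) as resp:` block, and then pass the resulting string to the model.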

miku

There are two ways to change the character set in MySQL. The first is the database default (see MySQL ALTER DATABASE); the second is per table (see MySQL ALTER TABLE).

The database default applies to tables created afterwards. It can be overridden per table, and you need to do that because your tables already exist. "utf8" is a supported character set.
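For the "what is the command" part of the question, here is a sketch that builds the two statements described above as strings you could run in the mysql client. The database name `mydb` and the table name `myapp_url` are hypothetical (Django names a model's table `<app_label>_<modelname>` by default). `ALTER TABLE ... CONVERT TO CHARACTER SET` changes the table default and converts its existing character columns. The defaults use `utf8` as in this answer; on MySQL 5.5.3 and later, `utf8mb4` is the full four-byte UTF-8, while `utf8` is limited to three bytes per character.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def charset_statements(database: str, tables: Iterable[str],
                       charset: str = "utf8",
                       collation: str = "utf8_general_ci") -> List[str]:
    """Build ALTER statements that switch a database and its tables to `charset`.

    `charset` and `collation` are inserted verbatim, so pass only trusted constants.
    """
    statements = [
        f"ALTER DATABASE {quote_ident(database)} "
        f"CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        # CONVERT TO also re-encodes the data already stored in the table
        statements.append(
            f"ALTER TABLE {quote_ident(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


for stmt in charset_statements("mydb", ["myapp_url"]):  # hypothetical names
    print(stmt)
```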

Also have a look at Blog about UTF8 with django and MySQL.

extraneon