I'm trying to create a news app for a school project where I pull information from the RSS feeds of my local newspapers, in order to combine multiple papers into one app.
I'm running into problems when I try to insert the collected data into my MySQL database.
When I simply print my data (for example: print urlnzz.entries[0].description), there is no problem with German characters such as ü ä ö é à.
When I try to insert the data into the MySQL database, however, I get "UnicodeEncodeError: 'ascii' codec can't encode character...". The weird thing is that this only happens for .title and .description, not for .category (even though there are ü etc. in there as well).
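To make the difference concrete, this is roughly what I observe (simplified, the full code is at the bottom):

entry = urlbernerz.entries[0]
print entry.title          # German umlauts print fine
print entry.description    # also fine
print entry.category       # also fine
# but in the INSERT further down, .title and .description raise the UnicodeEncodeError,
# while .category (which also contains umlauts) goes into the database without complaint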
I've been looking for an answer for quite some time now. I tried changing the encoding of the variables with:
t = urlbernerz.entries[i].title
print t.encode('utf-8')
I also changed the charset to utf-8 when connecting to the database, and even tried wrapping the insert in a Python try/except statement, yet nothing seems to work.
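The try/except attempt was roughly this (reconstructed from memory; insert_statement just stands for the INSERT string in the full code below):

try:
    cur.execute(insert_statement)
except UnicodeEncodeError as e:
    print e    # still prints: 'ascii' codec can't encode character ...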
I've checked the type of each entry with type(u['entries'].title) and they are all unicode; now I need to encode them in a way that lets me insert them into my MySQL database.
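That check was just something like:

for entry in urlbernerz.entries:
    print type(entry.title), type(entry.description), type(entry.category)
    # each of these prints <type 'unicode'>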
The RSS sites state that the feeds are already encoded as UTF-8, and even though I explicitly tell Python to encode the strings as UTF-8 as well, it still gives me the error: 'ascii' codec can't encode character u'\xf6'.
I've tried many of the answers to similar questions already, such as using str() or chardet, but nothing seems to work. Roughly, those attempts looked like this (simplified, the variable name is just for illustration):
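title = urlbernerz.entries[i].title
print str(title)                               # raises the same UnicodeEncodeError when the title contains ü/ö/ä
import chardet
print chardet.detect(title.encode('utf-8'))    # detecting the encoding didn't get me any further

Here's my code: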
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import MySQLdb
import feedparser
db = MySQLdb.connect(host="127.0.0.1",
                     user="root",
                     passwd="",
                     db="FeedStuff",
                     charset='UTF8')
db.charset = "utf8"
cur = db.cursor()
urllistnzz =['international', 'wirtschaft', 'sport']
urllistbernerz =['kultur', 'wissen', 'leben']
for u in range(len(urllistbernerz)):
    urlbernerz = feedparser.parse('http://www.bernerzeitung.ch/' + urllistbernerz[u] + '/rss.html')
    k = len(urlbernerz['entries'])
    for i in range(k):
        # the error happens for .title and .description here, but not for .category
        cur.execute("INSERT INTO articles (title, description, date, category, link, source) VALUES (' "
                    + str(urlbernerz.entries[i].title) + " ', ' " + str(urlbernerz.entries[i].description) + " ', ' "
                    + urlbernerz.entries[i].published + " ', ' " + urlbernerz.entries[i].category + " ', ' "
                    + urlbernerz.entries[i].link + " ',' Berner Zeitung')")
for a in range(len(urllistnzz)):
    urlnzz = feedparser.parse('http://www.nzz.ch/' + urllistnzz[a] + '.rss')
    k = len(urlnzz['entries'])
    for i in range(k):
        # same problem here: .title and .description trigger the UnicodeEncodeError
        cur.execute("INSERT INTO articles (title, description, date, category, link, source) VALUES (' "
                    + str(urlnzz.entries[i].title) + " ', ' " + str(urlnzz.entries[i].description) + " ', ' "
                    + urlnzz.entries[i].published + " ', ' " + urlnzz.entries[i].category + " ', ' "
                    + urlnzz.entries[i].link + " ', 'NZZ')")
db.commit()
cur.close()
db.close()