I want to build an RSS feed reader myself, so I got started.
My test page, from which I fetch the feed, is 'http://heise.de.feedsportal.com/c/35207/f/653902/index.rss'.
It is a German page, so I chose "iso-8859-1" as the encoding for decoding. Here is the code:
import re
import time

# Note: `opener` and `tagFilter` are defined elsewhere and are not shown here.

def main():
    counter = 0
    try:
        page = 'http://heise.de.feedsportal.com/c/35207/f/653902/index.rss'
        sourceCode = opener.open(page).read().decode('iso-8859-1')
    except Exception as e:
        print(str(e))
    #print sourceCode
    try:
        # Collect everything between <title> and <link> tags in the feed
        titles = re.findall(r'<title>(.*?)</title>',sourceCode)
        links = re.findall(r'<link>(.*?)</link>',sourceCode)
    except Exception as e:
        print(str(e))
    rssFeeds = []
    for link in links:
        if "rss." in link:
            rssFeeds.append(link)
    for feed in rssFeeds:
        if ('html' in feed) or ('htm' in feed):
            try:
                print("Besuche " + feed+ ":")  # "Besuche" = "Visiting"
                feedSource = opener.open(feed).read().decode("iso-8859-1","replace")
            except Exception as e:
                print(str(e))
            content = re.findall(r'<p>(.*?)</p>', feedSource)
            try:
                # One output file per article: feed0.txt, feed1.txt, ...
                tempTxt = open("feed" + str(counter)+".txt", "w")
                for line in content:
                    tempTxt.write(tagFilter(line))
            except Exception as e:
                print(str(e))
            finally:
                tempTxt.close()
            counter += 1
            time.sleep(10)
- First, I open the feed URL mentioned above. So far there seems to be no problem with opening it.
- After decoding the page, I search it for everything enclosed in link tags.
- Then I select the links that contain "rss." and store them in a new list.
- With the new list, I open each link and search the page for its content.
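The first three steps above could also be sketched with the standard library's `xml.etree.ElementTree` instead of regular expressions. This is only a minimal sketch, not the code I am running: the sample feed and the function name `extract_item_links` are made up for illustration, and the `"rss."` filter simply mirrors the check in my code. The feed is passed in as raw bytes so that the parser can honour the encoding declared in the XML header.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed (bytes, as they would arrive from the network).
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <item>
      <title>First article</title>
      <link>http://example.com/rss.first/story01.htm</link>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/other/story02.htm</link>
    </item>
  </channel>
</rss>"""


def extract_item_links(feed_xml, must_contain="rss."):
    """Return the <link> text of every <item> whose link contains must_contain."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and must_contain in link:
            links.append(link.strip())
    return links


print(extract_item_links(SAMPLE_FEED))
```

Unlike the regular expression, this only looks at `<link>` elements inside `<item>` elements, so the channel's own link is skipped.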
This is where the problems start. I decode those pages, which are also German, and I get errors like:
- 'charmap' codec can't encode character '\x9f' in position 339: character maps to <undefined>
- 'charmap' codec can't encode character '\x9c' in position 43: character maps to <undefined>
- 'charmap' codec can't encode character '\x80' in position 131: character maps to <undefined>
I really have no idea why it doesn't work. The data collected before the error occurs is written to a text file.
Example of collected data:
Einloggen auf heise onlineTopthemen:Nachdem Google Anfang des Monats eine 64-Bit-Beta seines hauseigenen Browsers Chrome für Windows 7 und Windows 8 vorgestellt hatte, kümmert sich der Internetriese nun auch um OS X. Wie Tester melden, verbreitet Google über seine Canary-/Dev-Kanäle für Entwickler und Early Adopter nun automatisch 64-Bit-Builds, wenn der User über einen kompatiblen Rechner verfügt.
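One possible explanation, which I have not confirmed against the live pages: the message says "can't encode", so the failure may come from writing the file rather than from decoding. If the article pages are actually UTF-8 and get decoded as "iso-8859-1", a character such as "ß" (UTF-8 bytes C3 9F) turns into two characters, the second being the control character '\x9f', which the default Windows file encoding cannot write. The sketch below reproduces that situation with a made-up string. The helper name `decode_page`, the UTF-8-first fallback, and the temporary output path are assumptions for illustration only.

```python
import os
import tempfile


def decode_page(raw):
    """Try UTF-8 first; fall back to ISO-8859-1, which accepts any byte."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Made-up example: UTF-8 bytes for a German word containing "ß".
raw = "Straße".encode("utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 does not fail, but yields the
# control character '\x9f' in place of the second byte of "ß".
wrong = raw.decode("iso-8859-1")

# Encoding that string with cp1252 (a common Windows default) fails.
try:
    wrong.encode("cp1252")
except UnicodeEncodeError as e:
    print(e)

# Decoding with the fallback helper and writing with an explicit
# encoding avoids depending on the platform default.
text = decode_page(raw)
path = os.path.join(tempfile.mkdtemp(), "feed0.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write(text)
```

If this is what happens, passing `encoding="utf-8"` to `open()` would make the write succeed, and decoding with the page's real encoding would remove the stray control characters in the first place.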
I hope someone can help me. Other hints or information that would help me build my own RSS feed reader are also welcome.
Greetings Templum