How to parse a Google Blogger XML file in Python?

Question

I have an XML file listing my files on Google Blogger.
How can I parse it in Python to get every article? Please give me working code that gets the exact result.

import feedparser
d = feedparser.parse('blog.xml')
for entry in d.entries:
    print(entry.content[0]['value'])

This gets all my articles from Google Blogger, but the output is messy. Can I get only the words, with the HTML tags removed from the output?

score 2 · Accepted Answer · edited May 23 '17 at 11:57


That is an Atom feed; use feedparser to parse that file into individual articles.

import feedparser
d = feedparser.parse('/path/to/your/xmlfile.xml')
for entry in d.entries:
    print(entry.title)

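This prints (titles translated from the original Chinese):

Template: R
The publishing type set for this blog.
List of email addresses of this blog's administrators.
Whether this blog contains adult content
Whether to allow alternate JS rendering
The blog's Google Analytics account
Number of the archive index date format
How often this blog should be archived
List of email addresses of authors who are allowed to publish.
Whether to show comment backlinks on the blog
Whether to provide an archive page for each post
Who can post comments
Whether commenters must complete a Captcha
List of email addresses that receive new-comment notifications
Feed type provided for blog comments
Location of the blog comment form
Blog comment message
Whether comment moderation is enabled
Number of days for moderating new comments
Email address that receives notifications of new comments awaiting moderation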

etc.
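If feedparser is not available, the same Atom feed can also be read with the standard library's xml.etree.ElementTree. This is a swapped-in alternative, not what the answer above uses: a minimal Python 3 sketch, assuming the file uses the standard Atom namespace `http://www.w3.org/2005/Atom`. The function name `atom_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace, written in ElementTree's "{uri}tag" form.
ATOM = "{http://www.w3.org/2005/Atom}"

def atom_entries(root):
    """Yield (title, content) pairs for every Atom <entry> under root.

    Missing <title> or <content> elements come back as empty strings.
    For type="html" content, findtext() returns the unescaped HTML markup.
    """
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        content = entry.findtext(ATOM + "content", default="")
        yield title, content
```

For a file on disk, something like `for title, html in atom_entries(ET.parse('blog.xml').getroot()): print(title)` should work.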

You can see what items each entry defines by looking at the result of the .keys() method:

>>> d.entries[0].keys()
['updated', u'gd_image', 'updated_parsed', 'published_parsed', 'tags', 'title', 'links', 'summary', 'content', 'guidislink', 'title_detail', 'link', 'author', 'published', 'authors', 'author_detail', 'id']
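The titles printed above are template and settings entries, not articles, so the export apparently mixes several kinds of entries. feedparser exposes each Atom `<category>` as a dict in `entry.tags` (with `term`, `scheme` and `label` keys). A minimal sketch for keeping only the posts, assuming Blogger marks posts with the category term `http://schemas.google.com/blogger/2008/kind#post` (check your own file to confirm); the name `blogger_posts` is hypothetical:

```python
# Assumed Blogger category term that marks an entry as a blog post.
POST_KIND = "http://schemas.google.com/blogger/2008/kind#post"

def blogger_posts(entries):
    """Return only the entries whose tags include the post-kind term.

    Works on feedparser entries and on plain dicts, since both support .get().
    Entries without any tags are dropped.
    """
    return [
        entry for entry in entries
        if any(tag.get("term") == POST_KIND for tag in entry.get("tags", []))
    ]
```

Usage would be along the lines of `posts = blogger_posts(feedparser.parse('blog.xml').entries)`.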

If you want to convert your HTML content to text, there are a few options. Most are listed in: Extracting text from HTML file using Python
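One of those options needs nothing outside the standard library: subclass `html.parser.HTMLParser` and keep only the text nodes. A minimal Python 3 sketch, not a full renderer. The class and function names are hypothetical, and treating `br`, `p`, `div` and `li` as line breaks is an assumption about what reads well for blog posts.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    BREAK_TAGS = ("br", "p", "div", "li")  # assumed to start a new line

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self._parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BREAK_TAGS:
            self._parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace on each line and drop empty lines.
        raw = "".join(self._parts)
        lines = (" ".join(line.split()) for line in raw.splitlines())
        return "\n".join(line for line in lines if line)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

With feedparser this could be applied as `print(html_to_text(entry.content[0]['value']))` inside the loop from the question.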

answered Mar 28 '13 at 07:50

Martijn Pieters

I get `AttributeError: object has no attribute 'title'`. – showkey Mar 30 '13 at 00:44
@it_is_a_literature: It turns out there was a typo in the example. Do read the `feedparser` documentation though. – Martijn Pieters Apr 01 '13 at 11:12
How can I get the content? – showkey Apr 06 '13 at 04:42
You are looking for `entry.content`. – Martijn Pieters Apr 06 '13 at 21:37
`print entry.content[0]['value']` gets me the content, but how can I get better, more readable formatted content? – showkey Apr 07 '13 at 01:07
@it_is_a_literature: You could convert the HTML content to text: [Extracting text from HTML file using Python](http://stackoverflow.com/q/328356) – Martijn Pieters Apr 07 '13 at 21:11