Exporting data as an XML file in Google App Engine

Question

I'm trying to export data to an XML file in Google App Engine, using Python/Django. The file is expected to contain up to 100K records converted to XML. Is there an App Engine equivalent of:

f = file('blah', 'w+')
f.write('whatever')
f.close()

?

Thanks

Edit: What I'm trying to achieve is exporting some information to an XML document so it can be exported to Google Places. (I don't know exactly how this will work, but I've been told that Google will fetch this XML file from time to time.)

When you say 'export', what are you trying to achieve? If you want to send it to the user, simply output it the same way as you would any other document, after setting the `content-type` correctly. If you want to force it to be 'downloaded', set the content-disposition header per this doc: http://www.ietf.org/rfc/rfc2183.txt — Nick Johnson, Feb 25 '11 at 02:59

score 1 · Answer 1 · answered Feb 24 '11 at 14:23

You could also generate XML with Django templates. There's no special reason that a template has to contain HTML. I use this approach for generating the Atom feed for my blog. The template looks like this. I pass it the collection of posts that go into the feed, and each Post entity has a to_atom method that generates its Atom representation.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xml:lang="en"
      xml:base="http://www.example.org">
  <id>urn:uuid:4FC292A4-C69C-4126-A9E5-4C65B6566E05</id>
  <title>Adam Crossland's Blog</title>
  <subtitle>opinions and rants on software and...things</subtitle>
  <updated>{{ updated }}</updated>
  <author>
    <name>Adam Crossland</name>
    <email>adam@adamcrossland.net</email>
  </author>
  <link href="http://blog.adamcrossland.net/" />
  <link rel="self" href="http://blog.adamcrossland.net/home/feed" />
  {% for each_post in posts %}{{ each_post.to_atom|safe }}
  {% endfor %}
</feed>
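
The template above relies on each post's to_atom method, which the answer does not show. A minimal sketch of what such a method might look like follows; the Post class, its field names, and the entry layout are all hypothetical stand-ins, not the author's actual code. Because the template marks the output as safe, the method has to escape its own text values.

```python
from xml.sax.saxutils import escape


class Post(object):
    """Hypothetical stand-in for a blog post entity (not the author's model)."""

    def __init__(self, post_id, title, updated, body):
        self.post_id = post_id
        self.title = title
        self.updated = updated  # RFC 3339 timestamp string
        self.body = body

    def to_atom(self):
        # Escape every text value so that &, < and > cannot break the feed.
        return (
            "<entry>"
            "<id>%s</id>"
            "<title>%s</title>"
            "<updated>%s</updated>"
            '<content type="html">%s</content>'
            "</entry>"
        ) % (
            escape(self.post_id),
            escape(self.title),
            escape(self.updated),
            escape(self.body),
        )


post = Post("urn:uuid:1", "Fish & chips", "2011-02-24T14:23:00Z", "<p>Hi</p>")
entry = post.to_atom()
```

The string in entry is what the template would drop into the feed for that post.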

Interesting, it doesn't solve my problem, but I will generate xml with the django templates — jordinl, Feb 28 '11 at 12:51

score 0 · Answer 2 · edited May 23 '17 at 12:11

Every datastore model class has an instance method to_xml() that will generate an XML representation of that datastore type.

1. Run your query to get the records you want.
2. Set the content type of the response as appropriate. If you want to prompt the user to save the file locally, add a content-disposition header as well.
3. Generate whatever XML preamble you need to come before your record data.
4. Iterate through the query results, calling to_xml() on each and adding that output to your response.
5. Write whatever closing tags are needed to match the preamble.
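
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: build_xml_export, the root tag, the file name, and the FakeRecord class are hypothetical, and FakeRecord merely stands in for query results that provide to_xml().

```python
def build_xml_export(records, root_tag="records", filename="export.xml"):
    """Return (headers, chunks) for an XML download.

    records is any iterable of objects with a to_xml() method.
    root_tag and filename are hypothetical defaults.
    """
    headers = {
        "Content-Type": "application/xml; charset=utf-8",
        # Prompts the browser to save the response instead of displaying it.
        "Content-Disposition": 'attachment; filename="%s"' % filename,
    }

    def chunks():
        # Preamble, then one chunk per record, then the closing tag.
        yield '<?xml version="1.0" encoding="utf-8"?>\n'
        yield "<%s>\n" % root_tag
        for record in records:
            yield record.to_xml()
            yield "\n"
        yield "</%s>\n" % root_tag

    return headers, chunks()


class FakeRecord(object):
    """Stand-in for a datastore entity; real entities provide to_xml()."""

    def __init__(self, name):
        self.name = name

    def to_xml(self):
        return "<record><name>%s</name></record>" % self.name


headers, body = build_xml_export([FakeRecord("a"), FakeRecord("b")])
document = "".join(body)
```

In a request handler, each chunk could be written to the response (for example with self.response.out.write) instead of being joined into one string, so the whole export never has to be held in memory at once.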

answered Feb 24 '11 at 13:58

bgporter

This is interesting, but seems irrelevant - @jordinl didn't ask how to serialize models, he asked how to output XML. – Nick Johnson Feb 25 '11 at 02:59
Hmm. Guess I misread the question. My brain reads "The file is expected to contain upto 100K records converted to XML" as 'I want to serialize 100K records into XML' – bgporter Feb 25 '11 at 12:24

score 0 · Answer 3 · answered Feb 28 '11 at 14:10

What the author is talking about is probably Sitemaps.

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

What I think you need is to write the XML to the response object, like so:

doc.writexml(self.response.out)
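
For context, here is a minimal sketch of where such a doc object could come from, assuming it is built with the standard library's xml.dom.minidom and that the target format is a Sitemap. The build_sitemap helper and the example URLs are hypothetical, and io.StringIO only stands in for self.response.out. It is written for a current Python interpreter.

```python
import io
from xml.dom import minidom


def build_sitemap(urls):
    """Build a minimal Sitemap document for the given URLs."""
    doc = minidom.Document()
    urlset = doc.createElement("urlset")
    urlset.setAttribute("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9")
    doc.appendChild(urlset)
    for url in urls:
        url_el = doc.createElement("url")
        loc = doc.createElement("loc")
        # Text nodes are escaped by minidom when the document is written.
        loc.appendChild(doc.createTextNode(url))
        url_el.appendChild(loc)
        urlset.appendChild(url_el)
    return doc


doc = build_sitemap(["http://www.example.org/", "http://www.example.org/about"])

out = io.StringIO()  # stand-in for self.response.out
doc.writexml(out, encoding="utf-8")
sitemap_xml = out.getvalue()
```

writexml accepts any object with a write method, which is why it can target the response stream directly.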

In my case I do this based on mime types sent from the client:

_MIME_TYPES = {
    # xml mime type needs lower priority, that's needed for WebKit based browsers,
    # which add application/xml equally to text/html in accept header
    'xml':  ('application/xml;q=0.9', 'text/xml;q=0.9', 'application/x-xml;q=0.9',),
    'html': ('text/html',),
    'json': ('application/json',), 
}

# Flatten all MIME types we can serve and pick the best match for the Accept header
mime = self.request.accept.best_match(reduce(lambda x, y: x + y, _MIME_TYPES.values()))
# Assumed fallback so that renderer is always defined when nothing matches
renderer = 'html'
if mime:
    for shortmime, mimes in _MIME_TYPES.items():
        if mime in mimes:
            renderer = shortmime
            break
# call specific render function, e.g. renderxml or renderhtml
renderer = 'render' + renderer
logging.info('Using %s for serving response' % renderer)
try:
    getattr(self.__class__, renderer)(self)
except AttributeError:
    logging.error("Missing renderer %s" % renderer)

So, are you querying the DB and creating the XML each time? Would that be reasonable with 100K records? — jordinl, Mar 01 '11 at 10:01
No jordinl, I'm using above only to choose the right render mechanism, based on user acceptance. How you will use above mechanism is up to you, I just showed you possibility. In your case I think you should create the sitemap only once in a while and keep it either in memory or in DB and refresh it once in a while. — soltysh, Mar 02 '11 at 08:40
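
The "build it once in a while and keep it" idea from the comment above could be sketched like this. CachedDocument and its parameters are hypothetical, and this version only keeps the value in the memory of the current process; storing it in the DB instead would follow the same pattern with different storage code.

```python
import time


class CachedDocument(object):
    """Keep a generated document in memory and rebuild it after max_age seconds."""

    def __init__(self, build, max_age=3600, clock=time.time):
        self._build = build      # callable that returns the document
        self._max_age = max_age  # seconds before the document is rebuilt
        self._clock = clock      # injectable for testing
        self._value = None
        self._built_at = None

    def get(self):
        now = self._clock()
        # Rebuild on first use or once the cached copy is too old.
        if self._built_at is None or now - self._built_at >= self._max_age:
            self._value = self._build()
            self._built_at = now
        return self._value
```

A handler would then call get() on a shared instance and write the result to the response, so the expensive query over all records runs only when the cached copy has expired.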
