
I'm using Google App Engine with Python 2.5, and I have a function that is causing a bottleneck. I pass it a list of 200 Model instances retrieved from the datastore, and it returns them as a JSON string, which I then pass to the client.

I originally used += to concatenate all the values together, but it was taking around 30 seconds for the server to respond with the JSON. I ran some checks: the code before this function runs in under a second, this function is the last statement before the server responds, and the response averages 1 second to reach the client (on my local network). This function alone takes on average 30 seconds to execute.

I read this article and tried the cStringIO method (I also tried the list-join method, but it took the same amount of time, and cStringIO uses less memory, so I stuck with it). However, it took around the same time as += concatenation (sometimes longer). Can anyone see any issues with my code that might make it slower?

EDIT: Boss says it has to be done this way. No json libraries (take it up with him).

EDIT 2: LastName Model:

class LastName(db.Model): 
    entry = db.ReferenceProperty(AlumniEntry, collection_name='last_names') 
    last_name = db.StringProperty(indexed=False) 
    last_name_search = db.StringProperty()

AlumniEntry is the Model that is queried. I pass the list I get back from the datastore to get_json_from_alumnus() (the alumnus parameter).

def get_json_from_alumnus(alumnus, search, total=0):
    # Create the buffer up front; creating it only inside the if-branch
    # would leave concat_file undefined in the else-branch below.
    from cStringIO import StringIO
    concat_file = StringIO()

    if len(alumnus) > 0:
        concat_file.write('{ "alumnus": [')
        i = 0
        for alumni in alumnus:
            if alumni.author:
                author = alumni.author.nickname()
            else:
                author = 'Anonymous'

            concat_file.write('{ ')
            concat_file.write('"author": "')
            concat_file.write(author)
            concat_file.write('", ')
            concat_file.write('"title": "')
            concat_file.write(alumni.title)
            concat_file.write('", ')
            concat_file.write('"first_name": "')
            concat_file.write(alumni.first_name)
            concat_file.write('", ')

            concat_file.write(' "last_names": [')
            j = 0
            for lname in alumni.last_names:
                concat_file.write('{ "last_name": "')
                concat_file.write(lname.last_name)
                concat_file.write('" }')
                if not j == alumni.last_names.count() - 1:
                    #last_names += ','
                    concat_file.write(',')
                j += 1
            concat_file.write('], ')

            concat_file.write(' "addresses": [')
            j = 0
            for address in alumni.addresses:
                if address.street == '' and address.city == '' and address.state == '' and address.zip_code == '':
                    break

                concat_file.write('{ "address":{ "street" : "')
                concat_file.write(address.street)
                concat_file.write('", ')
                concat_file.write('"city" : "')
                concat_file.write(address.city)
                concat_file.write('", ')
                concat_file.write('"state" : "')
                concat_file.write(address.state)
                concat_file.write('", ')
                concat_file.write('"zip_code" : "')
                concat_file.write(address.zip_code)
                concat_file.write('" } }')

                if not j == alumni.addresses.count() - 1:
                    concat_file.write(',')
                j += 1
            concat_file.write('], ')

            concat_file.write(' "numbers": [')
            j = 0
            for phone_number in alumni.phone_numbers:
                concat_file.write('{ "phone_number": "')
                concat_file.write(phone_number.number)
                concat_file.write('" }')
                if not j == alumni.phone_numbers.count() - 1:
                    concat_file.write(',')
                j += 1
            concat_file.write('], ')

            concat_file.write(' "emails": [')
            j = 0
            for email in alumni.emails:
                concat_file.write('{ "email": "')
                concat_file.write(email.email)
                concat_file.write('" }')
                if not j == alumni.emails.count() - 1:
                    concat_file.write(',')
                j += 1
            concat_file.write('], ')

            concat_file.write('"grad_year": "')
            concat_file.write(alumni.grad_year)
            concat_file.write('", ')
            concat_file.write('"elementary": "')
            concat_file.write(alumni.elementary)
            concat_file.write('", ')
            concat_file.write('"entered": "')
            concat_file.write(str(alumni.entered.strftime('%B %d %Y')))
            concat_file.write('", ')
            concat_file.write('"key": "')
            concat_file.write(str(alumni.key()))
            concat_file.write('" ')
            concat_file.write('}')

            if not i == len(alumnus) - 1:
                concat_file.write(',')
            i += 1
        concat_file.write('], "total" : "')
        concat_file.write(str(total))
        concat_file.write('" }')
    else:
        concat_file.write('{ "alumnus": "No Alumni Entered Yet!" }' if not search else '{ "alumnus": "No Matches!" }')

    return concat_file.getvalue()
Eliezer
    I really, really, really, really have to know: why aren't you using simplejson? – Ignacio Vazquez-Abrams Feb 21 '12 at 22:41
  • Boss says no simplejson. I don't make the rules but I do follow them. In any case I have the same issue with a different function where I need to concatenate a CSV file together, but it wasn't as pressing as this one. – Eliezer Feb 21 '12 at 22:47
  • 1
    *sigh* simplejson will catch all the issues that your code will screw up. – Ignacio Vazquez-Abrams Feb 21 '12 at 22:48
  • I won't comment on that in case he ever looks at this ;). I asked him why we can't use it and he said that he has his reasons. I took that to mean stfu – Eliezer Feb 21 '12 at 22:50
  • I mean, GAE even *gives* you simplejson on a silver platter. I cannot fathom the reasoning that would go into saying "no simplejson" for as trivial a task as this. – Ignacio Vazquez-Abrams Feb 21 '12 at 22:52
  • Honestly I've been trying to figure out any possible reason he could have for doing it this way. I got nothing. In any event is this the best performance I could hope for? – Eliezer Feb 21 '12 at 22:54

3 Answers


I suspect this line in your code:

if not j == alumni.last_names.count() - 1:

(and a few similar lines).

You didn't post your model, but it looks to me like alumni.last_names might be a query. Running a query for each entity is a super bad idea, and might very well dominate your cost. It should not take anywhere near 30 seconds to concatenate a few thousand strings using cStringIO.

It's easy to find out if you are doing too many queries using Appstats: http://code.google.com/appengine/docs/python/tools/appstats.html (you can even try this in the dev appserver).
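In plain-Python terms, a minimal sketch of that fix (the helper name and the plain-list argument are stand-ins for the actual query object): materialize the collection once, then compare against len() of the list, so nothing inside the loop ever goes back to the datastore.

```python
def serialize_last_names(last_names):
    # last_names stands in for a lazy collection such as a ReferenceProperty
    # back-reference; list() forces a single fetch up front.
    names = list(last_names)
    n = len(names)  # len() on a list is free; .count() on a query is another round trip
    out = []
    for j, name in enumerate(names):
        out.append('{ "last_name": "%s" }' % name)
        if j != n - 1:
            out.append(',')
    return ''.join(out)

serialize_last_names(['Smith', 'Jones'])
# -> '{ "last_name": "Smith" },{ "last_name": "Jones" }'
```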

PS. The singular is actually alumnus and the plural is alumni. :-)

Guido van Rossum
  • See edit 2 for my LastName model. I was actually just thinking that might be the problem. I never thought about it before but I guess that is another query (or get). This was a design flaw that I thought was fixed but apparently it wasn't. I'm gonna try again with a different model structure. Thanks a ton! – Eliezer Feb 22 '12 at 04:11
  • As for the alumni alumnus thing...half of our code has it one way and half has it the other way. Not sure exactly how that happened but it's a complete disaster :) – Eliezer Feb 22 '12 at 04:12

str.join() and string interpolation usually give much better performance than repeated concatenation. Give those a try, and may the powers that be have mercy on your soul.
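For example, each inner list can be built with one interpolation per record and a single join at the end (the helper below is a sketch over plain strings, not your actual Model objects):

```python
def serialize_emails(emails):
    # One '%' interpolation per record, one join at the end; repeated
    # '+=' on strings can degrade to O(n**2), while a single join is O(n).
    return '[' + ','.join('{ "email": "%s" }' % e for e in emails) + ']'

serialize_emails(['a@example.com', 'b@example.com'])
# -> '[{ "email": "a@example.com" },{ "email": "b@example.com" }]'
```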

Ignacio Vazquez-Abrams
  • Took the same amount of time. I didn't think that this was so much data but is it possible that this is python's ceiling for processing this amount of strings? I'm running on an insanely powerful and optimized machine so it shouldn't be my hardware (it's actually running faster on the appengine server than on my devserver). I tried the simplejson method for myself and it was pretty darn fast. If you put that in an answer I'll accept it unless someone has a concatenation technique that's satisfactory. – Eliezer Feb 21 '12 at 23:35

I'd suggest creating the data structure that you want to send as the answer in Python itself and using a json module to generate the string version. The reason is that the most popular json modules are at least partly implemented in C, so even though cStringIO is also implemented in C, I suspect they apply optimizations that are difficult to achieve using just the standard library.

For more information, please refer to this related question.
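A sketch of that approach, assuming the records have already been turned into plain dicts (on Python 2.5 the stdlib json module shown below would be simplejson, which ships with the App Engine SDK):

```python
import json  # simplejson on Python 2.5 / App Engine

def get_json_from_dicts(records, total=0):
    # Build plain Python structures and let the (partly C-implemented)
    # encoder do the string building -- and the escaping -- in one pass.
    return json.dumps({'alumnus': records, 'total': str(total)})
```

This also fixes a correctness problem the hand-rolled version has: any field containing a double quote currently produces invalid JSON.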

Edit: If using a third-party json module is out of the question, then I'd try to reduce the number of write calls by using format strings wherever possible.

I guess a templating library that could speed things up would also be ruled out, so the only other approach I can think of is to cache as much as possible, so that subsequent calls don't have to redo the whole task.
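As a sketch of the format-string idea (the template and field names are illustrative, not the full record): one interpolation and one write per record replaces a dozen small write() calls.

```python
# Illustrative template covering a few of the fields; the '%(name)s'
# placeholders are all filled from a dict in a single interpolation.
RECORD_TMPL = ('{ "author": "%(author)s", "title": "%(title)s", '
               '"first_name": "%(first_name)s", "grad_year": "%(grad_year)s" }')

def format_record(author, title, first_name, grad_year):
    return RECORD_TMPL % {'author': author, 'title': title,
                          'first_name': first_name, 'grad_year': grad_year}

format_record('Anonymous', 'Mr', 'John', '1999')
# -> '{ "author": "Anonymous", "title": "Mr", "first_name": "John", "grad_year": "1999" }'
```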

jcollado