1

My web app uses Django on Heroku.

The app has two components: a banner serving API and a reporting API.

Banners served by the app contain an ajax call to the app's reporting API. This call is made once per impression. The purpose of this is to record how many impressions each banner gets each day.

Reporting API abridged source:

def report_v1(request):
    '''
    /report/v1/?cid=

    - cid= the ID of the banner
    '''

    creative = get_creative(creative_id = request.GET.get('cid'))
    if not creative:
        return HttpResponseNotFound("404 creative with that ID not found")

    day = timezone.now()

    fact, created = CreativeFact.objects.get_or_create(
        day = day,
        app = app,
        creative = creative,
        defaults = {
                'impression_count': 0,
            }
        )

    fact.impression_count += 1

    response = HttpResponse("200 OK")
    response["Access-Control-Allow-Origin"] = "*"

    return response

My problem: Is there any way to avoid writing to the database on every single impression? I'm currently tracking hundreds of thousands of impressions this way, and expect to track millions soon, and I think this is inefficient.

bfox
  • 274
  • 2
  • 13
  • 1
    I don't really know Django, but in most platforms I'd suggest that you write CSV-like data to a tempfile. Then periodically rotate the tempfile with an empty one and `COPY ... FROM stdin` the old tempfile into the database to loads its contents, possibly into a temp table you then join on to update/insert the real target table. Since you're using "get_or_create" I'm guessing you'd want to load into a tempfile, then do a bulk upsert; see http://stackoverflow.com/q/17267417/398670 – Craig Ringer Apr 20 '15 at 03:27

1 Answers1

1

You could cache the counts and then write them to the model in increments of 100 or whatever the optimal value is:

At the top of report_v1:

if getattr(report_v1, 'cache', None) is None:
    report_v1.cache = {}

Then, after you verify that the cid is valid (after the line with the 404 response):

if cid in report_v1.cache:
    report_v1.cache[cid] += 1
else:
    report_v1.cache[cid] = 1

Then, you'd want to increment the value on the model only at certain intervals:

if not report_v1.cache[cid] % 100:
    #increment impression_count 
    #wipe cache

The downside to this approach is that the cache is in memory, so it's lost if the app crashes.

bfox
  • 274
  • 2
  • 13
1.618
  • 1,765
  • 16
  • 26
  • This seems really helpful, thanks! Can you please point me in the direction of the documentation for the cache attribute? (I've never implemented caching before.) – bfox Apr 20 '15 at 05:00
  • 1
    It's just a generic attribute, and 'cache' is just an arbitrary name, so there's no specific documentation for it. The important thing to realize though is that `cache` is an attribute of the function -- it's _not_ a local variable. – 1.618 Apr 20 '15 at 05:03