
This is no doubt another noobish question, but I'll ask it anyway:

I have a data set of events with exact datetime in UTC. I'd like to create a line chart showing total number of events by day (date) in the specified date range. Right now I can retrieve the total data set for the needed date range, but then I need to go through it and count up for each date.

The app is running on Google App Engine and is using Python.

What is the best way to create a new data set of dates and corresponding counts (including zero for dates with no events) that I can then pass to a Django template?

Data set for this example looks like this:

class Event(db.Model):
    event_name = db.StringProperty()
    doe = db.DateTimeProperty()   # event timestamp (UTC); the field to bin by date
    dlu = db.DateTimeProperty()
    user = db.UserProperty()

Ideally, I want something with a date and the count for that date.

Thanks and please let me know if something else is needed to answer this question!

Sologoub

3 Answers


You'll have to do the binning in-memory (i.e. after the datastore fetch).

The .date() method of a datetime instance will facilitate your binning; it chops off the time element. Then you can use a dictionary to hold the bins:

bins = {}
for event in Event.all().fetch(1000):
    # group events that share the same calendar date
    bins.setdefault(event.doe.date(), []).append(event)

Then do what you wish with (e.g. count) the bins. For a direct count:

import collections

counts = collections.defaultdict(int)
for event in Event.all().fetch(1000):
    counts[event.doe.date()] += 1
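
To get from counts to the complete series the question asks for (zero-count dates included), iterate the requested date range rather than just the dictionary's keys, since days with no events never become keys. A minimal sketch, using a hypothetical daily_series helper and caller-supplied start_date/end_date bounds:

import datetime

def daily_series(counts, start_date, end_date):
    # hypothetical helper: returns [(date, count), ...] for every day in the range
    num_days = (end_date - start_date).days + 1
    series = []
    for offset in range(num_days):
        day = start_date + datetime.timedelta(days=offset)
        series.append((day, counts.get(day, 0)))  # .get() yields 0 for event-free days
    return series

The resulting list can go straight into the template context, e.g. {'rows': daily_series(counts, start, end)}, and be looped over with a {% for %} tag in the Django template.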
kostmo
  • sorry, can you elaborate a little or maybe include a link to a tutorial about "binning"? How do I retrieve a count for say 01/01/2010 from "counts"? – Sologoub Mar 29 '10 at 02:45
  • Ummmm ... you are asking how to retrieve a value from a dictionary? Something like this: `key = datetime.date(2010, 1, 1); the_answer = counts[key]` ... by the way, @kostmo's answer is incomplete; as you want ultimately a table that includes zero counts, you need to iterate over your whole date range either before or after counting. – John Machin Mar 29 '10 at 04:37
  • @John Machin: Actually, in my second snippet (the direct count), the defaultdict will automatically return "0" for any key that hasn't been explicitly set. – kostmo Mar 29 '10 at 06:17
  • @Sologoub: "Binning" is what you are asking for--that is, placing all events that occur on the same day into a single bin for that day... aka "histogramming". – kostmo Mar 29 '10 at 06:24
  • @kostmo: Yes of course it will automatically return etc etc but you didn't tell him to iterate over his dictionary to build the table he wants. – John Machin Mar 29 '10 at 09:40
  • thanks everyone! I'll give this a try and be back if I mess something up. – Sologoub Mar 29 '10 at 16:06
  • Just a note for whoever reads this in the future - here's the tutorial on dictionaries: http://docs.python.org/tutorial/datastructures.html – Sologoub Mar 31 '10 at 00:52

I can't see how that would be possible with a single query, as GQL has no support for GROUP BY or aggregation generally.

jholster

In order to minimize the amount of work you do, you'll probably want to write a task that sums up the per-day totals once, so you can reuse them. I'd suggest using the bulkupdate library to run a once-a-day task that counts events for the previous day, and creates a new model instance, with a key name based on the date, containing the count. Then, you can get all needed data points by doing a query (or better, a batch get) for the set of summary entities you need.
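
A minimal sketch of what those summary entities and the batch get might look like (the EventSummary model name, its ISO-date key names, and the counting query are illustrative assumptions; the once-a-day bulkupdate task wiring is omitted):

import datetime
from google.appengine.ext import db

class EventSummary(db.Model):
    # hypothetical summary entity: one instance per day, keyed by ISO date string
    day = db.DateProperty()
    count = db.IntegerProperty(default=0)

def summarize_day(day):
    # count one day's events and store the total under a date-based key name
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)
    total = Event.all().filter('doe >=', start).filter('doe <', end).count()
    EventSummary(key_name=day.isoformat(), day=day, count=total).put()

def fetch_series(start_date, end_date):
    # batch get of the range's summaries; entries are None for days not yet summarized
    days = [start_date + datetime.timedelta(days=i)
            for i in range((end_date - start_date).days + 1)]
    return zip(days, EventSummary.get_by_key_name([d.isoformat() for d in days]))

Each chart request then reads one small entity per day instead of re-scanning the raw events.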

Nick Johnson
  • Thanks Nick. One problem is that I'm putting together a basic reporting engine with a number of filters. I did think of building aggregation tables (sort of a data warehouse?) but that's going to be at a later stage. Also, Google App Engine offers a decent cron job interface, so running nightly aggregations should not be hard. – Sologoub Mar 29 '10 at 16:08
  • Bear in mind that cronjobs alone won't cut it for nightly aggregations if the number of records to aggregate becomes too high - hence my suggestion of the bulkupdate library. – Nick Johnson Mar 29 '10 at 16:26