
This is no doubt another noobish question, but I'll ask it anyway:

I have a data set of events with exact datetime in UTC. I'd like to create a line chart showing total number of events by day (date) in the specified date range. Right now I can retrieve the total data set for the needed date range, but then I need to go through it and count up for each date.

The app is running on Google App Engine and is using Python.

What is the best way to create a new data set of dates and corresponding counts (including zero for dates with no events) that I can then pass to a Django template?

Data set for this example looks like this:

class Event(db.Model):
    event_name = db.StringProperty()
    doe = db.DateTimeProperty()   # event timestamp (UTC); the field to bin by date
    dlu = db.DateTimeProperty()
    user = db.UserProperty()

Ideally, I want something with a date and the count for that date.

Thanks and please let me know if something else is needed to answer this question!

Sologoub

3 Answers


You'll have to do the binning in-memory (i.e. after the datastore fetch).

The .date() method of a datetime instance will facilitate your binning; it chops off the time element. Then you can use a dictionary to hold the bins:

bins = {}
for event in Event.all().fetch(1000):
    # group events that share the same calendar date
    bins.setdefault(event.doe.date(), []).append(event)

Then do what you wish with (e.g. count) the bins. For a direct count:

import collections

counts = collections.defaultdict(int)
for event in Event.all().fetch(1000):
    counts[event.doe.date()] += 1
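
To get from counts to the complete series the question asks for (zero-count dates included), iterate the requested date range rather than just the dictionary's keys, since days with no events never become keys. A minimal sketch, using a hypothetical daily_series helper and caller-supplied start_date/end_date bounds:

import datetime

def daily_series(counts, start_date, end_date):
    # hypothetical helper: returns [(date, count), ...] for every day in the range
    num_days = (end_date - start_date).days + 1
    series = []
    for offset in range(num_days):
        day = start_date + datetime.timedelta(days=offset)
        series.append((day, counts.get(day, 0)))  # .get() yields 0 for event-free days
    return series

The resulting list can go straight into the template context, e.g. {'rows': daily_series(counts, start, end)}, and be looped over with a {% for %} tag in the Django template.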
kostmo
  • sorry, can you elaborate a little or maybe include a link to a tutorial about "binning"? How do I retrieve a count for say 01/01/2010 from "counts"? – Sologoub Mar 29 '10 at 02:45
  • Ummmm ... you are asking how to retrieve a value from a dictionary? Something like this: `key = datetime.date(2010, 1, 1); the_answer = counts[key]` ... by the way, @kostmo's answer is incomplete; as you want ultimately a table that includes zero counts, you need to iterate over your whole date range either before or after counting. – John Machin Mar 29 '10 at 04:37
  • @John Machin: Actually, in my second snippet (the direct count), the defaultdict will automatically return "0" for any key that hasn't been explicitly set. – kostmo Mar 29 '10 at 06:17
  • @Sologoub: "Binning" is what you are asking for--that is, placing all events that occur on the same day into a single bin for that day... aka "histogramming". – kostmo Mar 29 '10 at 06:24
  • @kostmo: Yes of course it will automatically return etc etc but you didn't tell him to iterate over his dictionary to build the table he wants. – John Machin Mar 29 '10 at 09:40
  • thanks everyone! I'll give this a try and be back if I mess something up. – Sologoub Mar 29 '10 at 16:06
  • Just a note for whoever reads this in the future - here's the tutorial on dictionaries: http://docs.python.org/tutorial/datastructures.html – Sologoub Mar 31 '10 at 00:52

I can't see how that would be possible with a single query, as GQL has no support for GROUP BY or aggregation generally.

jholster

In order to minimize the amount of work you do, you'll probably want to write a task that sums up the per-day totals once, so you can reuse them. I'd suggest using the bulkupdate library to run a once-a-day task that counts events for the previous day, and creates a new model instance, with a key name based on the date, containing the count. Then, you can get all needed data points by doing a query (or better, a batch get) for the set of summary entities you need.
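
A minimal sketch of what those summary entities and the batch get might look like (the EventSummary model name, its ISO-date key names, and the counting query are illustrative assumptions; the once-a-day bulkupdate task wiring is omitted):

import datetime
from google.appengine.ext import db

class EventSummary(db.Model):
    # hypothetical summary entity: one instance per day, keyed by ISO date string
    day = db.DateProperty()
    count = db.IntegerProperty(default=0)

def summarize_day(day):
    # count one day's events and store the total under a date-based key name
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)
    total = Event.all().filter('doe >=', start).filter('doe <', end).count()
    EventSummary(key_name=day.isoformat(), day=day, count=total).put()

def fetch_series(start_date, end_date):
    # batch get of the range's summaries; entries are None for days not yet summarized
    days = [start_date + datetime.timedelta(days=i)
            for i in range((end_date - start_date).days + 1)]
    return zip(days, EventSummary.get_by_key_name([d.isoformat() for d in days]))

Each chart request then reads one small entity per day instead of re-scanning the raw events.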

Nick Johnson
  • Thanks Nick. One problem is that I'm putting together a basic reporting engine with a number of filters. I did think of building aggregation tables (sort of a data warehouse?) but that's going to be at a later stage. Also, Google App Engine offers a decent cron job interface, so running nightly aggregations should not be hard. – Sologoub Mar 29 '10 at 16:08
  • Bear in mind that cronjobs alone won't cut it for nightly aggregations if the number of records to aggregate becomes too high - hence my suggestion of the bulkupdate library. – Nick Johnson Mar 29 '10 at 16:26