Small Tables in Python?

Question

Let's say I don't have more than one or two dozen objects with different properties, such as the following:

UID, Name, Value, Color, Type, Location

I want to be able to call up all objects with Location = "Boston", or Type = "Primary". Classic database query type stuff.

Most table solutions (pytables, *sql) are really overkill for such a small set of data. Should I simply iterate over all the objects and create a separate dictionary for each data column (adding values to dictionaries as I add new objects)?

This would create dicts like this:

{'Boston' : [234, 654, 234], 'Chicago' : [324, 765, 342] } - where those 3 digit entries represent things like UID's.

As you can see, querying this would be a bit of a pain.

Any thoughts of an alternative?

Steven Kryskalla · Accepted Answer · 2009-09-24T14:30:10.323

For small relational problems I love using Python's builtin sets.

For the example of location = 'Boston' OR type = 'Primary', if you had this data:

users = {
   1: dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
   2: dict(Name="Mr. Bar", Location="New York", Type="Primary"),
   3: dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
   #...
}

You can do the WHERE ... OR ... query like this:

set1 = set(u for u in users if users[u]['Location'] == 'Boston')
set2 = set(u for u in users if users[u]['Type'] == 'Primary')
result = set1.union(set2)

Or with just one expression:

result = set(u for u in users if users[u]['Location'] == 'Boston'
                              or users[u]['Type'] == 'Primary')

You can also use the functions in itertools to create fairly efficient queries of the data. For example if you want to do something similar to a GROUP BY city:

cities = ('Boston', 'New York', 'Chicago')
cities_users = dict(map(lambda city: (city, ifilter(lambda u: users[u]['Location'] == city, users)), cities))

You could also build indexes manually (build a dict mapping Location to User ID) to speed things up. If this becomes too slow or unwieldy then I would probably switch to sqlite, which is now included in the Python (2.5) standard library.

Thank you, I've never used the builtin sets before. This should at least make it more explicit what's happening in the code. — akoumjian, Sep 24 '09 at 14:26

score 6 · Answer 2 · answered Sep 24 '09 at 15:11

6

I do not think sqlite would be "overkill" -- it comes with standard Python since 2.5, so no need to install stuff, and it can make and handle databases in either memory or local disk files. Really, how could it be simpler...? If you want everything in-memory including the initial values, and want to use dicts to express those initial values, for example...:

import sqlite3

db = sqlite3.connect(':memory:')
db.execute('Create table Users (Name, Location, Type)')
db.executemany('Insert into Users values(:Name, :Location, :Type)', [
   dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
   dict(Name="Mr. Bar", Location="New York", Type="Primary"),
   dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
   ])
db.commit()
db.row_factory = sqlite3.Row

and now your in-memory tiny "db" is ready to go. It's no harder to make a DB in a disk file and/or read the initial values from a text file, a CSV, and so forth, of course.

Querying is especially flexible, easy and sweet, e.g., you can mix string insertion and parameter substitution at will...:

def where(w, *a):
  c = db.cursor()
  c.execute('Select * From Users where %s' % w, *a)
  return c.fetchall()

print [r["Name"] for r in where('Type="Secondary"')]

emits [u'Mr. Foo', u'Mr. Quux'], just like the more elegant but equivalent

print [r["Name"] for r in where('Type=?', ["Secondary"])]

and your desired query's just:

print [r["Name"] for r in where('Location="Boston" or Type="Primary"')]

etc. Seriously -- what's not to like?

answered Sep 24 '09 at 15:11

Alex Martelli

854,459
170
1,222
1,395

The advantage seems to be even more flexibility in querying, and the option of easily moving a db from memory to and from files, exporting, etc. The disadvantage, as I see it, is that you've had to import an additional module (not a huge deal), and someone else reading the code needs to learn what all those object methods are. It's a fine solution, but not really the simplest imho. – akoumjian Sep 24 '09 at 15:29
1

Wanna bet that more people already know about the Python DB API than about sets, genexps, `.union`, etc?-) – Alex Martelli Sep 25 '09 at 01:09
1

I just see a lot more elegance in using one dictionary, one set, and a loop than in using a database object, a cursor object, and six methods. – akoumjian Sep 28 '09 at 01:31
I have a follow up question: If I wanted to store more complex objects, such as Timeseries (http://pytseries.sourceforge.net/), how would that work? It seems like I would need to adapt it to one of the sqlite data types. – akoumjian Oct 05 '09 at 18:32
timeseries (as subclasses of masked-arrays which in turn are subclasses of numpy arrays, &c) are indeed extremely rich and complex objects -- no sensible way to map one timeseries to just one sqlite type while keeping the ability to search and order on some of its characteristics (pickle lets you turn them into blobs and back, but that doesn't allow searching and ordering). If I needed to persist timeseries to/from a relational DB I'd use a dedicated table for them and another for their contents, with foreign keys. This is really an extremely different problem from the original question. – Alex Martelli Oct 05 '09 at 20:26

score 2 · Answer 3 · answered Sep 24 '09 at 14:31

If it's really a small amount of data, I'd not bother with an index and probably just write a helper function:

users = [
   dict(Name="Mr. Foo", Location="Boston", Type="Secondary"),
   dict(Name="Mr. Bar", Location="New York", Type="Primary"),
   dict(Name="Mr. Quux", Location="Chicago", Type="Secondary"),
   ]

def search(dictlist, **kwargs):
   def match(d):
      for k,v in kwargs.iteritems():
         try: 
            if d[k] != v: 
               return False
         except KeyError:
            return False
      return True

   return [d for d in dictlist if match(d)]

Which will allow nice looking queries like this:

result = search(users, Type="Secondary")

This is also very helpful. The aesthetic of this code keeps more with what people expect with Python. However, I like the little bit of flexibility that sets seem to offer with unions/intersections. — akoumjian, Sep 24 '09 at 14:49

Small Tables in Python?

3 Answers3