
I noticed that SQLAlchemy was slow fetching (and ORMing) some data that was rather fast to fetch using bare-bones SQL. First off, I created a database with a million records:

mysql> use foo
mysql> describe Foo;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id    | int(11) | NO   | PRI | NULL    |       |
| A     | int(11) | NO   |     | NULL    |       |
| B     | int(11) | NO   |     | NULL    |       |
| C     | int(11) | NO   |     | NULL    |       |
+-------+---------+------+-----+---------+-------+
mysql> SELECT COUNT(*) FROM Foo;
+----------+
| COUNT(*) |
+----------+
|  1000000 |
+----------+
mysql> 

As a crude test, querying all Foo's takes approximately 2 seconds:

herbert@dev0 ~ $ date; echo 'use foo; select * from Foo;' | mysql -uroot -pxxx > /dev/null; date
zo apr 20 18:48:49 CEST 2014
zo apr 20 18:48:51 CEST 2014

If I do this in Python using MySQLdb, it takes approximately 3 seconds, including the construction of Foo objects:

herbert@dev0 ~ $ python BareORM.py 
query execution time:  0:00:02.198986
total time:  0:00:03.403084

This is the output of:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import MySQLdb
import sys
import datetime

class Foo:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

con = None
try:
    start = datetime.datetime.now()
    con = MySQLdb.connect('localhost', 'root', 'xxx', 'foo')
    cur = con.cursor()

    cur.execute("""SELECT * FROM Foo LIMIT 1000000""")
    print "query execution time: ", datetime.datetime.now()-start
    foos = []
    for elem in cur:
        foos.append(Foo(elem[1], elem[2], elem[3]))
    con.commit()

except MySQLdb.Error, e:
    print "Error %d: %s" % (e.args[0], e.args[1])
    sys.exit(1)

finally:
    if con: con.close()
    print "total time: ",  datetime.datetime.now()-start

However, using SQLAlchemy to reduce boilerplate code, the same job needed approximately 25 seconds:

herbert@dev0 ~ $ python AlchemyORM.py 
total time:  0:00:24.649279

Using this code:

import datetime

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Foo(Base):
    __tablename__ = 'Foo'
    id = Column(Integer, primary_key=True)
    A  = Column(Integer, nullable=False)
    B  = Column(Integer, nullable=False)
    C  = Column(Integer, nullable=False)

engine  = create_engine('mysql+mysqldb://root:xxx@localhost/foo')
Session = sessionmaker(bind=engine)
session = Session()
start = datetime.datetime.now()
foos  = session.query(Foo).limit(1000000).all()
print "total time: ", datetime.datetime.now()-start

Why does SQLAlchemy operate ~10x slower than the bare SQL solution, assuming that SQLAlchemy should do approximately the same thing? Can I speed things up somehow?

This is a minimal working example of a more complicated query, which joins several tables using eager loading. I was considering just doing simple queries on a single table, and then using dictionaries to create id->object maps and collate one-to-N relations. But before doing so, I want to be sure that SQLAlchemy is unable to perform better, because writing your own ORM is a bad idea from a software design point of view. Imho, a 2x slowdown would be acceptable (maybe).

If you know about other (faster) Python SQL ORMs, or perhaps BigTable-like solutions (that effectively are the ORM), feel free to mention them in a comment.

EDIT: I also tried this with Peewee, which resulted in ~15 s.

from peewee import *
import datetime

database = MySQLDatabase("foo", host="localhost", port=3306, user="root", passwd="xxx")

class Foo(Model):
    id = IntegerField()
    A  = IntegerField()
    B  = IntegerField()
    C  = IntegerField()

    class Meta:
        db_table = 'Foo'
        database = database

start = datetime.datetime.now()
foos = Foo.select()
cnt = 0
for foo in foos:
    cnt += 1
print "total time: ", datetime.datetime.now() - start

EDIT: In response to Matthias, I tried the same thing in Java with Hibernate; the result is approximately 8 to 10 seconds, not exactly fast, but a lot faster than 25 seconds. The code, starting with the classes and ending with the configuration:

package herbert.hibernateorm;

import java.util.List;

import org.hibernate.Session; 
import org.hibernate.Transaction;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class App {
   public static void main(String[] args) throws Exception {
      SessionFactory factory = new Configuration().configure().buildSessionFactory();
      Session session = factory.openSession();
      Transaction tx = session.beginTransaction();
      long start = System.currentTimeMillis();
      List foos = session.createQuery("FROM Foo").list(); 
      System.out.println(foos.size());
      System.out.printf("total time: %d\n", System.currentTimeMillis() - start);
      session.close();
   }
}

package herbert.hibernateorm;

public class Foo {
    private int id, a, b, c;
    public Foo() {}
    public Foo(int A, int B, int C) { this.a=A; this.b=B; this.c=C; }

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public int getA() { return a; }
    public void setA(int a) { this.a = a; }
    public int getB() { return b; }
    public void setB(int b) { this.b = b; }
    public int getC() { return c; }
    public void setC(int c) { this.c = c; }
}

The configuration (hibernate.cfg.xml and hibernate.hbm.xml, respectively):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-configuration PUBLIC "-//Hibernate/Hibernate Configuration DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
  <session-factory>
    <property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
    <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
    <property name="hibernate.connection.url">jdbc:mysql://localhost:3306/foo?zeroDateTimeBehavior=convertToNull</property>
    <property name="hibernate.connection.username">root</property>
    <property name="hibernate.connection.password">xxx</property>
    <mapping resource="hibernate.hbm.xml"/>
  </session-factory>
</hibernate-configuration>

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN" "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
    <class name="herbert.hibernateorm.Foo" table="Foo" catalog="foo">
        <id name="id" type="int">
            <column name="id" />
            <generator class="assigned" />
        </id>
        <property name="a" type="int">
            <column name="A" not-null="true" />
        </property>
        <property name="b" type="int">
            <column name="B" not-null="true" />
        </property>
        <property name="c" type="int">
            <column name="C" not-null="true" />
        </property>
    </class>
</hibernate-mapping>

And finally, the POM file to run it all with Maven:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>herbert</groupId>
    <artifactId>hibernateORM</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>hibernateORM</name>
    <url>http://maven.apache.org</url>
    <repositories>
        <repository>
            <id>unknown-jars-temp-repo</id>
            <name>A temporary repository created by NetBeans for libraries and jars it could not identify. Please replace the dependencies in this repository with correct ones and delete this repository.</name>
            <url>file:${project.basedir}/lib</url>
        </repository>
    </repositories>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.21</version>
        </dependency>
        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-core</artifactId>
            <version>4.0.1.Final</version>
        </dependency>
        <dependency>
            <groupId>org.hibernate</groupId>
            <artifactId>hibernate-entitymanager</artifactId>
            <version>4.0.1.Final</version>
        </dependency>
        <dependency>
            <groupId>org.hibernate.common</groupId>
            <artifactId>hibernate-commons-annotations</artifactId>
            <version>4.0.1.Final</version>
        </dependency>   
        <dependency>
            <groupId>nz.ac.waikato.cms.weka</groupId>
            <artifactId>weka-dev</artifactId>
            <version>3.7.10</version>
        </dependency>
        <dependency>
            <groupId>commons-configuration</groupId>
            <artifactId>commons-configuration</artifactId>
            <version>1.9</version>
        </dependency>
        <dependency>
            <groupId>commons-net</groupId>
            <artifactId>commons-net</artifactId>
            <version>3.1</version>
            <classifier>examples</classifier>
        </dependency>
        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.2.2</version>
        </dependency>
        <dependency>
            <groupId>maven</groupId>
            <artifactId>maven-jetty-plugin</artifactId>
            <version>1.1</version>
            <type>plugin</type>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.4</version>
        </dependency>
        <dependency>
                <groupId>com.kenai.nbpwr</groupId>
                <artifactId>org-slf4j-jdk14</artifactId>
                <version>1.6.1-201106101300</version>
                <type>nbm</type>
        </dependency>

    </dependencies>
</project>
– Herbert
  • Following a suggestion I found here ( http://pythonguy.wordpress.com/2011/08/17/sqlalchemy-tips-performance/#comment-1284 ) I tried pagination, **foos = session.query(Foo).yield_per(X).all()** for X = 1000, 10000, 100000; unfortunately these also yield execution times of approximately 26 seconds. I also tried **foos = session.query(Foo).execution_options(stream_results=True).all()**, but note that I don't use psycopg2, as I am using MySQL and not PostgreSQL; this took ~25 seconds. By 2014 standards, MySQL should also perform well on 1M-record results. – Herbert Apr 20 '14 at 19:43
  • Take a look at [Why is SQLAlchemy insert with sqlite 25 times slower than using sqlite3 directly?](http://stackoverflow.com/a/11769768/99594) – van Apr 22 '14 at 18:42
  • I already read about this here http://docs.sqlalchemy.org/en/rel_0_8/faq.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow . However, I only want to read data, not update it. Future use cases would only require inserts, and in very rare cases maybe updates, but in that case a delete-and-insert would be OK. – Herbert Apr 28 '14 at 09:19
  • You already mentioned chunks/`yield_per()` which often gives major improvement on queries: http://stackoverflow.com/questions/7389759/memory-efficient-built-in-sqlalchemy-iterator-generator – smci Jul 19 '14 at 02:18
  • @herbert: I'm getting about 14 seconds to load 1M full object rows w/ yield_per of 1000, faster than peewee. – zzzeek Aug 29 '14 at 14:59
  • In my experience, python + sqlalchemy fails to deliver in terms of performance if you want to maintain expressivity (i.e. if you want to avoid using SQLAlchemy as a thin layer to write SQL code). I haven't yet found a python solution that delivers good performance when it comes to fetching thousands of rows from the db and doing something with it. – Ezequiel Jun 26 '17 at 14:36
  • @Ezequiel Way back I wrote a simple wrapper that did a select on one table and mapped it to a Python generator of objects, plus some functions that would join these generators quickly by creating in-memory Python dicts, materializing the generators. At the birth of SQL this was probably too memory-intensive, but for me it turned queries that took 30 seconds with SQLAlchemy into a few seconds. I think a library that joins using Python instead of SQL's JOIN can really help performance. I couldn't find a good reason to use SQL JOINs for big data extraction other than that it's what everyone does. – Herbert Jun 27 '17 at 11:50
  • Reading through, it sounds like you reinvented *hash join* in Python (that whole "creating in-memory python dicts") to overcome the fact that MySQL only supports nested loop joins (at least it used to). Other DBMSs have supported hash joins and other strategies for a long time. – Ilja Everilä Jun 09 '19 at 11:56

3 Answers


Here is the SQLAlchemy version of your MySQL script that performs in four seconds, compared to three for MySQLdb:

from sqlalchemy import Integer, Column, create_engine, MetaData, Table
import datetime

metadata = MetaData()

foo = Table(
    'foo', metadata,
    Column('id', Integer, primary_key=True),
    Column('a', Integer(), nullable=False),
    Column('b', Integer(), nullable=False),
    Column('c', Integer(), nullable=False),
)


class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

engine = create_engine('mysql+mysqldb://scott:tiger@localhost/test', echo=True)
start = datetime.datetime.now()

with engine.connect() as conn:
    foos = [
        Foo(row['a'], row['b'], row['c'])
        for row in
        conn.execute(foo.select().limit(1000000)).fetchall()
    ]


print "total time: ", datetime.datetime.now() - start

runtime:

total time:  0:00:04.706010

Here is a script that uses the ORM to load object rows fully; by using yield_per() to avoid building one fixed list with all 1M objects at once, this runs in 13 seconds with SQLAlchemy master (18 seconds with rel 0.9):

import time
from sqlalchemy import Integer, Column, create_engine, Table
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Foo(Base):
    __table__ = Table(
        'foo', Base.metadata,
        Column('id', Integer, primary_key=True),
        Column('a', Integer(), nullable=False),
        Column('b', Integer(), nullable=False),
        Column('c', Integer(), nullable=False),
    )


engine = create_engine('mysql+mysqldb://scott:tiger@localhost/test', echo=True)

sess = Session(engine)

now = time.time()

# avoid using all() so that we don't have the overhead of building
# a large list of full objects in memory
for obj in sess.query(Foo).yield_per(100).limit(1000000):
    pass

print("Total time: %d" % (time.time() - now))

We can then split the difference between these two approaches, and load just individual columns with the ORM:

for obj in sess.query(Foo.id, Foo.a, Foo.b, Foo.c).yield_per(100).limit(1000000):
    pass

The above again runs in 4 seconds.

The SQLAlchemy Core version is the more apt comparison to a raw MySQLdb cursor. If you use the ORM but query for individual columns, it's about four seconds in the most recent versions.

At the ORM level, the speed issue is that creating objects in Python is slow, and the SQLAlchemy ORM applies a large amount of bookkeeping to these objects as it fetches them, which is necessary for it to fulfill its usage contract, including unit of work, identity map, eager loading, collections, etc.

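As a minimal sketch of one piece of that bookkeeping (assuming the Foo mapping and the sess from the second script above), the identity map guarantees that fetching the same primary key twice hands back the very same Python object, something a raw cursor never does:

# minimal sketch of the identity map at work, assuming the Foo mapping
# and the sess from the second script above
a = sess.query(Foo).get(1)   # SELECT issued; object built and tracked
b = sess.query(Foo).get(1)   # no second SELECT: found in the identity map
assert a is b                # the very same Python object
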
To speed up the query dramatically, fetch individual columns instead of full objects. See the techniques at http://docs.sqlalchemy.org/en/latest/faq/performance.html#result-fetching-slowness-orm which describe this.

As for your comparison to Peewee: it is a much simpler system with far fewer features; among other things, it does nothing with identity maps. Even Peewee, about as simple an ORM as is feasible, still takes 15 seconds, which is evidence that cPython is really, really slow compared to the raw MySQLdb fetch, which runs in straight C.

For the comparison to Java: the Java VM is way, way faster than cPython. Hibernate is ridiculously complicated, yet the Java VM is extremely fast due to the JIT, and even all that complexity ends up running faster. If you want to compare Python to Java, use PyPy.

– zzzeek
  • Thank you for your per-field suggestion and explanations! However, my general problem is that 1M records is not a lot; in fact, it is a very small amount, considering that we are talking about approximately 4 * (11 bits) * 1M = 5.5 megabyte. Loading that amount of *stuff* into memory is normally really fast; consider for example imread() in Matlab on an image of that size, which takes 0.058425 seconds. Your point, however, was that the 'logic' uses the time, yet my BareORM example shows that simple logic can also be fast. – Herbert Aug 27 '14 at 19:24
  • I think SQLAlchemy is not a "Mature, High Performing Architecture" (as they state in their feature list), as it fails to be fast in my simple example, especially since the logic could easily be implemented in straight C(++) and still be part of a Python API. I also think SQL fails, as it returns [(A,B)] (a list of tuples) instead of [(A,[B])] whenever you do a one-to-many join, but I have not found an ORM that compensates for this within reasonable time. – Herbert Aug 27 '14 at 19:28
  • But fair enough: you do answer the question of why SQLAlchemy is slow; unfortunately, I still think it is not reasonable that it is ;) Moreover, my answer shows a small Python implementation of a different approach, where the one-to-many relations can be merged efficiently using two fetches and a merge on the result. I just don't get why there is no database/ORM combination specialized in efficiently loading lists of objects and then joining them efficiently, both language-specific and API-wise. – Herbert Aug 27 '14 at 19:29
  • Your comments aren't reasonable, because you are expecting Python to perform as fast as straight C code. Comparing it to things like Matlab, Java, etc. is a completely unfair comparison, but again this has little to do with SQLAlchemy. SQLAlchemy is very, very fast. It's just that users tend to be unaware of just how much functionality is being delivered, and confuse an ORM result set with that of a raw database cursor. They are quite different, and SQLAlchemy offers many options for controlling the mixture of "raw" vs. "automated". – zzzeek Aug 27 '14 at 23:53
  • For the "two fetches, then merge on result", SQLAlchemy offers this: see http://docs.sqlalchemy.org/en/latest/orm/loading.html. SQLAlchemy is the only ORM in the Python world I'm familiar with that offers these kinds of patterns, largely because we are the only ORM that stores collections persistently instead of loading on access every time. – zzzeek Aug 27 '14 at 23:54
  • If you *are* looking for an ORM that is in Python yet uses C to a huge degree, see the [storm orm](https://storm.canonical.com/). It doesn't offer any of the eager-loading features of SQLAlchemy, but from an implementation perspective it is as fast as you're going to get; it's as minimalist as Peewee, but its entire persistence engine provides a C-based backend. Run your benchmarks against that, and that's the best you're going to do. Or just use PyPy. – zzzeek Aug 28 '14 at 00:00
  • First: creating objects in Python may be slow, but it is not the bottleneck, as it only adds 1 second in my BareORM.py example. Second: your first suggestion is only slightly better than the BareORM.py example, as it creates a dict instead of a list, allowing indexing by field name instead of by position; I guess you lose all the benefits of an ORM if you use it like this. Your second suggestion simply states that it is fast as long as you don't actually create the appropriate data structure in memory, which is exactly what an ORM should add on top of the default MySQL API. – Herbert Aug 30 '14 at 08:33
  • The main problem with your solution is that you try to reenact what the BareORM.py example does using SQLAlchemy, which is not what I want. I want ORM functionality, and I want it to be fast. I think the only bottleneck that *should* exist when loading data from a database is transporting it from disk to memory; everything else, especially when only dealing with 1M records, should imho be a detail, time-wise. Maybe SQL is an inefficient way to store objects, or maybe Python is inefficient, but imho these may not be excuses to accept these speed losses. – Herbert Aug 30 '14 at 08:38
  • So your statement is that the need to apply a single directive, "yield_per(100)", which doubles the speed of the query, is unreasonable. I look forward to your Python ORM implementation that solves these issues in a superior way! Good luck to you. – zzzeek Aug 30 '14 at 14:43
  • That is not at all what I said. If I am not mistaken, your first implementation is fast because you retrieve records and not objects, avoiding ORM functionality. Your second implementation avoids actual object creation; if you replace the "pass" by appending the obj to a list, will it still be comparably fast? – Herbert Aug 30 '14 at 15:50
  • Also, I do think adding a yield_per(100) is a bit strange: how should I determine the optimal or a reasonable value for 100? And why is doing 100 things 10k times faster than doing 1M things once? – Herbert Aug 30 '14 at 15:52

SQLAlchemy is complicated. It has to deal with converting to Python types that the underlying database does not support natively, with tables that use inheritance, JOINs, caching of objects, maintaining consistency, translated rows, partial results, and whatnot. Check out sqlalchemy/orm/loading.py:instance_processor -- it's insane.

The solution would be to piece together and compile Python code that processes the results of a specific query, the way Jinja2 does for templates. So far, nobody has done this work, possibly because the common case is a couple of rows (where this kind of optimization would be pessimal) and because people who need to process bulk data do it by hand, as you did.

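To make the idea concrete, here is a purely hypothetical sketch of such code synthesis; nothing existing provides this, and the Foo class, SRC string, and load_foos name are all invented for illustration. The point is to generate a loader specialized to one query's exact column order, compile it once, and reuse it for every batch of rows:

# purely hypothetical sketch of the code-synthesis idea above; the Foo
# class, SRC string and load_foos name are invented for illustration
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a; self.b = b; self.c = c

SRC = """
def load_foos(rows):
    out = []
    append = out.append              # hoist the attribute lookup
    for id_, a, b, c in rows:
        obj = Foo.__new__(Foo)       # bypass __init__, as generated code could
        obj.id = id_; obj.a = a; obj.b = b; obj.c = c
        append(obj)
    return out
"""
namespace = {'Foo': Foo}
exec compile(SRC, '<generated>', 'exec') in namespace
load_foos = namespace['load_foos']

# usage (hypothetical): foos = load_foos(cur.fetchall())
print len(load_foos([(1, 10, 20, 30), (2, 40, 50, 60)]))  # -> 2
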
– Matthias Urlichs
  • My bare SQL example also needs to convert the INT(11) to a Python integer; I am assuming MySQLdb does that. Furthermore, the time SQLAlchemy needs to figure out inheritance and joins should not scale linearly with the number of results (100k takes ~2 sec where 1M takes ~24 sec). Personally, I am not willing to pay speed for inheritance. As you may see, I also tried my example in Java, which takes ~8-10 s. If I may summarize, your answer is: "No, SQLAlchemy is not fast; if you want speed you need to create the object/SQL code yourself." – Herbert Apr 21 '14 at 09:26
  • I like your feedback, but it is not very satisfying when SQLAlchemy claims a "Mature, High Performing Architecture" http://www.sqlalchemy.org/features.html; imho it is not high performance if it is 10x slower than bare SQL + simple OO. I guess the design decisions not to create schemas and not to synthesize Python code are fair, but given the inherent performance decrease, I think it is unfair to call SQLAlchemy "high performance". Nevertheless, I would be very pleased if someone contradicted me! ;) – Herbert Apr 21 '14 at 09:33
  • Ditto for peewee with regard to type conversions; relevant code: https://github.com/coleifer/peewee/blob/master/peewee.py#L1523-L1658 – coleifer Apr 21 '14 at 13:51
  • The question is, do you need the speed with SQLAlchemy's ORM badly enough to (co-)fund developing a Python code synthesizer for it? Up to now, apparently nobody has. – Matthias Urlichs Apr 21 '14 at 16:05
  • My point is that SQLAlchemy is not "high performance", and this speed trade-off should be mentioned more prominently on the features page. Not doing so tricks people into writing object models in SQLAlchemy, which may not be the best solution at all. I tried looking into writing a small framework that compiles to C++, but that seems only marginally faster than just using Python's dictionary. My approach: join all wanted tables, create an itemgetter for all fields and primary keys, create a dict PK->object for each join, and 'relate' the objects using these dicts. I'm not yet sure how fast this will be. – Herbert Apr 28 '14 at 09:25

This is not an answer to my question, but it may help the general public with speed issues on large data sets. I found that selecting a million records can typically be done in about 3 seconds, but JOINs may slow the process down. In my case, there are approximately 150k Foos with a 1-many relation to 1M Bars, so selecting them with a JOIN is slow, since each Foo is returned approximately 6.5 times. I found that selecting both tables separately and joining them using dicts in Python is approximately 3 times faster than SQLAlchemy (approx. 25 sec) and 2 times faster than 'bare' Python code using JOINs (approx. 17 sec); the code below took 8 sec in my use case. Selecting 1M records without relations, like the Bars in this example, took 3 seconds. I used this code:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import MySQLdb
import inspect
from operator import itemgetter, attrgetter

# fetch all objects of class Class, where the fields are determined as the
# arguments of the __init__ constructor (not flexible, but fairly simple ;))
def fetch(Class, cursor, tablename, ids=["id"], where=None):
    arguments = inspect.getargspec(Class.__init__).args
    del arguments[0]  # drop 'self'
    fields = ", ".join(["`" + tablename + "`.`" + column + "`" for column in arguments])
    sql = "SELECT " + fields + " FROM `" + tablename + "`"
    if where is not None:
        sql = sql + " WHERE " + where
    sql = sql + ";"
    getId = itemgetter(*[arguments.index(x) for x in ids])
    elements = dict()

    cursor.execute(sql)
    for record in cursor:
        elements[getId(record)] = Class(*record)
    return elements

# attach the objects in dict2 to dict1, given a 1-many relation between both
def merge(dict1, fieldname, dict2, ids):
    idExtractor = attrgetter(*ids)
    for d in dict1:
        setattr(dict1[d], fieldname, list())
    for d in dict2:
        dd = dict2[d]
        getattr(dict1[idExtractor(dd)], fieldname).append(dd)

# attach dict2 objects to dict1 objects, given a 1-1 relation
def attach(dict1, fieldname, dict2, ids):
    idExtractor = attrgetter(*ids)
    for d in dict1:
        dd = dict1[d]
        setattr(dd, fieldname, dict2[idExtractor(dd)])

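For illustration, this is how the helpers might be wired together; the Foo and Bar classes below, and Bar's fooId column, are hypothetical, invented to match the 150k-Foo/1M-Bar scenario described above:

# hypothetical usage of the helpers above; Bar and its fooId column are
# invented to match the 150k-Foo / 1M-Bar scenario described earlier
class Foo:
    def __init__(self, id, A, B, C):
        self.id = id; self.A = A; self.B = B; self.C = C

class Bar:
    def __init__(self, id, fooId, D):
        self.id = id; self.fooId = fooId; self.D = D

con = MySQLdb.connect('localhost', 'root', 'xxx', 'foo')
cur = con.cursor()
foos = fetch(Foo, cur, "Foo")          # {id: Foo}, columns taken from __init__
bars = fetch(Bar, cur, "Bar")          # {id: Bar}
merge(foos, "bars", bars, ["fooId"])   # every Foo gets a .bars list
con.close()
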
This helped me speed up my querying; however, I am more than happy to hear from the experts about possible improvements to this approach.

– Herbert