For some graph algorithm I need to fetch a lot of records from a database to memory (~ 1M records). I want this to be done fast and I want the records to be objects (that is: I want ORM). To crudely benchmark different solutions I created a simple problem of one table with 1M Foo objects like I did here: Why is loading SQLAlchemy objects via the ORM 5-8x slower than rows via a raw MySQLdb cursor? .
One can see that fetching them using bare SQL is extremely fast; also converting the records to objects using a simple for-loop is fast. Both execute in around 2-3 seconds. However using ORM's like SQLAlchemy and Hibernate, this takes 20-30 seconds: a lot slower if you ask me, and this is just a simple example without relations and joins.
SQLAlchemy gives itself the feature "Mature, High Performing Architecture," (http://www.sqlalchemy.org/features.html). Similarly for Hibernate "High Performance" (http://hibernate.org/orm/). In a way both are right, because they allow for very generic object oriented data models to be mapped back and forth to a MySQL database. On the other hand they are awfully wrong, since they are 10x slower than just SQL and native code. Personally I think they could do better benchmarks to show this, that is, a benchmark comparing with native SQL + java or python. But that is not the problem at hand.
Of course, I don't want SQL + native code, as it is hard to maintain. So I was wondering why there does not exist something like an object oriented database, which handles the database->object mapping native. Someone suggested OrientDB, hence I tried it. The API is quite nice: when you have your getters and setters right, the object is insertable and selectable.
But I want more than just API-sweetness, so I tried the 1M example:
import java.io.Serializable;
public class Foo implements Serializable {
public Foo() {}
public Foo(int a, int b, int c) { this.a=a; this.b=b; this.c=c; }
public int a,b,c;
public int getA() { return a; }
public void setA(int a) { this.a=a; }
public int getB() { return b; }
public void setB(int b) { this.b=b; }
public int getC() { return c; }
public void setC(int c) { this.c=c; }
}
import com.orientechnologies.orient.object.db.OObjectDatabaseTx;
public class Main {
public static void insert() throws Exception {
OObjectDatabaseTx db = new OObjectDatabaseTx ("plocal:/opt/orientdb-community-1.7.6/databases/test").open("admin", "admin");
db.getEntityManager().registerEntityClass(Foo.class);
int N=1000000;
long time = System.currentTimeMillis();
for(int i=0; i<N; i++) {
Foo foo = new Foo(i, i*i, i+i*i);
db.save(foo);
}
db.close();
System.out.println(System.currentTimeMillis() - time);
}
public static void fetch() {
OObjectDatabaseTx db = new OObjectDatabaseTx ("plocal:/opt/orientdb-community-1.7.6/databases/test").open("admin", "admin");
db.getEntityManager().registerEntityClass(Foo.class);
long time = System.currentTimeMillis();
for (Foo f : db.browseClass(Foo.class).setFetchPlan("*:-1")) {
if(f.getA() == 345234) System.out.println(f.getB());
}
System.out.println("Fetching all Foo records took: " + (System.currentTimeMillis() - time) + " ms");
db.close();
}
public static void main(String[] args) throws Exception {
//insert();
fetch();
}
}
Fetching 1M Foo's using OrientDB takes approximately 18 seconds. The for-loop with the getA() is to force the object fields to be actually loaded into memory, as I noticed that by default they are fetched lazily. I guess this may also be the reason fetching the Foo's is slow, because it has db-access each iteration instead of db-access once when it fetches everything (including the fields).
I tried to fix that using setFetchPlan("*:-1"), I figured it may also apply on fields, but that did not seem to work.
Question: Is there a way to do this fast, preferably in the 2-3 seconds range? Why does this take 18 seconds, whilst the bare SQL version uses 3 seconds?
Addition: Using a ODatabaseDocumentTX like @frens-jan-rumph suggested only gave ma a speedup of approximately 5, but of approximatelt 2. Adjusting the following code gave me a running time of approximately 9 seconds. This is still 3 times slower than raw sql whilst no conversion to Foo's was executed. Almost all time goes to the for-loop.
public static void fetch() {
ODatabaseDocumentTx db = new ODatabaseDocumentTx ("plocal:/opt/orientdb-community-1.7.6/databases/pits2").open("admin", "admin");
long time = System.currentTimeMillis();
ORecordIteratorClass<ODocument> it = db.browseClass("Foo");
it.setFetchPlan("*:0");
System.out.println("Fetching all Foo records took: " + (System.currentTimeMillis() - time) + " ms");
time = System.currentTimeMillis();
for (ODocument f : it) {
//if((int)f.field("a") == 345234) System.out.println(f.field("b"));
}
System.out.println("Iterating all Foo records took: " + (System.currentTimeMillis() - time) + " ms");
db.close();
}