I'm currently evaluating Datomic for the use case of storing and querying parsed symbols that form an ontology. In total there are 225122 symbols (entities) in the database, so it's a rather big ontology, but that shouldn't be a big deal for a DB.
The structure is pretty standard. Each symbol has
- a parent symbol that contains it (i.e. it is a sub-symbol of its parent)
- a supersymbol (a symbol it inherits from)
To have convenient access to the symbols, each symbol has a unique name. This adds up to the following Datomic schema:
[{:db/ident :ml/name,
  :db/valueType :db.type/string,
  :db/cardinality :db.cardinality/one,
  :db/unique :db.unique/identity}
 {:db/ident :ml/parent,
  :db/valueType :db.type/ref,
  :db/index true,
  :db/cardinality :db.cardinality/one}
 {:db/ident :ml/superclass,
  :db/valueType :db.type/ref,
  :db/index true,
  :db/cardinality :db.cardinality/one}]
Now I have the most basic recursive query: "give me all symbols that are (transitively) contained in symbol p". In Datomic terms:
(def rules
  '[[(ubersymbol ?c ?p) (?c :ml/parent ?p)]
    [(ubersymbol ?c ?p) (?c :ml/parent ?c1) (ubersymbol ?c1 ?p)]])

(q '[:find ?c ?n
     :in $ %
     :where
     (ubersymbol ?c ?d)
     [?d :ml/name "name of a root symbol"]
     [?c :ml/name ?n]]
   current-db rules)
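To make explicit what I expect the rule to compute, here is a minimal plain-Clojure sketch of the same transitive containment on a toy data set. It does not use Datomic at all; `parent-of`, `contained-in`, and the symbol names are hypothetical and only stand in for the `:ml/parent` relation:

```clojure
;; Toy data (hypothetical): child name -> parent name, mirroring :ml/parent.
(def parent-of
  {"b" "a"
   "c" "b"
   "d" "b"
   "x" "y"})

(defn contained-in
  "Returns the set of symbols transitively contained in root,
  i.e. every symbol whose parent chain reaches root."
  [parent-of root]
  ;; Invert the child->parent map into parent->#{children}.
  (let [children (reduce-kv (fn [m c p] (update m p (fnil conj #{}) c))
                            {} parent-of)]
    ;; Walk down from root; `seen` guards against revisiting symbols.
    (loop [frontier [root]
           seen     #{}]
      (if-let [p (first frontier)]
        (let [new (remove seen (children p))]
          (recur (into (vec (rest frontier)) new)
                 (into seen new)))
        seen))))

(contained-in parent-of "a")
```

For the real data I would expect the Datomic query to return the same kind of result set (80 symbols for the root I tested), just computed by the rule engine instead of by hand.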
The query itself (for a mid-sized symbol) takes between 5 and 5.5 seconds and returns 80 hits. That is seconds, not milliseconds. And this is only the most basic query I want to run against the dataset (it's intended to be used from a web tool that helps modellers understand the structure of the ontology).
I'm running datomic-pro-0.9.5554 with a memory database and the peer library (I started the server as described in the "getting started" guide).
Any help that lets me make a case for Datomic is really appreciated.
markus