Imagine I have actors and movies. How can I write a single query that, for a given list of actors, returns for each actor a 5-tuple of the five most recent movies that actor participated in (sorted descending by movie date)?

More specifically: given a list of :db/id values called actors, and models as follows:

Actor:

:db/id
:actor/name str
:actor/movie ref

Movie:

:db/id
:db/name str
:db/date inst

I want to write a query like:

(d/q '[:find ?actor ???????
       :in $ [?actor ...]
       :where ??????????] snapshot actors)

Expected results:

[[1 [2 3 4 5 6]]
 [7 [8 9 10 11 12]]]

Where 1 and 7 are actor ids and 2,3,4,5,6,8,9,10,11,12 are movie ids.

Now, I have a strong feeling that such a query cannot be constructed. If I am right, how can I get this information in chunks (imagine that each actor has tons and tons of movies they were cast in, too many to fit in memory)?

Chang
Terminus
  • I think the answer you're looking for is here: [How to sort result in a Datalog query](https://stackoverflow.com/questions/29621159/how-to-sort-result-in-a-datalog-query) – rriehle Feb 07 '19 at 07:54
  • Thanks! Unfortunately this is not the case. – Terminus Feb 08 '19 at 08:05

1 Answer

This is a general problem: if you have too much data to fit in memory, then something like map/reduce may work better. That much data is also quite difficult to sort. How do you sort something without having everything that is being sorted in memory at the same time? Sorting in chunks is not something that maps well to reality...

The general approach is what Richard Riehle links to in his comment: sort the query output by hand, in your own code.
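As a rough illustration of sorting by hand, here is a minimal sketch in plain Clojure. It assumes the query returns rows shaped as [actor-id movie-id date]; that row shape and the name top-movies-per-actor are made up for this example and are not part of Datomic.

```clojure
;; Sketch only: assumes rows are [actor-id movie-id date] tuples,
;; e.g. the result of a query that finds ?actor ?movie ?date.
(defn top-movies-per-actor
  "Returns a vector of [actor-id [movie-id ...]] pairs holding at most
   n movie ids per actor, most recent first."
  [n rows]
  (->> rows
       (group-by first)                      ; actor-id -> rows for that actor
       (mapv (fn [[actor actor-rows]]
               [actor (->> actor-rows
                           ;; sort on the date, newest first
                           (sort-by #(nth % 2) #(compare %2 %1))
                           (take n)
                           (mapv second))]))))
```

Calling (top-movies-per-actor 5 rows) would give one [actor [movie ...]] pair per actor. The order of the actors themselves is not guaranteed, since it comes from a map.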

It could also help if you don't use the query to pull out entities, but only query for, say, the entity id and the value you want to sort on. That way, Datomic doesn't need to pull chunks for all of your data into the peer. You can instead pull out the data you need after sorting the result of your relatively sparse query.
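A sketch of what such a sparse query might look like, using the attribute names from the question. The query is plain quoted data, so it can be defined without a database at hand. The helper names keep-newest and newest-movie-ids are hypothetical. The reduce keeps at most n entries per actor while walking the rows, so the intermediate state stays small, although the query result itself still has to fit in memory.

```clojure
;; Sparse query: only entity ids and the sort key, no other movie data.
;; Attribute names are taken from the schema in the question.
(def actor-movie-dates-query
  '[:find ?actor ?movie ?date
    :in $ [?actor ...]
    :where
    [?actor :actor/movie ?movie]
    [?movie :db/date ?date]])

(defn keep-newest
  "Adds [movie date] to kept and trims the result to the n newest entries."
  [kept n movie date]
  (->> (conj kept [movie date])
       (sort-by second #(compare %2 %1))     ; newest first
       (take n)
       vec))

(defn newest-movie-ids
  "Walks [actor movie date] rows once and returns a map of
   actor-id -> vector of the n newest movie ids."
  [n rows]
  (let [kept (reduce (fn [acc [actor movie date]]
                       (update acc actor (fnil keep-newest []) n movie date))
                     {}
                     rows)]
    (into {} (map (fn [[actor entries]] [actor (mapv first entries)])) kept)))

;; Assumed usage, with datomic.api required as d:
;;   (newest-movie-ids 5 (d/q actor-movie-dates-query snapshot actors))
;; The resulting movie ids can then be pulled, e.g. with d/pull-many.
```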

Another thing that could help here is to use a separate partition for the attribute you need to sort on. That way, you make sure that the chunks that Datomic has to pull to get the sparse data for sorting only contain data for the attribute you're sorting on.

August Lilleaas