pyDatalog: handling unbound variables in a custom predicate

Question

I'm writing a pyDatalog program to analyse weather data from Weather Underground (just as a demo for myself and others in the company at the moment). I have written a custom predicate resolver which returns readings between a start and end time:

# class for the reading table.
class Reading(Base):
      __table__ = Table('reading', Base.metadata, autoload = True, autoload_with = engine)
      def __repr__(self):
        return str(self.Time)
      # predicate to resolve 'timeBetween(X, Y, Z)' statements
      # matches items as X where the time of day is between Y and Z (inclusive).
      # if Y is later than Z, it returns the items not between Z and Y (exclusive).
      # TODO - make it work where t1 and t2 are not bound.
      # somehow needs to tell the engine to try somewhere else first.
      @classmethod
      def _pyD_timeBetween3(cls, dt, t1, t2):
        if dt.is_const():
          # dt is already known
          if t1.is_const() and t2.is_const():
            if (dt.id.Time.time() >= makeTime(t1.id)) and (dt.id.Time.time() <= makeTime(t2.id)):
              yield (dt.id, t1.id, t2.id)
        else:
          # dt is an unbound variable
          if t1.is_const() and t2.is_const():
            if makeTime(t2.id) > makeTime(t1.id):
              op = 'and'
            else:
              op = 'or'
            sqlWhere = "time(Time) >= '%s' %s time(Time) <= '%s'" % (t1.id, op, t2.id)
            for instance in cls.session.query(cls).filter(sqlWhere):
              yield(instance, t1.id, t2.id)

This works fine in the case where t1 and t2 are bound to specific values:

:> easterly(X) <= (Reading.WindDirection[X] == 'East')
:> + rideAfter('11:00:00')
:> + rideBefore('15:00:00')
:> goodTime(X) <= rideAfter(Y) & rideBefore(Z) & Reading.timeBetween(X, Y, Z)
:> goodTime(X)
[(2013-02-19 11:25:00,), (2013-02-19 12:45:00,), (2013-02-19 12:50:00,), (2013-02-19  13:25:00,), (2013-02-19 14:30:00,), (2013-02-19 15:00:00,), (2013-02-19 13:35:00,), (2013-02-19 13:50:00,), (2013-02-19 12:20:00,), (2013-02-19 12:35:00,), (2013-02-19 14:05:00,), (2013-02-19 11:20:00,), (2013-02-19 11:50:00,), (2013-02-19 13:15:00,), (2013-02-19 14:55:00,), (2013-02-19 12:00:00,), (2013-02-19 13:00:00,), (2013-02-19 14:20:00,), (2013-02-19 14:15:00,), (2013-02-19 13:10:00,), (2013-02-19 12:10:00,), (2013-02-19 14:45:00,), (2013-02-19 14:35:00,), (2013-02-19 13:20:00,), (2013-02-19 11:10:00,), (2013-02-19 13:05:00,), (2013-02-19 12:55:00,), (2013-02-19 14:10:00,), (2013-02-19 13:45:00,), (2013-02-19 13:55:00,), (2013-02-19 11:05:00,), (2013-02-19 12:25:00,), (2013-02-19 14:00:00,), (2013-02-19 12:05:00,), (2013-02-19 12:40:00,), (2013-02-19 14:40:00,), (2013-02-19 11:00:00,), (2013-02-19 11:15:00,), (2013-02-19 11:30:00,), (2013-02-19 11:45:00,), (2013-02-19 13:40:00,), (2013-02-19 11:55:00,), (2013-02-19 14:25:00,), (2013-02-19 13:30:00,), (2013-02-19 12:30:00,), (2013-02-19 12:15:00,), (2013-02-19 11:40:00,), (2013-02-19 14:50:00,), (2013-02-19 11:35:00,)]

However, if I declare the same rule (here named niceTime) with the conditions in the other order, so that Y and Z are still unbound at the point where the engine tries to resolve timeBetween, it returns an empty set:

:> atoms('niceTime')
:> niceTime(X) <= Reading.timeBetween(X, Y, Z) & rideAfter(Y) & rideBefore(Z)
<pyDatalog.pyEngine.Clause object at 0x0adfa510>
:> niceTime(X)
[]

This seems wrong - the two queries should return the same set of results.

My question is whether there is a way of handling this situation in pyDatalog? I think what needs to happen is that the timeBetween predicate should be able to tell the engine to back off somehow and try to resolve other rules first before trying this one, but I can't see any reference to this in the docs.

Accepted Answer

The pyDatalog reference says: "although the order of pyDatalog statements is indifferent, the order of literals within a body is significant". pyDatalog resolves the predicates in a body in the order in which they are stated, so in your second rule timeBetween is called while Y and Z are still unbound.
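As an illustration of why the order matters, here is a minimal pure-Python sketch of left-to-right body resolution. It is not pyDatalog's actual engine: `READINGS`, `fact`, `time_between` and `solve` are hypothetical names invented for this example, and the times are plain strings rather than database rows.

```python
# Toy model of left-to-right body resolution. NOT pyDatalog's real engine;
# every name below is hypothetical and exists only for this illustration.

READINGS = ['10:30:00', '11:25:00', '14:30:00', '16:00:00']

def fact(var, value):
    """Literal that binds `var` to a fixed value, like `+ rideAfter('11:00:00')`."""
    def literal(env):
        if var not in env:
            yield {**env, var: value}
        elif env[var] == value:
            yield env
    return literal

def time_between(x, y, z):
    """Literal that, like the resolver in the question, only answers
    when y and z are already bound."""
    def literal(env):
        if y in env and z in env:
            for t in READINGS:
                # 'HH:MM:SS' strings compare correctly as text
                if env[y] <= t <= env[z] and env.get(x, t) == t:
                    yield {**env, x: t}
        # y or z unbound: nothing is yielded, so the whole body fails
    return literal

def solve(body, env=None):
    """Resolve the literals strictly in the order given."""
    env = {} if env is None else env
    if not body:
        yield env
        return
    first, rest = body[0], body[1:]
    for extended in first(env):
        yield from solve(rest, extended)

after = fact('Y', '11:00:00')
before = fact('Z', '15:00:00')
between = time_between('X', 'Y', 'Z')

# Same literals, different order, different answers.
good_time = [e['X'] for e in solve([after, before, between])]
nice_time = [e['X'] for e in solve([between, after, before])]
print(good_time)
print(nice_time)
```

In this toy model the first ordering finds the readings inside the window, while the second finds none, because `time_between` is asked before `Y` and `Z` have values.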

Having said that, it would be possible to improve pyDatalog to resolve predicates with bound variables first, but I'm not sure why this would be important.
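The idea of resolving predicates with bound variables first could look roughly like the sketch below. This is only an assumption about how such a reordering might work, not something pyDatalog provides: the `Literal` tuple, its `needs` and `binds` fields, and the `reorder` function are all hypothetical.

```python
# Hypothetical sketch of "resolve literals whose variables are bound first".
# Not part of pyDatalog. Each literal is modelled as a name, the variables
# it needs to be bound beforehand, and the variables it binds when resolved.
from typing import NamedTuple

class Literal(NamedTuple):
    name: str
    needs: frozenset
    binds: frozenset

def reorder(body, bound=frozenset()):
    """Greedily pick the first literal whose required variables are bound."""
    remaining = list(body)
    ordered = []
    bound = set(bound)
    while remaining:
        for lit in remaining:
            if lit.needs <= bound:
                break
        else:
            raise ValueError("no literal can be resolved with the current bindings")
        remaining.remove(lit)
        ordered.append(lit)
        bound |= lit.binds
    return ordered

# Body of niceTime(X) as written in the question.
body = [
    Literal('timeBetween', frozenset({'Y', 'Z'}), frozenset({'X'})),
    Literal('rideAfter', frozenset(), frozenset({'Y'})),
    Literal('rideBefore', frozenset(), frozenset({'Z'})),
]
ordered_names = [lit.name for lit in reorder(body)]
print(ordered_names)
```

Under these assumptions, the body of niceTime would be scheduled in the same order as the body of goodTime, which is the order that already works.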

answered Mar 04 '13 at 12:02 by Pierre Carbonnelle

The only reason for me would be to make the syntax more transparent and independent of the underlying engine. Normally '&' is a commutative operator, so people coming from other languages would expect that. – highfellow Mar 05 '13 at 10:21
Thanks for the feedback. At some point, I considered using the following syntax for clauses: p(X) <= (q(X), r(X)), i.e. using a list of body literals instead of '&'. This has the advantage of not implying commutativity, but I find it less readable than '&'. Following your feedback, I may add this notation at some point though. Please note that 'and' is not truly commutative in Python: if a is false, b is not evaluated in 'a and b', so 'a and b' may have a different result from 'b and a'. – Pierre Carbonnelle Mar 11 '13 at 15:48
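The point about Python's 'and' can be checked with a short example. The functions `a` and `b` below are hypothetical stand-ins: `a` returns a false value and `b` raises if it is ever evaluated.

```python
# Short-circuit evaluation makes `and` order-sensitive in Python.
calls = []

def a():
    calls.append('a')
    return False

def b():
    calls.append('b')
    raise ValueError('b was evaluated')

# a() is falsy, so b() is never evaluated and no exception is raised.
result = a() and b()
calls_after_first = list(calls)

# With the operands swapped, b() runs first and raises.
try:
    b() and a()
    swapped_raised = False
except ValueError:
    swapped_raised = True

print(result, calls_after_first, swapped_raised)
```

So the two orderings behave differently even though both use the same operands.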
