8

It is quite possible that, to know whether a function is defined at some point, a significant part of the work of computing its value has to be done. When implementing a PartialFunction, both isDefinedAt and apply may have to do that work. What to do when this common work is costly?

There is the possibility of caching the result, hoping that apply will be called right after isDefinedAt. Definitely ugly.

I often wish that PartialFunction[A, B] were Function1[A, Option[B]], to which it is clearly isomorphic. Or maybe there could be another method in PartialFunction, say applyOption(a: A): Option[B]. With some mixins, implementors would have a choice of implementing either isDefinedAt and apply, or applyOption, or all of them to be on the safe side, performance-wise. Clients that test isDefinedAt just before calling apply would be encouraged to use applyOption instead.
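To make the idea concrete, here is a purely hypothetical sketch of such a mixin (FromApplyOption and ParseInt are made-up names, not anything in the standard library): implementors write only applyOption, and isDefinedAt and apply are derived from it, so the common work lives in one place.

```scala
// Hypothetical mixin: implement only applyOption; isDefinedAt and apply
// are derived from it.
trait FromApplyOption[-A, +B] extends PartialFunction[A, B] {
  def applyOption(a: A): Option[B]
  def isDefinedAt(a: A): Boolean = applyOption(a).isDefined
  def apply(a: A): B = applyOption(a).getOrElse(throw new MatchError(a))
}

// Example implementor: testing parsability and parsing are one job.
object ParseInt extends FromApplyOption[String, Int] {
  def applyOption(s: String): Option[Int] =
    try Some(s.toInt) catch { case _: NumberFormatException => None }
}
```

A client calling ParseInt.applyOption("42") does the work once, while isDefinedAt followed by apply still does it twice, which is exactly why clients would be steered toward applyOption.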

However, this is not how things are. Some major methods in the library, among them collect on collections, require a PartialFunction. Is there a clean (or not so clean) way to avoid paying for computations repeated between isDefinedAt and apply?

Also, is the applyOption(a: A): Option[B] method reasonable? Does it sound feasible to add it in a future version? Would it be worth it?

Didier Dupont

4 Answers

4

Why is caching such a problem? In most cases, you have a local computation, so as long as you write a wrapper to do the caching, you needn't worry about it. I have the following code in my utility library:

  class DroppedFunction[-A,+B](f: A => Option[B]) extends PartialFunction[A,B] {
    private[this] var tested = false
    private[this] var arg: A = _
    private[this] var ans: Option[B] = None
    private[this] def cache(a: A) {
      if (!tested || a != arg) {
        tested = true
        arg = a
        ans = f(a)
      }
    }        
    def isDefinedAt(a: A) = {
      cache(a)
      ans.isDefined
    }
    def apply(a: A) = {
      cache(a)
      ans.get
    }
  }
  class DroppableFunction[A,B](f: A => Option[B]) {
    def drop = new DroppedFunction(f)
  }
  implicit def function_is_droppable[A,B](f: A => Option[B]) = new DroppableFunction(f)

and then if I have an expensive computation, I write a method A => Option[B] and do something like (f _).drop to use it in collect or whatnot. (If you wanted to do it inline, you could create a method that takes an A => Option[B] and returns a partial function.)

(The opposite transformation, from PartialFunction to A => Option[B], is called lifting, hence the "drop"; "unlift" is, I think, a more widely used term for this operation.)
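As an aside, the standard library exposes this transformation directly as Function.unlift. A minimal sketch (note that, unlike the caching wrapper above, unlift does not memoize anything):

```scala
// Function.unlift turns an A => Option[B] into a PartialFunction[A, B].
def halve(n: Int): Option[Int] =
  if (n % 2 == 0) Some(n / 2) else None  // stands in for a costly computation

val pf: PartialFunction[Int, Int] = Function.unlift(halve _)
val result = List(1, 2, 3, 4).collect(pf)  // List(1, 2)
```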

Rex Kerr
  • I'd try to use reference equality if possible. If the cost of testing is high, the cost of comparing might be high as well. – Daniel C. Sobral Aug 19 '11 at 00:52
  • @Daniel - Good point; one could test `(a ne arg) && a != arg` to shortcut the test in case the same reference was passed again. I have only rarely used this for expensive computations; normally it's because I've got an `A=>Option[B]` and want to use it in a partial function. – Rex Kerr Aug 19 '11 at 03:05
  • Well, it would have to be slightly more complicated, as reference equality can only be applied over `AnyRef` objects. – Daniel C. Sobral Aug 19 '11 at 03:45
  • @Daniel - Also a good point. Making it work is left as an exercise to the reader :) – Rex Kerr Aug 19 '11 at 04:16
  • How about calling it *flattened* instead of *dropped*? That seems to make more sense. – ziggystar Aug 19 '11 at 07:20
  • Ok, maybe there is no better way. I would have had an Option[(A, Option[B])] instead of three vars. There is also the problem of thread safety; with just one var, @volatile might be enough. Anyway, it might not work too well with parallel collections, with lots of cache misses. – Didier Dupont Aug 19 '11 at 09:40
  • @ziggystar - Flattening refers to the operation `L(L(A),L(B)) -> L(A,B)`, where `L` is a collection. – Rex Kerr Aug 19 '11 at 14:07
  • @didierd - Who cares if it's three vars? They're all private[this] and therefore invisible to everyone. You have check-and-then-update which you can't accomplish with @volatile; you need synchronized anyway, so just synchronize the whole block. But then caching is dangerous anyway, because threads will compete with each other for the cached value. You really want one instance per thread. – Rex Kerr Aug 19 '11 at 14:12
  • @Rex Kerr. This more or less answers the "why is caching such a problem". But again, there might be no better solution. I would have thought volatile ok, all I really need is that the Option(A, Option[B]) I get is consistent, don't mind if sometimes I do not get the last one. Besides that, three vars is not a huge problem, just my hunch. A ThreadLocal/DynamicVariable -has to be just one then- is indeed another solution to the threading problem – Didier Dupont Aug 19 '11 at 16:58
3

Have a look at this thread, Rethinking PartialFunction. You're not the only one wondering about this.

Alex Cruise
  • Had a look. Tough. Well, that answers my applyOption proposal: creating an Option is too expensive when the function is very fast. Thanks. – Didier Dupont Aug 20 '11 at 16:40
2

This is an interesting question, and I'll give my 2 cents.

First off, resist the urge for premature optimization. Make sure the partial function is actually the problem. I was amazed at how fast they are in some cases.

Now assuming there is a problem, where would it come from?

  1. A large number of case clauses
  2. Complex pattern matching
  3. Some complex computation in the guard (if) clauses

One option is to find ways to fail fast: break the pattern matching into layers, then chain partial functions. This way you can fail the match early. Also, extract repeated sub-matching. For example:

Let's assume OddEvenList is an extractor that breaks a list into an odd list and an even list:

val pf1: PartialFunction[List[Int], R] = {
   case OddEvenList(1 :: ors, 2 :: ers) => R1
   case OddEvenList(3 :: ors, 4 :: ers) => R2
}

Break it into two parts: one that matches the split, and one that tries to match the result (to avoid repeated computation). However, this may require some re-engineering:

val pf2: PartialFunction[(List[Int], List[Int]), R] = {
   case (1 :: ors, 2 :: ers) => R1
   case (3 :: ors, 4 :: ers) => R2
}
val pf1: PartialFunction[List[Int], R] = {
   case OddEvenList(ors, ers) if pf2.isDefinedAt((ors, ers)) => pf2((ors, ers))
}

I have used this when progressively reading XML files that had a rather inconsistent format.

Another option is to compose partial functions using andThen, although a quick test here seemed to indicate that only the first one is actually tested.
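That observation can be seen in a small sketch: since andThen takes a plain Function1 as its argument, only the first partial function's domain is ever consulted.

```scala
val first: PartialFunction[Int, Int] = { case n if n > 0 => n * 2 }
// The second stage is a total Function1; it contributes no domain check.
val composed: PartialFunction[Int, Int] = first andThen (_ + 1)
val defined    = composed.isDefinedAt(3)   // true: only `first` is consulted
val notDefined = composed.isDefinedAt(-1)  // false
val value      = composed(3)               // 7
```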

Thomas
  • I did not have patterns in mind, but ad hoc partial functions. Yet the same problem may happen with patterns too, though most of the time it might not. Thanks for the remark on andThen; I had not paid attention to that. Indeed, it takes a Function, not a PartialFunction, parameter, and calling isDefinedAt on both would imply calling apply on the first one. That could be a great example of a costly common computation. – Didier Dupont Aug 19 '11 at 10:25
0

There is absolutely nothing wrong with a caching mechanism inside a partial function, if:

  1. the function always returns the same output when passed the same argument
  2. it has no side effects
  3. it is completely hidden from the rest of the world

Such a cached function is indistinguishable from a plain old pure partial function...

paradigmatic