13

I recently inquired about why PatternTest was causing a multitude of needless evaluations: PatternTest not optimized? Leonid replied that it is necessary for what seems to me as a rather questionable method. I can accept that, though I would prefer a more efficient alternative.

I now realize, which I believe Leonid has been saying for some time, that this problem runs much deeper in Mathematica, and I am troubled. I cannot understand why this is not or cannot be better optimized.

Consider this example:

list = RandomReal[9, 20000];
Head /@ list; // Timing
MatchQ[list, {x__Integer, y__}] // Timing
{0., Null}
{1.014, False}

Checking the heads of the list is essentially instantaneous, yet checking the pattern takes over a second. Surely Mathematica could recognize that since the first element of the list is not an Integer, the pattern cannot match, and unlike the case with PatternTest I cannot see how there is any mutability in the pattern. What is the explanation for this?


There appears to be some confusion regarding packed arrays, which as far as I can tell have no bearing on this question. Rather, I am concerned with the O(n2) time complexity on all lists, packed or unpacked.

Community
  • 1
  • 1
Mr.Wizard
  • 24,179
  • 5
  • 44
  • 125

2 Answers2

8

MatchQ unpacks for these kinds of tests. The reason is that no special case for this has been implemented. In principle it could contain anything.

On["Packing"]
MatchQ[list, {x_Integer, y__}] // Timing

MatchQ[list, {x__Integer, y__}] // Timing

Improving this is very tricky - if you break the pattern matcher you have a serious problem.

Edit 1: It is true that the unpacking is not the cause for the O(n^2) complexity. It does, however, show that for the MatchQ[list, {x__Integer, y__}] part the code goes to another part of the algorithm (which needs the lists to be unpacked). Some other things to note: This complexity arises only if both patterns are __ if either one of them is _ the algorithm has a better complexity.

The algorithm then goes through all n*n potential matches and there seems no early bailout. Presumably because other patters could be constructed that would need this complexity - The issue is that the above pattern forces the matcher to a very general algorithm.

I then was hoping for MatchQ[list, {Shortest[x__Integer], __}] and friends but to no avail.

So, my two cents: either use a different pattern (and have On["Packing"] to see if it goes to the general matcher) or do a pre-check DeveloperPackedArrayQ[expr] && Head[expr[[1]]]===Integer or some such.

  • 2
    +1. Interestingly, there is no unpacking in cases like `MatchQ[list, {__Integer}]`, which I guess is one of the implemented special cases (since it is much more constrained). – Leonid Shifrin Dec 15 '11 at 16:21
  • 1
    I don't think I understand how packing is relevant. Did you look at the question regarding PatternTest? It seems to me the same problem affects this, even though no possibly-changing test function is present. – Mr.Wizard Dec 15 '11 at 16:23
  • @mr.wizard I don't see why you don't see the relevance of the unpacking here. As the unpacking message shows unpacking happens only with the second form whereas for both forms the pattern test could stop after the first element; they are logically the same. For me it shows that unpacking is the culprit, not the matching process. – Sjoerd C. de Vries Dec 15 '11 at 18:55
  • 2
    I withdraw my above statement. Unpacking `list` before doing the match (using `AppendTo[list, 1];`) yields about the same timing, so it really is the matching that's the problem not the unpacking. – Sjoerd C. de Vries Dec 15 '11 at 18:58
  • @Sjoerd one can also manually unpack and time; it's almost instantaneous. – acl Dec 15 '11 at 19:53
  • @acl I did that. See my second comment – Sjoerd C. de Vries Dec 15 '11 at 19:54
  • Since patterns are central to *Mathematica*, shouldn't there be some kind of optimizer, either automatic or like `Dispatch`, that would handle these cases better? I am sure I don't even begin to understand the complexity of the system, but I just cannot see how running through all n*n possible matches is a good idea, when a brief logic check would show it cannot match. Can you help me understand the reason for the current behavior? – Mr.Wizard Dec 16 '11 at 17:28
  • Quite a lot of stuff does go to special case code, this somewhat specific pattern does not have such a special case implementation. I don't think it is that a common pattern. What you describe as a 'brief logic check' is not so easy and no one implemented it so far. Generally speaking, in your opinion, if a special code is not implemented and only a n*n code is available, would you not use it? –  Dec 16 '11 at 17:43
  • 7
    This type of thing is quite tricky to optimize. It is "obvious" that, if tested in a certain order, and if there are some smarts to rule out need for further testing, then there would be potential to short-circuit in this example and make it quite fast. To see how hard that could be to implement, here is an example that really needs to work hard to rule out all possibility of a match. Making it faster than O(n^2) strikes me as unlikely (note to Leonid: no, that isn't a challenge). MatchQ[list, {x__/;List[x][[-1]]==list[[19999]], y__}] // Timing – Daniel Lichtblau Dec 16 '11 at 18:07
  • @Mr.Wizard, all understandable now? –  Dec 17 '11 at 11:43
  • @ruebenko, pardon me, I didn't see your commend until now. Yes, thank you. – Mr.Wizard Dec 18 '11 at 21:06
  • 2
    @Daniel, I guess my point is that in the case of "simple" patterns without `Condition` or `PatternTest`, shouldn't there be an additional logic that kicks in to check for short-circuits? – Mr.Wizard Dec 18 '11 at 21:08
  • @Mr.Wizard Not sure. It might be nice, but there is a limit to resources and little liking for revisiting what is, as best I can tell, working code. – Daniel Lichtblau Dec 19 '11 at 18:40
1

@the author of the first answer. As far as I know from reverse-engeneering and reading of available information, it may be due to different ways the patterns are checked. In fact - as they say - a special hash code is used for pattern matching. This hash (basically a FNV-1 round) makes it very easy to check for particular patterns related to the type of expression involved (matter of a few xor operations). The hashing algorithm cycles inside the expression and each subpart is xorred with the output of the previous one. Special xor values are used for each atom expression - machineInts, machineReals, bigNums, Rationals and so on. Hence, for example, _Integer is easy to check because the hash of any integer is formed with integer's xor value, so all we need to do is doing the inverse op and see if matches - i.e. if we get some particular value or something like that (sorry if I'm vague on actual implementation details. It's WIP). For general or uncommon patterns the check may not take advantage of this hash stuff and require something different.

@the OP Head[] simply acts on the internal expression, taking the value of the first pointer of the expression (expressions are implemented as arrays of pointers). So doing it is as easy as copying and printing a string - very very fast. The pattern matching engine is not even called in this case.

JavierG
  • 201
  • 4
  • 12