
I have a table of the following form:

[mytable]
id,    min,     max,    funobj
-----------------------------------------
1      15       23      {some big object}
2      23       41      {another big object}
3      19       27      {next big object}

Now suppose I have a view created like this:

CREATE VIEW functionvalues AS
SELECT id, evaluate(funobj)
FROM mytable

where evaluate is a set-returning function evaluating the large funobj. The result of the view could be something like this:

id,  evaluate
--------------
1    15
1    16
1    ...
1    23
2    23
2    24
2    ...
2    41
...

I do not have any information on the specific values evaluate will return, but I know that they will always be between the min and max values given in mytable (boundaries included).
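For concreteness, a set-returning function of that shape might be declared as sketched below. This is purely illustrative: the real evaluate() is not shown in the question, so the argument type (funtype) and the loop body are placeholders.

```sql
-- Hypothetical sketch only: the real evaluate() and the type of funobj
-- are not given in the question.
CREATE FUNCTION evaluate(funobj funtype)
RETURNS SETOF integer AS $$
DECLARE
    v integer;
BEGIN
    v := 15;                -- stand-in for the first evaluation point
    WHILE v <= 23 LOOP      -- stand-in for the real termination test
        RETURN NEXT v;      -- emit one value of the set
        v := v + 1;         -- the real next position depends on f(v)
    END LOOP;
END;
$$ LANGUAGE plpgsql STABLE;
```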

Finally, I (or rather, a third-party application) run a query against the view:

SELECT * FROM functionvalues 
WHERE evaluate BETWEEN somevalue AND anothervalue

In this case, Postgres evaluates the function evaluate for every row in mytable, even though, depending on the WHERE clause, the function need not be evaluated at all for rows whose min and max lie outside the given range. As evaluate is a rather slow function, this gives me very bad performance.

The better way would be to query the table directly by using

SELECT *
FROM (
    SELECT id, evaluate(funobj)
    FROM mytable
    WHERE
       max BETWEEN somevalue AND anothervalue
       OR min BETWEEN somevalue AND anothervalue
       OR (min < somevalue AND max > anothervalue)
) AS innerquery
WHERE evaluate BETWEEN somevalue AND anothervalue
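For reference, the inner filter above is a plain range-overlap test on min and max, which an ordinary b-tree index can support. The index name below is illustrative, and whether it pays off depends on the size of mytable.

```sql
-- B-tree index on the bound columns; the planner can use it for the
-- min/max conditions of the inner query (name is illustrative):
CREATE INDEX mytable_bounds_idx ON mytable (min, max);
```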

Is there any way I can tell Postgres to use a query like the one above (through clever indexes or something similar) without changing the way the third-party application queries the view?

P.S.: Feel free to suggest a better title to this question, the one I gave is rather... well... unspecific.

SingleNegationElimination
Thilo

3 Answers


I have no complete answer, but some of your catchwords ring a distant bell in my head:

  • you have a view
  • you want a more intelligent view
  • you want to "rewrite" the view definition

That calls for the PostgreSQL Rule System, especially the part "Views and the Rules System". Perhaps you can use that for your advantage.
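For background, a view in PostgreSQL is internally nothing but an unconditional ON SELECT rule named "_RETURN" attached to a table with the right column layout. The sketch below only reproduces the existing view through the rule mechanism; the column types are assumptions, since the question does not give them.

```sql
-- A view is internally an unconditional ON SELECT rule called "_RETURN".
-- The following is equivalent to the CREATE VIEW from the question
-- (column types are assumed):
CREATE TABLE functionvalues (id integer, evaluate integer);
CREATE RULE "_RETURN" AS ON SELECT TO functionvalues
    DO INSTEAD SELECT id, evaluate(funobj) FROM mytable;
```

Note that ON SELECT rules must be unconditional, which limits how far this mechanism can be bent.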

Be warned: this is treacherous stuff. First you will find it great, then you will pet it, then it will rip off your arm without warning while still purring. Follow the links in here.

A.H.
  • And, I think, it's not going to last. I think I read on one of the PostgreSQL mailing lists that the long-term plan is to replace rules with triggers, but I don't have a reference for that. – Mike Sherrill 'Cat Recall' Oct 04 '11 at 22:24
  • That must be a _very_ long-term plan, since the rule system was invented to support views, and triggers cannot emulate views. The INSERT, UPDATE and DELETE support has been under heavy fire for quite some time, though, due to its erratic behaviour. That was the status when I was looking into rules to solve a problem a while ago. Besides: given the ultra-conservative stance of the PostgreSQL team regarding cutting existing features without urgent need, I won't hold my breath but raise a child perhaps ;-) – A.H. Oct 04 '11 at 22:45
  • PostgreSQL 9.1 supports triggers on views. (That makes views updatable without having to use rules.) The beginning of the end? I don't know. But you said yourself that they're treacherous stuff. And I understand that's the rationale. – Mike Sherrill 'Cat Recall' Oct 05 '11 at 00:36
  • +1 for pointing me to something I did not know yet. However, as far as I can see, Rules will not be the solution to my problem, as I would have to write conditional rules on SELECT queries, and the rule system does not allow that. – Thilo Oct 05 '11 at 06:29

Postgres cannot push the restrictions down the query tree into the function; the function always has to scan and return the entire underlying table, which then gets rejoined with that same table. Sigh. "Breaking up" the function's body and combining it with the rest of the query would require a macro-like feature instead of a function.
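The one macro-like facility PostgreSQL does have is inlining of simple LANGUAGE sql functions: if a function body is a single side-effect-free SELECT, the planner can splice it into the calling query and push restrictions into it. Whether evaluate() can be expressed that way is an open assumption; the argument type and the unpack() helper below are hypothetical.

```sql
-- Only applicable if evaluate() can be written as a single SQL SELECT.
-- funtype and unpack() are hypothetical placeholders:
CREATE FUNCTION evaluate(funobj funtype)
RETURNS SETOF integer AS $$
    SELECT value FROM unpack(funobj)  -- inlinable: one plain SELECT
$$ LANGUAGE sql STABLE;
```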

A better way would probably be not to use an unrestricted set-returning function, but to rewrite the function as a scalar function that takes a single data row as an argument and yields its value.
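In that shape, the cheap row filter runs first and only the surviving rows reach the expensive call. The query below is a sketch; evaluate_scalar() is a hypothetical per-row variant of evaluate(), and in practice one should verify in the plan that the function is indeed applied after the filter.

```sql
-- evaluate_scalar() is a hypothetical per-row variant of evaluate().
-- The min/max overlap test prunes rows before the function is called:
SELECT id, evaluate_scalar(funobj) AS value
FROM mytable
WHERE max >= somevalue
  AND min <= anothervalue;
```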

There is also the problem of sort order: the outer query does not know about the order delivered by the function, so explicit sort and merge steps will be necessary, except maybe for very small result sets (for function results no statistics are available, only the cost and the estimated row count, IIRC).
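Those two estimates are the only per-function knobs the planner has, and they can be adjusted after the fact with ALTER FUNCTION (available since PostgreSQL 8.3). The argument type and the numbers below are placeholders, not measured values.

```sql
-- Tell the planner the function is expensive and how many rows it
-- typically returns (funtype and the numbers are illustrative):
ALTER FUNCTION evaluate(funtype) COST 10000 ROWS 50;
```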

wildplasser
  • Thanks for your reply. I feared the answer would be along those lines. Rewriting the function to take a single row as an argument is sadly not possible in this case, as the function's values determine the next position at which it has to be evaluated. – Thilo Oct 04 '11 at 10:46
  • Please note that I am not an expert on this, I only have been bitten by this, too; please wait for the other replies. Tom Lane can probably answer this question better in one of the pg forums or mailing lists. NB: always inspect the resulting query plans... – wildplasser Oct 04 '11 at 10:50
  • By "inspect the next position" you mean that the function needs more than one row of data to compute its result. That can always be rewritten to a subquery-form, in most cases a (NOT) EXISTS kind of thing. Which will probably perform better than "sorting and relying on order" in the subquery or function. – wildplasser Oct 05 '11 at 10:25
  • Sorry, I was probably a bit sloppy in my formulation. Think of evaluate(...) as a function that evaluates some (mathematical) function f. First f(min) is evaluated; then, depending on that result, the next value of f is evaluated, say f(min+something). While it might be possible to achieve this using subqueries, I doubt it would work as efficiently as the current solution. – Thilo Oct 05 '11 at 11:11

Sometimes, the right answer is "faster hardware". Given the way the PostgreSQL optimizer works, your best bet might be to move the table and its indexes onto a solid state disk.

Documentation for Tablespaces

Second, tablespaces allow an administrator to use knowledge of the usage pattern of database objects to optimize performance. For example, an index which is very heavily used can be placed on a very fast, highly available disk, such as an expensive solid state device. At the same time a table storing archived data which is rarely used or not performance critical could be stored on a less expensive, slower disk system.
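A minimal sketch of that setup follows; the mount path and tablespace name are examples, not values from the question.

```sql
-- Create a tablespace on the SSD mount point and move the hot table
-- there (path and name are examples):
CREATE TABLESPACE fastspace LOCATION '/mnt/ssd/pg_tblspc';
ALTER TABLE mytable SET TABLESPACE fastspace;
```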

In October 2011, you can get a really good 128 gig SSD drive for less than $300, or 300 gigs for less than $600.

If you're looking for an improvement of two orders of magnitude, and you already know the bottleneck is your evaluate() function, then you will probably have to accept smaller gains from many sources. If I were you, I'd look at whether any of these things might help.

  • solid-state disk (speedup factor of 50 over HDD, but you say you're not I/O bound, so let's estimate "2")
  • faster CPU (speedup of 1.2)
  • more RAM (speedup of 1.02)
  • different algorithms in evaluate() (say, 2)
  • different data structures in evaluate() (in a database function, probably no gain)
  • different C compiler optimizations (no gain)
  • different C compiler (1.1)
  • rewrite critical parts of evaluate() in assembler (2.5)
  • different dbms platform
  • different database technology

Those estimates suggest a speedup by a factor of only 13. (But they're little more than guesswork.)

I might even consider targeting the GPU for calculation.

Mike Sherrill 'Cat Recall'
  • Thanks for your suggestion. However, while the tables to fetch are rather big, the real bottleneck is the evaluate function. This function does quite a lot of math and is highly optimized, yet still rather slow. A faster processor might help a bit, of course, but I doubt it will yield a speedup by a factor of 100, and anything less would satisfy me only partly... – Thilo Oct 04 '11 at 14:12
  • I see. What language is evaluate() written in? – Mike Sherrill 'Cat Recall' Oct 04 '11 at 14:38
  • Thanks again for your update. While the performance of functions like evaluate() is one of my major topics right now (some of your suggestions were implemented a long time ago...), this will conceptually never solve the problem in this question: I have a view that only has to look at maybe 0.1% of all rows in the table, yet evaluate() is called for 100% of the rows. Speeding up evaluate is bound by an upper limit (probably far less than a factor of 1000, as your reasoning shows). – Thilo Oct 05 '11 at 06:36