7

Given a list of {x,y} datapoints, return a pure function f (from the reals to the reals) such that f[x]==y for every {x,y} in the data. If x is not one of the x-values then return the y-value for the previous point (the one with x-value less than x). If the function gets a value less than that of the first x-value in the data -- i.e., there is no previous point -- then return 0.

For example, given data {{1,20}, {2,10}}, return a pure function that looks like this:

Graph of the function given {{1,20},{2,10}} http://yootles.com/outbox/so/piecewise.png

I wrote something using Function and Piecewise that I'll include as an answer but it seems like it might be inefficient, especially for a large list of points. [UPDATE: My answer may actually be decent now. I'll probably go with it if no one has better ideas.]

To be clear, we're looking for function that takes a single argument -- a list of pairs of numbers -- and returns a pure function. That pure function should take a number and return a number.

dreeves
  • 26,430
  • 45
  • 154
  • 229
  • 5
    Bit of a one sided conversation here... :) – Simon Jul 28 '11 at 10:02
  • 1
    What does *pure* mean, is that a Mathematica concept? – starblue Jul 28 '11 at 10:46
  • 1
    @starblue in this context (i.e., Mathematica programming), it means "nameless" as opposed to "side-effect free". So he wants a function that returns a nameless function. – acl Jul 28 '11 at 12:01
  • 1
    That's right. I've always used "pure function" synonymously with "lambda function": http://stackoverflow.com/questions/16501/what-is-a-lambda-function Looks like the Mathematica docs do too: http://reference.wolfram.com/mathematica/ref/Function.html – dreeves Jul 28 '11 at 15:02
  • 2
    @starblue In Mathematica, a "regular function" can have several definitions, one of which is selected for use by means of pattern matching against the call arguments. A "pure function" has only a single definition and bypasses the pattern matching step. Thus, it can provide a performance improvement in many circumstances. – WReach Jul 29 '11 at 03:35
  • This would be so straightforward with a lisp macro. Trying to do the equivalent in Mathematica always seems to entail a confusing morass of Holds and Evaluates and Releases. See my comment on WReach's answer below. – dreeves Jul 31 '11 at 06:09
  • I stand corrected! All this needed was a With block. See WReach's updated answer. Thanks so much! – dreeves Jul 31 '11 at 16:06

5 Answers5

6

Hand-Coded Binary Search

If one is willing to sacrifice conciseness for performance, then an imperative binary search approach performs well:

stepifyWithBinarySearch[data_] :=
  With[{sortedData = SortBy[data, First], len = Length @ data}
  , Module[{min = 1, max = len, i, x, list = sortedData}
    , While[min <= max
      , i = Floor[(min + max) / 2]
      ; x = list[[i, 1]]
      ; Which[
          x == #, min = max = i; Break[]
        , x < #, min = i + 1
        , True, max = i - 1
        ]
      ]
    ; If[0 == max, 0, list[[max, 2]]]
    ]&
  ]

Equipped with some test scaffolding...

test[s_, count_] :=
  Module[{data, f}
  , data = Table[{n, n^2}, {n, count}]
  ; f = s[data]
  ; Timing[Plot[f[x], {x, -5, count + 5}]]
]

...we can test and time various solutions:

test[stepifyWithBinarySearch, 10]

step plot

On my machine, the following timings are obtained:

test[stepify (*version 1*), 100000]      57.034 s
test[stepify (*version 2*), 100000]      40.903 s
test[stepifyWithBinarySearch, 100000]     2.902 s

I expect that further performance gains could be obtained by compiling the various functions, but I'll leave that as an exercise for the reader.

Better Still: Precomputed Interpolation (response To dreeves' comment)

It is baffling that a hand-coded, uncompiled binary search would beat a Mathematica built-in function. It is perhaps not so surprising for Piecewise since, barring optimizations, it is really just a glorified IF-THEN-ELSEIF chain testing expressions of arbitrary complexity. However, one would expect Interpolation to fare much better since it is essentially purpose-built for this task.

The good news is that Interpolation does provide a very fast solution, provided one arranges to compute the interpolation only once:

stepifyWithInterpolation[data_] :=
  With[{f=Interpolation[
            {-1,1}*#& /@ Join[{{-9^99,0}}, data, {{9^99, data[[-1,2]]}}]
            , InterpolationOrder->0 ]}
    , f[-#]&
  ]

This is blindingly fast, requiring only 0.016 seconds on my machine to execute test[stepifyWithInterpolation, 100000].

Community
  • 1
  • 1
WReach
  • 18,098
  • 3
  • 49
  • 93
  • 1
    Thanks WReach! This is really surprising to me. Why can't a Piecewise or Interpolation function beat doing a hand-made binary search each time? I think there's something very wrong with the other solutions! Like they're reconstructing the pieces on each call of the pure function or something. – dreeves Jul 31 '11 at 06:01
  • @dreeves Your intuition is correct. Please see my updated answer about avoiding the recomputation of the interpolation function. – WReach Jul 31 '11 at 15:12
5

You could also do this with Interpolation (with InterpolationOrder->0) but that interpolates by using the value of the next point instead of the previous. But then I realized you can reverse that with a simple double-negation trick:

stepify[data_] := Function[x,
  Interpolation[{-1,1}*#& /@ Join[{{-9^99,0}}, data, {{9^99, data[[-1,2]]}}],
                InterpolationOrder->0][-x]]
dreeves
  • 26,430
  • 45
  • 154
  • 229
  • I think that you should incorporate the precomputation of the interpolation function from my response and then make your response the accepted answer. – WReach Jul 31 '11 at 15:24
  • Perfect! (I just accepted your answer. Thanks so much for the help!) – dreeves Jul 31 '11 at 16:07
5

My previous attempts were not working properly (they were OK for two steps only).

I think the following one, along the same lines, works:

g[l_] := Function[x, 
  Total[#[[2]] UnitStep[x - #[[1]]] & /@ 
    Transpose@({First@#, Differences[Join[{0}, Last@#]]} &@ Transpose@l)]]

Plot[g[{{1, 20}, {2, 10}, {3, 20}}][x], {x, 0, 6}]

enter image description here

Dr. belisarius
  • 60,527
  • 15
  • 115
  • 190
  • Nice! Thanks belisarius. Is the `MapThread[List,..` different from a Transpose here? Also, I see this isn't quite to spec; does it work to wrap this in a `Function[x,...]`? – dreeves Jul 28 '11 at 15:14
  • 1
    @dreeves I did a few Timings (nothing really serious), but seems that your first version is the fastest ... – Dr. belisarius Jul 29 '11 at 02:18
  • 2
    thanks for doing that! Want to clean up your answer and paste in my version that you suspect is fastest and then I can mark this as the Accepted Answer? (My theory of Accepted Answers is that future searchers should not need to look at any other answers, i.e., it's a complete answer, and the best answer, and the asker vouches for it. Oh, and that answerers should borrow from each other liberally to try to construct a definitive answer.) – dreeves Jul 29 '11 at 05:39
  • 1
    @dreeves You asked for _efficiency_, and I provided _elegance_ (at best). So I think yours should be the accepted answer, at least until someone comes up with a faster one. (And I share your POV about borrowing content, with proper attribution of course) – Dr. belisarius Jul 29 '11 at 23:44
  • 1
    I think your version isn't right. I just tried it with {{1,20}, {2,10}, {3,20}} and it doesn't match. I guess I supplied lousy data. :) – dreeves Jul 30 '11 at 08:55
  • I think your answer has the same efficiency problem that WReach just solved. Check out the comments on their answer. My guess now is that the Interpolation trick will be unbeatable. – dreeves Jul 31 '11 at 16:10
3

Compiling WReach's answer does indeed result in a significant speedup. Using all the functions defined in WReach's answer, but redefining test to

test[s_,count_]:=Module[{data,f},
    data=Table[{n,n^2},
        {n,count}];
        f=s[ToPackedArray[N@data]];
        Timing[Plot[f[x],{x,-5,count+5}]]]

(this is necessary to force the resulting arrays to be packed; thanks to Sjoerd de Vries for pointing this out), and defining

ClearAll[stepifyWRCompiled];
stepifyWRCompiled[data_]:=With[{len=Length@data,sortedData=SortBy[data,First]},
Compile[{{arg,_Real}},Module[{min=1,max=len,i,x,list=sortedData},
            While[
                min<=max,
                i=Floor[(min+max)/2];
                    x=list[[i,1]];
                    Which[
                        x\[Equal]arg,min=max=i;Break[],
                        x<arg,min=i+1,True,max=i-1
                    ]
            ];
            If[0==max,0,list[[max,2]]]
        ],CompilationTarget->"WVM",RuntimeOptions\[Rule]"Speed"]]

(the With block is necessary to explicitly insert sortedData into the block of code to be compiled) we get results faster even than the solution using Interpolation, although only marginally so:

Monitor[
tbl = Table[
    {test[stepifyWRCompiled, l][[1]],
        test[stepifyWithInterpolation, l][[1]],
        test[stepifyWithBinarySearch, l][[1]]},
        {l, 15000, 110000, 5000}], l]
tbl//TableForm
(*
0.002785    0.003154    0.029324
0.002575    0.003219    0.031453
0.0028      0.003175    0.034886
0.002694    0.003066    0.034896
0.002648    0.003002    0.037036
0.00272     0.003019    0.038524
0.00255     0.00325     0.041071
0.002675    0.003146    0.041931
0.002702    0.003044    0.045077
0.002571    0.003052    0.046614
0.002611    0.003129    0.047474
0.002604    0.00313     0.047816
0.002668    0.003207    0.051982
0.002674    0.00309     0.054308
0.002643    0.003137    0.05605
0.002725    0.00323     0.06603
0.002656    0.003258    0.059417
0.00264     0.003029    0.05813
0.00274     0.003142    0.0635
0.002661    0.003023    0.065713
*)

(first column is compiled binary search, second interpolation, third, uncompiled binary search).

Note also that I use CompilationTarget->"WVM", rather than CompilationTarget->"C"; this is because the function is compiled with a lot of data points "built-in", and, if I use compilation to C with 100000 data points, I can see that gcc goes on for a long time and takes up a lot of memory (I imagine the resulting C file is huge, but I did not check). So I just use compilation to "WVM".

I think the overall conclusion here is just that Interpolation is also just doing some constant-time lookup (binary search or something similar, presumably) and the hand-coded way just happens to be slightly faster because it's less general.

Community
  • 1
  • 1
acl
  • 6,490
  • 1
  • 27
  • 33
  • I think this has to do with the data array not being packed anymore at that size. Try Needs["Developer`"] DiscretePlot[ PackedArrayQ[Table[{n, n^2}, {n, 1, count}]] // Boole, {count, 5000, 100000, 5000}] DiscretePlot[ PackedArrayQ[Table[{n, n^2}, {n, 1, count}]] // Boole, {count, 100, 500, 20}] – Sjoerd C. de Vries Jul 31 '11 at 20:39
  • @Sjoerd hmm, good point. But even if I explicitly use `ToPackedArray[data]` instead of `data`, I see the same slowdown. Still, it must be something like what you say (eg maybe it just gets automatically unpacked). I'll try to check this tomorrow. – acl Jul 31 '11 at 20:57
  • PackedArrayQ[Table[{n, n^2}, {n, 1, 46340}]] ==> True, but PackedArrayQ[Table[{n, n^2}, {n, 1, 46341}]] ==> False. 46341 is the square root of 2^31, so its squared value cannot be contained in a machine size signed integer. I assume that's the reason. – Sjoerd C. de Vries Jul 31 '11 at 20:58
  • Change test to: test[s_, count_] := Module[{data, f}, data = Table[{n, n^2} // N, {n, count}] // ToPackedArray; f = s[data // ToPackedArray]; Timing[Plot[f[x], {x, -5, count + 5}]]] and it works, if I'm not mistaken (added both //N and //PackedArray) – Sjoerd C. de Vries Jul 31 '11 at 21:00
  • @Sjoerd yes it finally dawned on me that this is probably what's happening just as you were posting it! Spending too long in mma makes me forget about all these "details". Thanks! – acl Jul 31 '11 at 21:00
  • you want PackedArray to work on reals and not integers as it doesn't have an effect on non-machine sized integers. – Sjoerd C. de Vries Jul 31 '11 at 21:04
  • @Sjoerd it indeed does not but, never having used it on integers, I trusted the documentation, which says that `ToPackedArray[expr,type]` is allowed, "Possible types are: Integer, Real and Complex". So I tried this form but it did nothing. Anyway, your approach works. – acl Jul 31 '11 at 21:10
  • When you say compiled binary search is faster than interpolation do you mean the version in my answer or WReach's version with the With block? (I had predicted that the interpolation version would be unbeatable after that.) – dreeves Jul 31 '11 at 21:28
  • @dreeves I am referring to WReach's version – acl Jul 31 '11 at 21:32
  • Two bullets above the doc line you cite the doc says: ToPackedArray will successfully pack full lists of any depth containing **machine-sized integers** and machine-sized approximate real and complex numbers. (I have to admit that until now I happened to have missed this) – Sjoerd C. de Vries Jul 31 '11 at 21:35
  • @Sjoerd Yes, I simply read the line I quoted and got misled. Happens often. – acl Jul 31 '11 at 21:38
  • @Sjoerd I guess the real question here is, how come this is faster? Interpolation functions should just be glorified if-elif structures, as WReach points out. – acl Jul 31 '11 at 22:48
  • As you said, the difference is marginal. It's about 10%. I assume finding the points to interpolate between is also done with a binary search or something alike. The overhead in the Interpolation, for instance the possibility to specify the interpolation order, may explain the difference. – Sjoerd C. de Vries Jul 31 '11 at 23:49
  • @Sjoerd the timing difference is between pre-computed interpolations and binary search so it can't really be specifically the ability to change interpolation order (this never comes in; we're comparing speeds between computed interpolation objects and the compiled binary tree search). but yes, you're right, it look like just overhead; they both seem constant time. – acl Aug 01 '11 at 00:12
  • `Interpolation` doesn't really involve much pre-computation; probably only sorting. Try, for instance, this: Interpolation[{{1, 2}, {2, 5}, {3, 8}, {5, 12}}, InterpolationOrder -> 2] // FullForm You'll see that it's just an inert wrapper around the original set of data points and a tiny bit of meta-information. Interpolations are calculated on the fly and I have a hunch that this, among other things, involves a binary search. – Sjoerd C. de Vries Aug 01 '11 at 06:43
2

The following works:

stp0[x_][{{x1_,y1_}, {x2_,y2_}}] := {y1, x1 <= x < x2}
stepify[{}] := (0&)
stepify[data_] := With[{x0 = data[[1,1]], yz = data[[-1,2]]},
  Function[x, Piecewise[Join[{{0, x<x0}}, stp0[x] /@ Partition[data, 2,1]], yz]]]

Note that without the With it will leave things like {{1,10},{2,20}}[[1,1]] in the returned function, which seems a little wasteful.

By the way, I decided to call this stepify since it turns a list of points into a step function.

dreeves
  • 26,430
  • 45
  • 154
  • 229