Often we need to process data consisting of a list of coordinates: data = {{x1,y1}, {x2,y2}, ..., {xn,yn}}
. It could be 2D or 3D coordinates, or any other arbitrary length list of fixed length small vectors.
Let me illustrate how to use Compile
for such problems using the simple example of summing up a list of 2D vectors:
data = RandomReal[1, {1000000, 2}];
First, obvious version:
fun1 = Compile[{{vec, _Real, 2}},
Module[{sum = vec[[1]]},
Do[sum += vec[[i]], {i, 2, Length[vec]}];
sum
]
]
How fast is it?
In[13]:= Do[fun1[data], {10}] // Timing
Out[13]= {4.812, Null}
Second, less obvious version:
fun2 = Compile[{{vec, _Real, 1}},
Module[{sum = vec[[1]]},
Do[sum += vec[[i]], {i, 2, Length[vec]}];
sum
]
]
In[18]:= Do[
fun2 /@ Transpose[data],
{10}
] // Timing
Out[18]= {1.078, Null}
As you can see, the second version is much faster. Why? Because the crucial operation, sum += ...
is an addition of numbers in fun2
while it's an addition of arbitrary length vectors in fun1
.
You can see a practical application of the same "optimization" in this asnwer of mine, but many other examples could be given where this is relevant.
Now in this simple example the code using fun2
is not longer or much more complex than fun1
, but in the general case it very well might be.
How can I tell Compile
that one of its arguments is not an arbitrary n*m
matrix, but a special n*2
or n*3
one, so it can do these optimization automatically rather than using a generic vector addition function to add tiny length-2 or length-3 vectors?
Addendum
To make it more clear what's happening, we can use CompilePrint
:
CompilePrint[fun1]
gives
1 argument
5 Integer registers
5 Tensor registers
Underflow checking off
Overflow checking off
Integer overflow checking on
RuntimeAttributes -> {}
T(R2)0 = A1
I1 = 2
I0 = 1
Result = T(R1)3
1 T(R1)3 = Part[ T(R2)0, I0]
2 I3 = Length[ T(R2)0]
3 I4 = I0
4 goto 8
5 T(R1)2 = Part[ T(R2)0, I4]
6 T(R1)4 = T(R1)3 + T(R1)2
7 T(R1)3 = CopyTensor[ T(R1)4]]
8 if[ ++ I4 < I3] goto 5
9 Return
CompilePrint[fun2]
gives
1 argument
5 Integer registers
4 Real registers
1 Tensor register
Underflow checking off
Overflow checking off
Integer overflow checking on
RuntimeAttributes -> {}
T(R1)0 = A1
I1 = 2
I0 = 1
Result = R2
1 R2 = Part[ T(R1)0, I0]
2 I3 = Length[ T(R1)0]
3 I4 = I0
4 goto 8
5 R1 = Part[ T(R1)0, I4]
6 R3 = R2 + R1
7 R2 = R3
8 if[ ++ I4 < I3] goto 5
9 Return
I chose to include this rather than the considerably lengthier C version, where the timing difference is even more pronounced.