82

Today i was reading about pure function, got confused with its use:

A function is said to be pure if it returns same set of values for same set of inputs and does not have any observable side effects.

e.g. strlen() is a pure function while rand() is an impure one.

__attribute__ ((pure)) int fun(int i)
{
    return i*i;
}

int main()
{
    int i=10;
    printf("%d",fun(i));//outputs 100
    return 0;
}

http://ideone.com/33XJU

The above program behaves in the same way as in the absence of pure declaration.

What are the benefits of declaring a function as pure[if there is no change in output]?

Richard JP Le Guen
  • 28,364
  • 7
  • 89
  • 119
Green goblin
  • 9,898
  • 13
  • 71
  • 100
  • There are more optimisations possible for pure functions. If the compiler can figure out the purity by itself, the pragma makes no difference. If the compiler can't figure it out, telling it that the function is pure may cause it to be better optimised. If you lie about the purity, bad things will happen, probably. – Daniel Fischer Jun 22 '12 at 09:45
  • Is it possible to see the optimizations made by compiler? – Green goblin Jun 22 '12 at 09:47
  • 7
    Yes - look at the generated assembly. – Philip Kendall Jun 22 '12 at 09:47
  • @PhilipKendall, in this particular case, there is no difference in assembly whether `__attribute__((pure))` is present or not. (gcc 4.4.5, with or without -O3). I suspect that gcc can figure out himself that `fun()` is pure. (Note that I also tried with your example). – Ben Jun 22 '12 at 10:06
  • Additional info for `function attributes` can be found here: http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html – slashmais Jun 22 '12 at 10:09
  • 4
    I don't think this definition of purity is correct - `printf`, for example, would qualify (calling it twice with the same arguments yields the same return value), but it is not pure. – tdammers Jun 22 '12 at 11:51
  • 14
    @tdammers: Indeed, it lacks the `...and no side-effects...` part. – Frerich Raabe Jun 22 '12 at 12:15
  • @FrerichRaabe: I thought the "no side-effects" part implied "same output for same input"... if you can't have side effects, what are you going to base the output on, if not the inputs? – tdammers Jun 22 '12 at 13:10
  • @tdammers: In theory, a true random function has no side effect, yet it doesn't always output the same result. – Ben Jun 22 '12 at 13:20
  • 2
    @Ben: where does the entropy come from? We're dealing with (theoretically) deterministic machines here, the only way of getting true entropy into them is from external sources, which means side effects. Of course, we could allow programming languages to define nondeterministic functions, pretending the technical side effects aren't there and the functions really are nondeterministic; but if we do that, most of the practical benefits of tracking purity are lost. – tdammers Jun 22 '12 at 13:50
  • 1
    @tdammers: You wrote that `printf` gives the same output for the same input, yet it is not pure. I agree with this - hence I wrote that the definition of `purity` given in the question should mention that a function must not have side-effects. With that addition, `printf` would no longer be pure. – Frerich Raabe Jun 22 '12 at 14:03
  • 3
    tdammers is correct - the definition of *pure* given above is incorrect. Pure means that the output depends **only** on the inputs to the function; also, there must be no observable side-effects. "Same output for same input" is a very inaccurate summary of those requirements. http://en.wikipedia.org/wiki/Pure_function – Dancrumb Jun 22 '12 at 14:04
  • @FrerichRaabe: Absolutely correct. I'm just suggesting that the 'no side effects' rule supersedes the 'same output for same inputs' rule, making it redundant. – tdammers Jun 22 '12 at 14:09
  • 1
    `strlen()` isn't a pure function. I can modify the perceived length of a string at runtime, yet pass the same pointer. – Christian Mann Jun 22 '12 at 14:56
  • 2
    @ChristianMann: strlen() is a pure function. Modifying the string at runtime is changing the input, therefore it does not need to have the same output. Also, I fixed the definition of pure in the question, but it needs review. – Robert Mason Jun 22 '12 at 16:26
  • @RobertMason But the only input that strlen gets is a pointer, not the string itself. It can't be memoized, for instance. – Christian Mann Jun 22 '12 at 16:48
  • @ChristianMann: True. Though if the pointer is const and qualified with restrict (I know it's not standard, but many compilers support it) then it can be memoized. And I guess that we could say that an overload of strlen() for std::string could be considered pure. – Robert Mason Jun 22 '12 at 16:55
  • 2
    @tdammers - Isn't the printing itself the side effect? – detly Jun 22 '12 at 17:33
  • @Christian In fact, GCC *does* memoise `strlen`’s result, for instance when used in a loop (`for (i = 0; i < strlen(str); ++i)`) but not because it’s pure (it isn’t – you’re right), but because it’s a builtin that gets special treatment from the compiler (`-fno-builtin` disables this). I’m assuming that the compiler does this only for string literals where it can be sure that the string isn’t modified (that’s UB), otherwise it’d have to trace aliases to the memory location to prove that it’s not modified through any pointer. – Konrad Rudolph Jun 23 '12 at 17:22
  • 1
    @tdammers: Well, printf("Hello world\n"); has the result of displaying x + "Hello world\n", where x was the previous content of the terminal, and + is string concatenation. So the result value does not depend on the input alone but also on the context in which printf is executed. In this sense printf is not pure. – Giorgio Jun 27 '12 at 06:03

6 Answers6

146

pure lets the compiler know that it can make certain optimisations about the function: imagine a bit of code like

for (int i = 0; i < 1000; i++)
{
    printf("%d", fun(10));
}

With a pure function, the compiler can know that it needs to evaluate fun(10) once and once only, rather than 1000 times. For a complex function, that's a big win.

Philip Kendall
  • 4,304
  • 1
  • 23
  • 42
  • Ie, you can safely use memoization – Joel Coehoorn Jun 22 '12 at 17:48
  • @mob What do you mean? Why not? – Konrad Rudolph Jun 22 '12 at 18:17
  • 15
    Because you can modify the string (sequence of chars starting from some address) without modifying the input (the pointer to the address where the string starts), i.e., you can't memoize it. It would only be a pure function in a language with immutable strings (Java, say). – mob Jun 22 '12 at 19:35
  • 5
    @KonradRudolph: Imagine a length 1000 string. Call `strlen` on it. Then again. Same thing yes? Now modify the second character to be `\0`. Does `strlen` still return 1000 now? The starting address is the same (== input is same) but the function now returns a different value. – Mike Bailey Jun 23 '12 at 16:10
  • 5
    @mob That’s a good objection, obviously you are right. I was misled by the fact that [even books](http://books.google.co.uk/books?id=k_ocKY0iegsC&lpg=PA341&dq=pure%20strlen&pg=PA341#v=onepage&q=pure%20strlen&f=false) claim that `strlen` (in GCC / glibc) is in fact pure. But a look at the glibc implementation showed this to be wrong. – Konrad Rudolph Jun 23 '12 at 16:44
34

When you say a function is 'pure' you are guaranteeing that it has no externally visible side-effects (and as a comment says, if you lie, bad things can happen). Knowing that a function is 'pure' has benefits for the compiler, which can use this knowledge to do certain optimizations.

Here is what the GCC documentation says about the pure attribute:

pure

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure. For example,

          int square (int) __attribute__ ((pure));

Philip's answer already shows how knowing a function is 'pure' can help with loop optimizations.

Here is one for common sub-expression elimination (given foo is pure):

a = foo (99) * x + y;
b = foo (99) * x + z;

Can become:

_tmp = foo (99) * x;
a = _tmp + y;
b = _tmp + z;
Community
  • 1
  • 1
ArjunShankar
  • 23,020
  • 5
  • 61
  • 83
  • 3
    I am not sure if any do this, but pure functions also allow the compiler to reorder when the function gets called, should it deem reordering would be beneficial. When there is the possibility for side effects, the compiler needs to be more conservative. – mpdonadio Jun 22 '12 at 17:12
  • @MPD - Yes, that sounds reasonable. And since a `call` instruction is a bottleneck for superscalar CPUs, some help from the compiler can help. – ArjunShankar Jun 22 '12 at 17:50
  • I vaguely recall using a DSP compiler a number of years ago that would use this technique to get return values sooner/later. This allowed it to minimize pipeline stalls. – mpdonadio Jun 22 '12 at 18:53
  • 1
    Could "foo(99)" be precalculated since 99 is a const and foo will always return the same result? Maybe in some sort of two-stage compile? – markwatson Jun 25 '12 at 20:41
  • 1
    @markwatson - I am not sure. There can be cases when it is simply not possible. e.g. if `foo` is part of another compilation unit (another C file), or in a pre-compiled library. In both cases, the compiler wont know what `foo` does, and cannot pre-calculate. – ArjunShankar Jun 25 '12 at 23:08
29

In addition to possible run-time benefits, a pure function is much easier to reason about when reading code. Furthermore, it's much easier to test a pure function since you know that the return value only depends on the values of the parameters.

Frerich Raabe
  • 90,689
  • 19
  • 115
  • 207
15

A non-pure function

int foo(int x, int y) // possible side-effects

is like an extension of a pure function

int bar(int x, int y) // guaranteed no side-effects

in which you have, besides the explicit function arguments x, y, the rest of the universe (or anything your computer can communicate with) as an implicit potential input. Likewise, besides the explicit integer return value, anything your computer can write to is implicitly part of the return value.

It should be clear why it is much easier to reason about a pure function than a non-pure one.

Giorgio
  • 5,023
  • 6
  • 41
  • 71
7

Just as an add-on, I would like to mention that C++11 codifies things somewhat using the constexpr keyword. Example:

#include <iostream>
#include <cstring>

constexpr unsigned static_strlen(const char * str, unsigned offset = 0) {
        return (*str == '\0') ? offset : static_strlen(str + 1, offset + 1);
}

constexpr const char * str = "asdfjkl;";

constexpr unsigned len = static_strlen(str); //MUST be evaluated at compile time
//so, for example, this: int arr[len]; is legal, as len is a constant.

int main() {
    std::cout << len << std::endl << std::strlen(str) << std::endl;
    return 0;
}

The restrictions on the usage of constexpr make it so that the function is provably pure. This way, the compiler can more aggressively optimize (just make sure you use tail recursion, please!) and evaluate the function at compile time instead of run time.

So, to answer your question, is that if you're using C++ (I know you said C, but they are related), writing a pure function in the correct style allows the compiler to do all sorts of cool things with the function :-)

Robert Mason
  • 3,949
  • 3
  • 31
  • 44
4

In general, Pure functions has 3 advantages over impure functions that the compiler can take advantage of:

Caching

Lets say that you have pure function f that is being called 100000 times, since it is deterministic and depends only on its parameters, the compiler can calculate its value once and use it when necessary

Parallelism

Pure functions don't read or write to any shared memory, and therefore can run in separate threads without any unexpected consequence

Passing By Reference

A function f(struct t) gets its argument t by value, and on the other hand, the compiler can pass t by reference to f if it is declared as pure while guaranteeing that the value of t will not change and have performance gains


In addition to the compile time considerations, pure functions can be tested fairly easy: just call them.

No need to construct objects or mock connections to DBs / file system.

Uri Goren
  • 13,386
  • 6
  • 58
  • 110