91

The two conditions that define a function as pure are as follows:

  1. No side effects (i.e. only changes to local scope are allowed)
  2. Always return the same output, given the same input

If the first condition is always true, are there any times the second condition is not true?

I.e. is it really only necessary with the first condition?

Will Ness
  • 70,110
  • 9
  • 98
  • 181
Magnus
  • 6,791
  • 8
  • 53
  • 84
  • 3
    Your premises are ill-specified. "Input" is too broad. Functions can be thought two have kinds of input. Their arguments, and "environmental"/"contextual". A function that returns the system time could be thought to be pure (even though it's obv not) if you don't distinguish between these two kinds of input. – Alexander Mar 05 '19 at 03:22
  • 4
    @Alexander: In the context of "pure function", "input" is commonly understood to mean the parameters / arguments that are passed explicitly (by whatever mechanism the programming language uses). That's part of the definition of "pure function". But you are right, it's important to mind the definition. – sleske Mar 05 '19 at 08:01
  • 3
    Trivial counter-example: return the value of a global variable. No side effects (the global is only ever read!), but still potentially different results every time. (If you don't like globals, return the address of a local variable which depends on the call stack at run time). – Peter - Reinstate Monica Mar 05 '19 at 15:17
  • 2
    You need to expand your definition of "side effects"; you say that a pure method does not *produce* side effects, but you need to also note that a pure method does not *consume* side effects produced elsewhere. – Eric Lippert Mar 05 '19 at 15:28
  • 2
    @sleske Perhaps commonly understood, but the lack of that distinction is the exact cause of OP's confusion. – Alexander Mar 05 '19 at 15:54
  • 1
    @EricLippert OP is simply quoting e.g. [wikipedia's criteria.](https://en.wikipedia.org/wiki/Pure_function) (I'm tempted to call a function communicating through side channels a "leaky gut" function.) It's precisely criterion 2 which implies no consumption of any (mutable) external state. I would think the two statements are equivalent. – Peter - Reinstate Monica Mar 05 '19 at 16:32

6 Answers6

122

Here are a few counterexamples that do not change the outer scope but are still considered impure:

  • function a() { return Date.now(); }
  • function b() { return window.globalMutableVar; }
  • function c() { return document.getElementById("myInput").value; }
  • function d() { return Math.random(); } (which admittedly does change the PRNG, but is not considered observable)

Accessing non-constant non-local variables is enough to be able to violate the second condition.

I always think of the two conditions for purity as complementary:

  • the result evaluation must not have effects on side state
  • the evaluation result must not be affected by side state

The term side effect only refers to the first, the function modifying the non-local state. However, sometimes read operations are considered as side effects as well: when they are operations and involve writing as well, even if their primary purpose is to access a value. Examples for that are generating a pseudo-random number that modifies the generator's internal state, reading from an input stream that advances the read position, or reading from an external sensor that involves a "take measurement" command.

Bergi
  • 630,263
  • 148
  • 957
  • 1,375
  • 1
    Thanks Bergi. For some reason I thought side effects included *reading* variables outside of local scope, but I guess it is only a side effect if it *writes* such external variables. – Magnus Mar 05 '19 at 06:40
  • Note that the two conditions are actually logically independent: This answer shows that 1. does not imply 2., but there are also many counterexamples that 2. does not imply 1. - for example, a function `setDate(newDate)` that sets the computer's current date to `newDate` has property 2. (the return value can be `newDate` or nothing), but it still violates 1. (it sets the current date). – sleske Mar 05 '19 at 08:08
  • 19
    If `prompt("you choose")` does not have side effects, we should take a step back and clarify the meaning of side effects. – Holger Mar 05 '19 at 08:57
  • @Holger OK, it's a bit of a stretch, probably even more so than `Math.random()`, but given a usual model of JavaScript computation, it does not change the observable environment - it does not violate OPs definition of "*side effects (i.e. only changes to local scope are allowed)*". But you're right, I will change it to something less controversial. – Bergi Mar 05 '19 at 12:10
  • @Bergi Apologies, I forgot a question mark at the end of my above comment. Is it correct that *side effects* **only** includes *writing* to external variables, and not *reading* said variables? – Magnus Mar 05 '19 at 13:05
  • 1
    @Magnus Yes, exactly that's what *effect* means. I'll try to clarify in my answer as well, I didn't expect such large attention and want to make answer worthy of dozens of votes :-) – Bergi Mar 05 '19 at 13:09
  • Or reading from the hard drive, or looking at a website, or reading data from a sensor ... – Acccumulation Mar 05 '19 at 18:40
  • Not sure I agree with the terminology about "side effect". I always thought from CS classes that the deliberately broad term "side effects" was used specifically to *include* cases like in the second bullet. Write to a global variable is explicit, so it is not a *side* effect (but a direct one). Affecting something like a stream's read position seems to perfectly fit the intention of *side effect* to me. It is not the specific intended consequence but occurs nevertheless. – StayOnTarget Mar 05 '19 at 21:19
  • @DaveInCaz I'm sure the term has a pretty specific meaning when discussing referential transparency in functional programming. But there's a more generic usage as well, with the meaning of unintentional or hidden effects, that you seem to have learned about. – Bergi Mar 05 '19 at 21:33
  • 2
    Well for all you know Math.random() returns a thermal diode. It's not actually specified to use a bad RNG. – Joshua Mar 05 '19 at 22:39
  • @Joshua Even that would probably be wrapped in enough OS abstraction on which a read changes some internal state :-) – Bergi Mar 05 '19 at 22:57
  • @Bergi: then consider the purity of a function written in x86 assembly that uses the [`rdrand` machine instruction](https://www.felixcloutier.com/x86/rdrand) to get hardware randomness. The internal implementation goes through some conditioning, but there's no way for x86 instructions to access that internal state. At an architectural level, the instruction produces true randomness every time it executes, in user-space, with no OS intervention. You can't usefully say that you took the value some other function or program would have gotten. But you must not optimize away multiple calls. – Peter Cordes Mar 06 '19 at 09:10
  • @PeterCordes OK, I was thinking about some higher-level language than assembly - I don't know what model to use for purity analysis of that. You're probably right there. (On the other hand, the description "*The Carry Flag indicates whether a random value is available at the time the instruction is executed.*" implies that there is some state involved even in the hardware generator). – Bergi Mar 06 '19 at 16:10
  • @Bergi: Consider the implementation in IvyBridge, where `rdrand` never fails unless the HW detects that the true-RNG is broken. The API leaves the door open for implementations that return an error instead of stalling, in case they can't keep up with the possible number of cores pulling randomness, but IvB can keep up, according to an SO answer from the Intel architect who designed it: [What are the exhaustion characteristics of RDRAND on Ivy Bridge?](//stackoverflow.com/q/14413839). Or consider a hypothetical true RNG (with hidden conditioning, if any). – Peter Cordes Mar 06 '19 at 21:50
  • I guess from a purity perspective, it's equivalent to input on any external sensor measuring any physical quantity. (Different from reading an input *stream* like a keyboard or file, because there's an infinite amount of data available, limited only by how often you sample it.) – Peter Cordes Mar 06 '19 at 21:54
  • and if random generator is not very random then you will have impure function which returns the same output for the same input () – RandomB Mar 08 '19 at 08:26
  • @Bergi I find this thread so puzzling. 1) this question has been asked countless times on SO, 2) it's off-topic, 3) it's brand-new but it has over 70 upvotes. What's so hot about the question this time around? – Mulan Mar 21 '19 at 14:27
  • @user633183 I guess it was posted in the JS tag, which lead to many viewers and a few upvotes, which caused it to be featured on the hot network questions from where it gained even more viewers and votes. Btw, I don't think it is off-topic. And if you can find previous instances, please comment on the question itself - if you found an exact duplicate, I will vote to close. – Bergi Mar 21 '19 at 14:52
  • 1
    Of the two conditions, I've heard the former called "effects" while the latter is called "coeffects". Both are "side effects" and impure. f(coeffects, input) -> effects, output Coeffects are input that comes from the changes in the wider environment , effects are output that change the wider environment. Elm and Clojurescrips re-frame work with this model, for example. –  Jul 26 '19 at 12:07
  • @Dan Thanks for mentioning that term. I don't think coeffects are side effects under most definitions, though. – Bergi Jul 26 '19 at 13:32
31

The "normal" way of phrasing what a pure function is, is in terms of referential transparency. A function is pure if it is referentially transparent.

Referential Transparency, roughly, means that you can replace the call to the function with its return value or vice versa at any point in the program, without changing the meaning of the program.

So, for example, if C's printf were referentially transparent, these two programs should have the same meaning:

printf("Hello");

and

5;

and all of the following programs should have the same meaning:

5 + 5;

printf("Hello") + 5;

printf("Hello") + printf("Hello");

Because printf returns the number of characters written, in this case 5.

It gets even more obvious with void functions. If I have a function void foo, then

foo(bar, baz, quux);

should be the same as

;

I.e. since foo returns nothing, I should be able to replace it with nothing without changing the meaning of the program.

It is clear, then, that neither printf nor foo are referentially transparent, and thus neither of them are pure. In fact, a void function can never be referentially transparent, unless it is a no-op.

I find this definition much easier to handle as the one you gave. It also allows you to apply it at any granularity you want: you can apply it to individual expressions, to functions, to entire programs. It allows you, for example, to talk about a function like this:

func fib(n):
    return memo[n] if memo.has_key?(n)
    return 1 if n <= 1
    return memo[n] = fib(n-1) + fib(n-2)

We can analyze the expressions that make up the function and easily conclude that they are not referentially transparent and thus not pure, since they use a mutable data structure, namely the memo array. However, we can also look at the function and can see that it is referentially transparent and thus pure. This is sometimes called external purity, i.e. a function that appears pure to the outside world, but is implemented impure internally.

Such functions are still useful, because while impurity infects everything around it, the external pure interface builds a kind of "purity barrier", where the impurity only infects the three lines of the function, but does not leak out into the rest of the program. These three lines are much easier to analyze for correctness than the entire program.

Jörg W Mittag
  • 363,080
  • 75
  • 446
  • 653
  • 2
    That impurity affects the whole program once you have concurrency. – R.. GitHub STOP HELPING ICE Mar 05 '19 at 16:27
  • @R.. Can you think of a way that concurrency could make the described Fibonacci function externally impure? I can't. Writing to `memo[n]` is idempotent, and failing to read from it merely wastes CPU cycles. – Brilliand Mar 05 '19 at 22:12
  • I agree with both of you. Impurity *can* lead to concurrency problems, but it doesn't in this specific case. – Jörg W Mittag Mar 05 '19 at 22:17
  • @R.. It's not hard to imagine a concurrency-aware version. – user253751 Mar 05 '19 at 23:48
  • 1
    @Brilliand For example, `memo[n] = ...` may first create a dictionary entry, and then store the value into it. This leaves a window during which another thread could see an uninitialized entry. – user253751 Mar 05 '19 at 23:49
  • @immibis And then what? The other thread does the exact same calculation again, and gets the exact same result. It makes no actual difference. No external impurity. As I said before: "failing to read from it merely wastes CPU cycles." – Brilliand Mar 06 '19 at 21:49
  • @Brilliand No, the other thread sees the entry exists and reads the value from it which is some random number. – user253751 Mar 06 '19 at 22:01
  • @immibis Ah, I understand. Yes, that could work, depending on the dictionary library. – Brilliand Mar 06 '19 at 22:22
  • OK, you have function which read current year (from system clock/calendar) and return 1 if it > 0 otherwise 0. Obviously this function has referentially transparency. But it's impure and will live under IO monad. – RandomB Mar 08 '19 at 08:30
  • @Paul-AG: No, this function is not referentially transparent. Referential transparency means that you can replace a call to the function with the result of that call without changing the meaning of the program. But, whichever result you replace the call with, `0` or `1`, you will change the meaning of the program for at least runs, namely, when you replace it with `1` you will change the meaning for runs of the program with the clock set to BC. – Jörg W Mittag Aug 12 '19 at 20:22
13

It seems to me that the second condition you have described is a weaker constraint than the first.

Let me give you an example, suppose you have a function to add one that also logs to the console:

function addOneAndLog(x) {
  console.log(x);
  return x + 1;
}

The second condition you supplied is satisfied: this function always returns the same output when given the same input. It is, however, not a pure function because it includes the side effect of logging to the console.

A pure function is, strictly speaking, a function that satisfies the property of referential transparency. That is the property that we can replace a function application with the value it produces without changing the behaviour of the program.

Suppose we have a function that simply adds:

function addOne(x) {
  return x + 1;
}

We can replace addOne(5) with 6 anywhere in our program and nothing will change.

By contrast, we cannot replace addOneAndLog(x) with the value 6 anywhere in our program without changing behaviour because the first expression results in something being written to the console whereas the second one does not.

We consider any of this extra behaviour that addOneAndLog(x) performs besides returning output as a side-effect.

TheInnerLight
  • 12,034
  • 1
  • 29
  • 52
  • "It seems to me that the second condition you have described is a weaker constraint than the first." No, the two conditions are logically independent. – sleske Mar 05 '19 at 08:03
  • 1
    @sleske you are mistaken. I have provided clear definitions for the terms pure and side-effect. Within these constraints, there is nothing that a function with no side effects besides return the same output for a given input. I have however provided examples where the second condition can be satisfied without the first. The fundamnetal concept to understand the notion of purity is referential transparency. – TheInnerLight Mar 05 '19 at 09:11
  • 1
    Small typo: There is nothing that a function with no side effects *can do* besides returning the same output for a given input. – TheInnerLight Mar 05 '19 at 09:18
  • What about something like returning the current time? That does not have side effects, but it does return a different output for the same input. Or more generally, any function where the result depends not only on the input parameters, but also on a (changeable) global variable. – sleske Mar 05 '19 at 09:24
  • @sleske Returning the current time is absolutely a side effect. You cannot substitute 'Date.now()' with its result anywhere in your program without changing the behaviour, therefore it is not referentially transparent and consequently not pure. – TheInnerLight Mar 05 '19 at 09:29
  • 2
    It seems you are using a different definition of "side effect" than what is commonly used. A side effect is commonly defined as "an observable effect besides returning a value" or an "observable change in state" - see e.g. [Wikipedia](https://en.wikipedia.org/wiki/Side_effect_(computer_science)) , [this post on softwareengineering.SE](https://softwareengineering.stackexchange.com/questions/40297/what-is-a-side-effect). You are totally right that `Date.now()` is not pure/referentially transparent, but not because it has side-effects, but because its result depends on more than just its input. – sleske Mar 05 '19 at 10:05
7

There could be a source of randomness from outside the system. Suppose that part of your calculation includes the room temperature. Then executing the function will yield different results each time depending on the random external element of room temperature. The state is not changed by executing the program.

All I can think of, anyway.

hba23
  • 3
  • 1
user3340459
  • 435
  • 5
  • 13
2

If the first condition is always true, are there any times the second condition is not true?

Yes

Consider simple code snippet below

public int Sum(int a, int b) {
    Random rnd = new Random();
    return rnd.Next(1, 10);
}

This code will return random output for same given set of inputs - however it does not have any side effect.

The overall effect of both the points #1 & #2 you mentioned when combined together means : At any point of time if function Sum with same i/p is replaced with its result in a program, overall meaning of program does not change. This is nothing but Referential transparency.

demo
  • 6,038
  • 19
  • 75
  • 149
rahulaga-msft
  • 3,964
  • 6
  • 26
  • 44
  • But in this case, the first condition is not verified: writing to the console is considered a side effect, since it changes the state of the machine itself. – Right leg Mar 05 '19 at 08:30
  • @Rightleg thx for pointing it out. Somehow I misunderstood OP totally other way. corrected answer. – rahulaga-msft Mar 05 '19 at 08:41
  • 2
    Doesn't it change the state of the random generator? – Eric Duminil Mar 05 '19 at 09:14
  • @EricDuminil is correct. `rnd.Next()` triggers a side effect in the `rnd` object. https://learn.microsoft.com/en-us/dotnet/api/system.random?view=netframework-4.7.2. Therefore the example is not pure – JDurstberger Mar 05 '19 at 09:21
  • 1
    Generating a random number is itself a side effect, unless the state of the random number generator is supplied explicitly which would make the function satisfy condition 2. – TheInnerLight Mar 05 '19 at 09:23
  • 1
    `rnd` doesn't escape the function, so the fact that its state changes doesn't matter to the purity of the function, but the fact that the `Random` constructor uses the current time as a seed value means that there are "inputs" other than `a` and `b`. – Sneftel Mar 05 '19 at 11:14
  • I assumed that we are taking about **observable side effects**. If not, then there is long list of side effects – rahulaga-msft Mar 05 '19 at 21:57
2

Problem with FP definitions is that they are very artificial. Each evaluation/calculation has side-effects on the evaluator. It's theoretically true. A deny of this shows only that FP apologists ignore philosophy and logic: an "evaluation" means changing of the state of some intelligent environment (machine, brain, etc). This is the nature of the evaluation process. No changes - no "calculi". The effect can be very visible: heating the CPU or its failure, shutting down the motherboard in case of overheating, and so on.

When you talk about referential transparency, you should understand that information about such transparency is available for human as a creator of the whole system and holder of semantical information and may be not available for the compiler. For example, a function can read some external resource and it will have IO monad in its signature but it will return the same value all the time (for example, the result of current_year > 0). The compiler does not know that function will return always the same result, so the function is impure but has referentially transparent property and can be substituted with True constant.

So, to avoid such inaccuracy, we should distinguish mathematical functions and "functions" in programming languages. Functions in Haskell are always impure and definition of purity related to them is always very conditionally: they are running on real hardware with real side-effects and physical properties, which is wrong for mathematical functions. This means that the example with "printf" function is totally incorrect.

But not all mathematical functions are pure too: each function which has t (time) as a parameter may be impure: t holds all effects and stochastic nature of the function: in common case you have input signal and have not idea about actual values, it can be even a noise.

Farzad Karimi
  • 770
  • 1
  • 12
  • 31
RandomB
  • 3,367
  • 19
  • 30