4

I am learning about side-effects in functional programming (in Haskell) and I know that external effects are effects that are observable outside the function, whereas internal effects are not visible from the outside.

Is allocating memory for a data structure a pure operation? A side-effect has to modify some state or have an observable interaction with calling functions or the outside world. In the case of allocating a data structure, you would have to call some functions (e.g. malloc) to allocate memory for it. But if these functions were called within a function, this wouldn't be observable to the outside world. Even though the outside world is modified (as there is memory allocated for the data structure), I don't think that allocating a data structure is a side-effect since it's not observable.

However, I am not sure whether my reasoning is correct. Any insights are appreciated.

ceno980
  • 2,003
  • 1
  • 19
  • 37
  • No, that is absolutely not pure, and is definitely observable to the outside world. – AJF Jul 26 '19 at 00:15
  • 5
    Don't mix levels of abstraction. At the level where you are concerned about memory there is no purity and at the level where you are concerned with purity you don't call malloc. – Jared Smith Jul 26 '19 at 10:56
  • Does this answer your question? [What counts as a side-effect? Why isn't memory allocation a side-effect?](https://stackoverflow.com/questions/40894498/what-counts-as-a-side-effect-why-isnt-memory-allocation-a-side-effect) – Sebastian Nov 18 '22 at 20:49
  • Sorry for my duplicate suggestion above. I must admit I like @DarthFennec 's answer here better than the answers in the linked previous question. – Sebastian Nov 18 '22 at 22:17

2 Answers2

7

Is allocating memory for a data structure a pure operation?

In Haskell, memory is almost never allocated directly by the programmer. In this sense, the question is moot: allocating memory is neither a pure nor impure operation, because allocating memory is not an operation, but rather, it is an implementation detail. Put another way, memory allocations in Haskell are not observable to external code (or any code), but that's not because of purity, it's because the language itself abstracts over the concept of memory allocation. As far as the Haskell code itself is concerned, there's no such thing as memory, or memory allocation.

The reason this is important is because it allows the compiler to make all kinds of optimizations without changing the meaning of the code. For example, in Haskell, when you make small changes to a large data structure, you're actually making a copy of the structure instead of modifying the original, and this is extremely inefficient. However, the compiler can usually tell if old copies of the structure are needed, and if not, it will write machine code that simply modifies the original structure. As another example, small local values may be moved to CPU registers or the system stack, bypassing allocation entirely for those values. As long as it doesn't change what the code does, the optimizer can and will break the purity rule by any means necessary. The distinction is simply unimportant at that level.


In the case of allocating a data structure, you would have to call some functions (e.g. malloc) to allocate memory for it. But if these functions were called within a function, this wouldn't be observable to the outside world. Even though the outside world is modified (as there is memory allocated for the data structure), I don't think that allocating a data structure is a side-effect since it's not observable.

There are situations where a memory allocation operation may be exposed to Haskell code. For example, a C function that allocates memory with malloc might be called by a Haskell binding via the FFI. In these situations, the author of the binding needs to decide whether the function is "pure" (the type should return a pure value), or "impure" (the type should return an IO action). This is the main situation I can think of where an answer to this question is pragmatically valuable.

In this context, the important things to look at are:

  • If I run the function with the same inputs in any situation, will it always give me the same outputs?
  • If I replace the function call in the source code with its resulting value, will the program have exactly the same behavior?

If both of the answers are "yes", it's a pure function, otherwise it's impure. This is regardless of how much impurity is happening in the C code, as long as that impurity is not visible outside of the function it's fine.


So to actually answer your question: It depends.

Say malloc is called, and then the memory is freed before the function exits. Over the course of the function, zero net memory has been allocated. So the function is pure.

Say malloc is called, and a pointer into the allocated memory is returned. This is not pure, because Haskell is only aware of the pointer, not the allocated memory itself. If we run this function to allocate a 4-byte chunk, and then we run it again to allocate another 4-byte chunk, and Haskell thinks the function is pure, it may replace the second call with the result (pointer) of the first call, causing both calls to return pointers to the same 4-byte chunk (which is not what you want). Because of this, the function must be typed as an impure IO action.

DarthFennec
  • 2,650
  • 17
  • 24
  • In the case of the pure function which uses malloc, what if malloc fails? Assuming the underlying C function handles malloc failures consistently, then for a given input you could get different results. Is the state of the memory then considered to be part of the system state when the function is called? So (1) if memory is available, then for input x output is y, or (2) if no memory available, then for input x, output is z. i.e. the state of the memory effectively forms part of the input to the function – Duncan Drennan Nov 05 '20 at 23:24
  • 1
    @DuncanDrennan Failures like this tend to be represented in Haskell as "bottom" values (Haskell's version of exceptions). Those can be caught and handled on the IO level, but in pure code they propagate into any value that depends on them. So if such a failure happens at any point, the resulting output value (and any value that uses it) won't exist or be accessible. So as far as the pure code is concerned, the result is still pure. The only part of the code that's aware of the exception is at the IO level, and only if it's explicitly caught and handled. – DarthFennec Nov 06 '20 at 21:30
2

Making a binding to be s value is considered pure, but the resulting machine code will overwrite memory to do so. Thus the question is if you are explicitly allocating or if you are setting a value leaving the compiler to do the allocating.

Sylwester
  • 47,942
  • 4
  • 47
  • 79