
The Haskell definition says:

An expression is in weak head normal form (WHNF) if it is either:

  • a constructor (possibly applied to arguments), like True, Just (square 42) or (:) 1,
  • a built-in function applied to too few arguments (perhaps none), like (+) 2 or sqrt,
  • or a lambda abstraction \x -> expression.

Why do built-in functions receive special treatment? According to the lambda calculus, there is no difference between a partially applied function and any other function, because in the end we have only one-argument functions.
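For concreteness, here are the two forms the definition distinguishes, side by side (the names `partialBuiltin` and `asLambda` are invented for illustration):

```haskell
-- A partially applied built-in vs. the equivalent lambda abstraction.
-- Both are already in WHNF, and both behave identically once applied.
partialBuiltin :: Integer -> Integer
partialBuiltin = (+) 2        -- built-in applied to too few arguments

asLambda :: Integer -> Integer
asLambda = \y -> 2 + y        -- ordinary lambda abstraction

main :: IO ()
main = print (partialBuiltin 40 == asLambda 40)  -- prints True
```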

Will Ness
Robert Zaremba
    I took the liberty to edit that Haskell wiki page with an explanation of why this distinction between built-ins and lambdas doesn't matter to semantics. (And also a note about a subtlety with strict constructor fields.) – Ørjan Johansen Jun 28 '14 at 01:37
    BTW that wiki page is not actually part of [Haskell's official definition](https://www.haskell.org/onlinereport/haskell2010/). However the official report doesn't even mention the issue, seemingly preferring to assume readers know denotational semantics... – Ørjan Johansen Jun 28 '14 at 01:40

2 Answers


A normal function applied to an argument, like the following:

(\x y -> x + 1 : y) 1

can be reduced to give:

\y -> 1 + 1 : y

In the first expression, the "outermost" thing was an application, so it was not in WHNF. In the second, the outermost thing is a lambda abstraction, so it is in WHNF (even though we could do more reductions inside the function body).
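That reduction stops at WHNF can be observed with seq, which evaluates its first argument only to WHNF (a small sketch using the example above):

```haskell
main :: IO ()
main = do
  -- (\x y -> x + 1 : y) 1 reduces to the WHNF \y -> 1 + 1 : y;
  -- seq forces only to WHNF, so the 1 + 1 inside the body stays unevaluated.
  let h = (\x y -> x + 1 : y) (1 :: Int)
  h `seq` putStrLn "h is in WHNF"
  -- Applying h and demanding the head finally evaluates 1 + 1:
  print (head (h []))  -- prints 2
```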

Now let's consider the application of a built-in (primitive) function:

(+) 1

Because this is a built-in, there's no function body in which we can substitute 1 for the first parameter. The evaluator "just knows" how to evaluate fully "saturated" applications of (+), like (+) 1 2. But there's nothing that can be done with a partially applied built-in; all we can do is produce a data structure describing "apply (+) to 1 and wait for one more argument", but that's exactly what the thing we're trying to reduce is. So we do nothing.

Built-ins are special because they're not defined by lambda calculus expressions, so the reduction process can't "see inside" their definitions. Thus, unlike normal function applications, built-in function applications have to be "reduced" by simply accumulating arguments until they are fully "saturated" (at which point reduction to WHNF happens by running whatever the magic implementation of the built-in is). Unsaturated built-in applications cannot be reduced any further, and so are already in WHNF.
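A minimal sketch of that accumulate-until-saturated strategy, using a hypothetical term type with a single built-in (the names Term, PrimPlus, step, etc. are invented for illustration; real evaluators, including GHC's, work quite differently):

```haskell
data Term
  = Var String
  | Lam String Term
  | App Term Term
  | Lit Integer
  | PrimPlus [Term]            -- built-in (+) with its accumulated arguments
  deriving (Eq, Show)

-- WHNF: the outermost node is a lambda, a literal,
-- or an unsaturated built-in application.
isWhnf :: Term -> Bool
isWhnf (Lam _ _)     = True
isWhnf (Lit _)       = True
isWhnf (PrimPlus as) = length as < 2   -- too few arguments: nothing to do
isWhnf _             = False

-- One outermost reduction step.
step :: Term -> Term
step (App (Lam x body) arg)    = subst x arg body        -- beta reduction
step (App (PrimPlus as) arg)   = PrimPlus (as ++ [arg])  -- just accumulate
step (App f a)                 = App (step f) a
step (PrimPlus [Lit m, Lit n]) = Lit (m + n)             -- saturated: run the magic
step t                         = t

-- Naive substitution (fine for the closed terms used here).
subst :: String -> Term -> Term -> Term
subst x v (Var y)   | x == y = v
subst x v (Lam y b) | x /= y = Lam y (subst x v b)
subst x v (App f a)          = App (subst x v f) (subst x v a)
subst x v (PrimPlus as)      = PrimPlus (map (subst x v) as)
subst _ _ t                  = t

main :: IO ()
main = do
  let papp = App (PrimPlus []) (Lit 1)   -- "(+) 1"
  print (isWhnf (step papp))             -- True: PrimPlus [Lit 1], unsaturated
  -- A saturated application eventually reduces all the way:
  print (step (step (step (App (App (PrimPlus []) (Lit 1)) (Lit 2)))))
```

(In GHC's actual runtime a partial application is represented directly as a PAP heap object, so "(+) 1" is born in WHNF; this toy model makes the accumulation step explicit instead.)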

Ben
    Does the runtime system *actually* reduce partially applied user-defined functions? Certainly we expect `(\x->if x==1 then function1 else function2) k` to be reduced, but it seems to me that a peculiar representation would be needed to actually reduce `(\x y -> x + x*y + x^y) k`. – dfeuer Jun 27 '14 at 18:54
  • @dfeuer Since GHC at runtime doesn't represent functions as lambda terms, that would be very hard indeed. GHC normally only applies a function once it has all its arguments. It has a [special representation](https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/HeapObjects#Partialapplications) of partially applied functions for this. – Ørjan Johansen Jun 28 '14 at 00:46
    @dfeuer Although you certainly *can* write a runtime system which would reduce partially applied functions like this. I implemented a toy lazy functional language by straightforward graph reduction that did this; it's easier because you then don't have to know or care when a function has all its arguments; partial application is the same as final application (except for built-ins). The trick is that you don't use textual terms, but graphs. So although we would *write* the reduction of your example as `\y -> k + k*y + k^y`, in the actual system each k is actually a reference to a single node. – Ben Jun 28 '14 at 02:10
  • Ben, my real point is that *any* sort of reduction of a partially applied function (i.e., doing nothing but substituting in a value for an argument) can, at best, perform a small and constant amount of work, and can never produce bottom. As a result, the only reason I personally can see not to consider such an application to be in WHNF is to simplify the description of what WHNF means. – dfeuer Jun 28 '14 at 02:24
  • As for constant factors, I wonder if there's anything interesting to do with partial application of built-ins. For example, would it be possible to reduce `(+ 5)` by producing machine code that uses an *immediate* instruction rather than having to load in the `5`? – dfeuer Jun 28 '14 at 02:30
  • @dfeuer Your point is close to what I tried to imply with my edit of that Haskell wiki page. Also, that immediate instruction idea sounds entirely possible and might for all I know already be done by GHC. (Via that new arity analysis and LLVM optimizations, maybe?) – Ørjan Johansen Jun 28 '14 at 03:56
  • @ØrjanJohansen, I imagine such things are done when the argument is statically known. When it's not, things get much messier. I imagine that there could be some issues convincing some operating systems to even allow such a thing, and then there's the question of when it's worth bothering. – dfeuer Jun 28 '14 at 04:17
    @dfeuer Oh right, when it's not that would be JIT compiling. Could still be possible though. I think GHC actually has a small amount of runtime code generation whenever you use the FFI to export closures (because C function pointers for them cannot be constructed on the fly without it), but nothing like that. – Ørjan Johansen Jun 28 '14 at 05:09
  • @dfeuer When you have only single-argument functions, there's no such thing as "partial application"; there's just application. And functions are first class; when you see an application that's supposed to result in a function, you don't inherently know whether it's a "partial application" of a function, or a "full" application of a function which happens to have a function result type; the latter could easily result in bottom or do a more interesting amount of work (which might be shared). If all you've got is lambda calculus structures, you have to evaluate it to WHNF to find out. – Ben Jun 29 '14 at 00:34

Consider

Prelude> let f n = [(+x) | x <- [1..]] !! n
Prelude> let g = f 20000000 :: Int -> Int

g is at this point not in WHNF! You can see this by evaluating, say, g 3, which takes a noticeable lag because you need WHNF before you can apply an argument. That's when the list is traversed in search of the right built-in function. But afterwards this choice is fixed: g is in WHNF (and indeed NF; that's the same for lambdas, which is perhaps what you meant with your question), so any subsequent calls will give a result immediately.
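One way to see this directly is to force g to WHNF up front with seq, so the pause happens before any application (a sketch of a GHCi session; the timing behaviour is as described above, not measured here):

```haskell
Prelude> let f n = [(+x) | x <- [1..]] !! n
Prelude> let g = f 20000000 :: Int -> Int
Prelude> g `seq` ()   -- forces g to WHNF: the noticeable pause happens here
()
Prelude> g 3          -- immediate now: the built-in has already been selected
20000004
```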

leftaroundabout
  • why, after the first run of `g 3`, is `f 20000000` not memoised? I mean, why do two subsequent calls to `g 3` take the same, noticeable time? – Robert Zaremba Jun 27 '14 at 10:34
  • Well, they don't. It _is_ memoised, at least in all `ghci`s I have installed (7.4, 7.6, 7.8). Since `MonomorphismRestriction` is off by default, this requires the explicit monomorphic signature, maybe you forgot that? (A polymorphic function has one hidden extra argument.) – leftaroundabout Jun 27 '14 at 11:11
  • *Since GHC 7.8.1, the monomorphism restriction is switched off by default in GHCi.* Is there any way to enable it? What do you mean about extra argument for a polymorphic function? – Robert Zaremba Jun 29 '14 at 22:02
    @RobertZaremba: you can switch it on by setting the `-XMonomorphismRestriction` flag... but don't, it's evil; you can always achieve the same goal with explicit signatures (like `Int -> Int` in my example); this is much more transparent. — Haskell polymorphism is _parametric polymorphism_, which means if a function's signature has some type variable in it (here we have `f :: (Num a, Enum a) => Int -> a->a`) then the properties of this type (i.e. how it can be used as a number) need to be passed to the function as a "dictionary argument". That's why memoisation doesn't always work as expected. – leftaroundabout Jun 29 '14 at 23:41