
From Real World Haskell I read

It operates as follows: when a seq expression is evaluated, it forces its first argument to be evaluated, then returns its second argument. It doesn't actually do anything with the first argument: seq exists solely as a way to force that value to be evaluated.

where I've emphasised the then because to me it implies an order in which the two things happen.

From Hackage I read

The value of seq a b is bottom if a is bottom, and otherwise equal to b. In other words, it evaluates the first argument a to weak head normal form (WHNF). seq is usually introduced to improve performance by avoiding unneeded laziness.

A note on evaluation order: the expression seq a b does not guarantee that a will be evaluated before b. The only guarantee given by seq is that both a and b will be evaluated before seq returns a value. In particular, this means that b may be evaluated before a. […]

Furthermore, if I click on the # Source link from there, the page doesn't exist, so I can't see the code of seq.

That seems in line with a comment under this answer:

[…] seq cannot be defined in normal Haskell

On the other hand (or on the same hand, really), another comment reads

The 'real' seq is defined in GHC.Prim as seq :: a -> b -> b; seq = let x = x in x. This is only a dummy definition. Basically seq is specially syntax handled particularly by the compiler.

Can anybody shed some light on this topic? Especially in the following respects.

  • What source is right?
  • Is seq's implementation really not writable in Haskell?
    • If so, what does it even mean? That it is a primitive? What does this tell me about what seq actually does?
  • In seq a b is a guaranteed to be evaluated before b at least in the case that b makes use of a, e.g. seq a (a + x)?
Enlico

3 Answers


Others answers have already discussed the meaning of seq and its relationship to pseq. But there appears to be quite some confusion about what exactly the implications of seq’s caveats are.

It is true, technically speaking, that a `seq` b does not guarantee a will be evaluated before b. This may seem troubling: how could it possibly serve its purpose if that were the case? Let’s consider the example Jon gave in their answer:

foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f acc [] = acc
foldl' f acc (x : xs)
  = acc' `seq` foldl' f acc' xs
  where
    acc' = f acc x

Surely, we care about acc' being evaluated before the recursive call here. If it is not, the whole purpose of foldl' is lost! So why not use pseq here? And is seq really all that useful?

Fortunately, the situation is not actually so dire. seq really is the right choice here. GHC would never actually choose to compile foldl' such that it evaluates the recursive call before evaluating acc', so the behavior we want is preserved. The difference between seq and pseq is rather what flexibility the optimizer has to make a different decision when it thinks it has particularly good reason to.

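To make the stakes concrete before diving into the details, here is a small comparison sketch (foldlLazy is a name invented for this illustration) showing why we care about the accumulator being forced at all:

    import Data.List (foldl')
    
    -- A lazy left fold, for contrast: the accumulator is never forced
    -- during the traversal, so it accumulates a chain of thunks that is
    -- only collapsed when the final result is demanded.
    foldlLazy :: (a -> b -> a) -> a -> [b] -> a
    foldlLazy f acc []       = acc
    foldlLazy f acc (x : xs) = foldlLazy f (f acc x) xs
    
    main :: IO ()
    main = do
      print (foldl'    (+) 0 [1 .. 10000 :: Int]) -- acc' forced each step
      print (foldlLazy (+) 0 [1 .. 10000 :: Int]) -- 10000 thunks built first

Both produce the same answer; the difference is purely in space behavior, which is exactly the property the seq in foldl' is protecting.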
Understanding seq and pseq’s strictness

To understand what that means, we must learn to think a little like the GHC optimizer. In practice, the only concrete difference between seq and pseq is how they affect the strictness analyzer:

  1. seq is considered strict in both of its arguments. That is, in a function definition like

    f a b c = (a `seq` b) + c
    

    f will be considered strict in all three of its arguments.

  2. pseq is just like seq, but it’s only considered strict in its first argument, not its second one. That means in a function definition like

    g a b c = (a `pseq` b) + c
    

    g will be considered strict in a and c, but not b.

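The denotational half of this is easy to check directly. A small sketch, using only base, that observes seq being strict in its first argument:

    import Control.Exception (SomeException, evaluate, try)
    
    main :: IO ()
    main = do
      -- seq a b with a defined is simply b
      print ((1 :: Int) `seq` (42 :: Int))
      -- seq undefined b is bottom: forcing it throws
      r <- try (evaluate (undefined `seq` (42 :: Int)))
      putStrLn (case (r :: Either SomeException Int) of
                  Left _  -> "bottom, as expected"
                  Right _ -> "evaluated")
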
What does this mean? Well, let’s first define what it means for a function to “be strict in one of its arguments” in the first place. The idea is that if a function is strict in one of its arguments, then a call to that function is guaranteed to evaluate that argument. This has several implications:

  • Suppose we have a function foo :: Int -> Int that is strict in its argument, and suppose we have a call to foo that looks like this:

    foo (x + y)
    

    A naïve Haskell compiler would construct a thunk for the expression x + y and pass the resulting thunk to foo. But we know that evaluating foo will necessarily force that thunk, so we’re not gaining anything from this laziness. It would be better to evaluate x + y immediately, then pass the result to foo to save a needless thunk allocation.

  • Since we know there’s never any reason to pass a thunk to foo, we gain an opportunity to make additional optimizations. For example, the optimizer could choose to internally rewrite foo to take an unboxed Int# instead of an Int, avoiding not only a thunk construction for x + y but avoiding boxing the resulting value altogether. This allows the result of x + y to be passed directly, on the stack, rather than on the heap.

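As an illustration only, the kind of worker/wrapper split the optimizer might produce can be written out by hand (foo and fooWorker are hypothetical names; real GHC performs this transformation internally):

    {-# LANGUAGE MagicHash #-}
    import GHC.Exts (Int (I#), Int#, (+#))
    
    -- The worker operates directly on the raw unboxed Int#:
    -- no thunk, no heap box.
    fooWorker :: Int# -> Int#
    fooWorker n = n +# 1#
    
    -- The wrapper unboxes eagerly and reboxes the result; once the
    -- wrapper is inlined at a call site, the boxing can often be
    -- eliminated entirely.
    foo :: Int -> Int
    foo (I# n) = I# (fooWorker n)
    
    main :: IO ()
    main = print (foo 41) -- prints 42
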
As you can see, strictness analysis is rather crucial to making an efficient Haskell compiler, as it allows the compiler to make far more intelligent decisions about how to compile function calls, among other things. For that reason, we generally want strictness analysis to find as many opportunities to eagerly evaluate things as possible, letting us save on useless heap allocations.

With this in mind, let’s return to our f and g examples above. Let’s think about what strictness we’d intuitively expect these functions to have:

  1. Recall that the body of f is (a `seq` b) + c. Even if we ignore the special properties of seq altogether, we know that it eventually evaluates to its second argument. This means f ought to be at least as strict as if its body were just b + c (with a entirely unused).

    We know that evaluating b + c must fundamentally evaluate both b and c, so f must, at the very least, be strict in both b and c. Whether it’s strict in a is the more interesting question. If seq were actually just flip const, it would not be, as a would not be used, but of course the whole point of seq is to introduce artificial strictness, so in fact f is also considered strict in a.

    Happily, the strictness of f I mentioned above is entirely consistent with our intuition about what strictness it ought to have. f is strict in all of its arguments, precisely as we would expect.

  2. Intuitively, all of the above reasoning for f should also apply to g. The only difference is the replacement of seq with pseq, and we know that pseq provides a stronger guarantee about evaluation order than seq does, so we’d expect g to be at least as strict as f… which is to say, also strict in all its arguments.

    However, remarkably, this is not the strictness GHC infers for g. GHC considers g strict in a and c, but not b, even though by our definition of strictness above, g is rather obviously strict in b: b must be evaluated for g to produce a result! As we’ll see, it is precisely this discrepancy that makes pseq so deeply magical, and why it’s generally a bad idea.

The implications of strictness

We’ve now seen that seq leads to the strictness we’d expect while pseq does not, but it’s not immediately obvious what that implies. To illustrate, consider a possible call site where f is used:

f a (b + 1) c

We know that f is strict in all its arguments, so by the same reasoning we used above, GHC should evaluate b + 1 eagerly and pass its result to f, avoiding a thunk.

At first blush, this might seem all well and good, but wait: what if a is a thunk? Even though f is also strict in a, it’s just a bare variable—maybe it was passed in as an argument from somewhere else—and there’s no reason for GHC to eagerly force a here if f is going to force it itself. The only reason we force b + 1 is to spare a new thunk from being created, but we save nothing by forcing the already-created a at the call site. This means a might in fact be passed as an unevaluated thunk.

This is something of a problem, because in the body of f, we wrote a `seq` b, requesting a be evaluated before b. But by our reasoning above, GHC just went ahead and evaluated b first! If we really, really need to make sure b isn’t evaluated until after a is, this type of eager evaluation can’t be allowed.

Of course, this is precisely why pseq is considered lazy in its second argument, even though it actually is not. If we replace f with g, then GHC would obediently allocate a fresh thunk for b + 1 and pass it on the heap, ensuring it is not evaluated a moment too soon. This of course means more heap allocation, no unboxing, and (worst of all) no propagation of strictness information further up the call chain, creating potentially cascading pessimizations. But hey, that’s what we asked for: avoid evaluating b too early at all costs!

Hopefully, this illustrates why pseq is seductive, but ultimately counterproductive unless you really know what you’re doing. Sure, you guarantee the evaluation you’re looking for… but at what cost?

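If you want to watch which order your build actually picks, Debug.Trace offers a peek (a sketch only; the printed order is precisely the thing seq does not guarantee, especially under -O):

    import Debug.Trace (trace)
    
    -- On a typical unoptimized run this prints "forcing a" before
    -- "forcing b", but seq's contract promises no such thing.
    main :: IO ()
    main = print (trace "forcing a" (1 :: Int) `seq` trace "forcing b" (2 :: Int))
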
The takeaways

Hopefully the above explanation has made clear how both seq and pseq have advantages and disadvantages:

  • seq plays nice with the strictness analyzer, exposing many more potential optimizations, but those optimizations might disrupt the order of evaluation we expect.

  • pseq preserves the desired evaluation order at all costs, but it only does this by outright lying to the strictness analyzer so it’ll stay off its case, dramatically weakening its ability to help the optimizer do good things.

How do we know which tradeoffs to choose? While we may now understand why seq can sometimes fail to evaluate its first argument before its second, we don’t have any more reason to believe this is an okay thing to let happen.

To soothe your fears, let’s take a step back and think about what’s really happening here. Note that GHC never actually compiles the a `seq` b expression itself in such a way that a fails to be evaluated before b. Given an expression like a `seq` (b + c), GHC won’t ever secretly stab you in the back and evaluate b + c before evaluating a. Rather, what it does is much more subtle: it might indirectly cause b and c to be individually evaluated before evaluating the overall b + c expression, since the strictness analyzer will note that the overall expression is still strict in both b and c.

How all this fits together is incredibly tricky, and it might make your head spin, so perhaps you don’t find the previous paragraph all that soothing after all. But to make the point more concrete, let’s return to the foldl' example at the beginning of this answer. Recall that it contains an expression like this:

acc' `seq` foldl' f acc' xs

In order to avoid the thunk blowup, we need acc' to be evaluated before the recursive call to foldl'. But given the above reasoning, it still always will be! The difference that seq makes here relative to pseq is, again, only relevant for strictness analysis: it allows GHC to infer that this expression is also strict in f and xs, not just acc', which in this situation doesn’t actually change much at all:

  • The overall foldl' function still isn’t considered strict in f, since in the first case of the function (the one where xs is []), f is unused, so for some call patterns, foldl' is lazy in f.

  • foldl' can be considered strict in xs, but that is totally uninteresting here, since xs is only a piece of one of foldl'’s arguments, and that strictness information doesn’t affect the strictness of foldl' at all.

So, if there is not actually any difference here, why not use pseq? Well, suppose foldl' is inlined some finite number of times at a call site, since maybe the shape of its second argument is partially known. The strictness information exposed by seq might then expose several additional optimizations at the call site, leading to a chain of advantageous optimizations. If pseq had been used, those optimizations would be obscured, and GHC would produce worse code.

The real takeaway here is therefore that even though seq might sometimes not evaluate its first argument before its second, this is only technically true, the way it happens is subtle, and it’s pretty unlikely to break your program. This should not be too surprising: seq is the tool the authors of GHC expect programmers to use in this situation, so it would be rather rude of them to make it do the wrong thing! seq is the idiomatic tool for this job, not pseq, so use seq.

When do you use pseq, then? Only when you really, really care about a very specific evaluation order, which usually only happens for one of two reasons: you are using par-based parallelism, or you’re using unsafePerformIO and care about the order of side effects. If you’re not doing either of these things, then don’t use pseq. If all you care about is use cases like foldl', where you just want to avoid needless thunk build-up, use seq. That’s what it’s for.

Alexis King
    Maybe I'm dumb, but I haven't understood the whole story yet. For instance, you write _GHC would never actually choose to [...]_; so is it all about putting my trust in GHC? In my opinion that means that, as regards what `seq` does, I can't trust a compiler "just because it's a zero-bugs 100% standard-compliant compiler" (if one exists): for me to trust it, it must guarantee more than that, e.g. that it is "smart enough". Is this even a measurable thing? In other words, if tomorrow _GHC will choose to [...]_, would it be less standard-compliant than it is today? – Enlico Apr 07 '21 at 07:26
  • Another point that doesn't help me is _what flexibility the optimizer has to bend the rules_. "Bending the rules"? Aren't we in determinism-land? – Enlico Apr 07 '21 at 07:30
    @Enlico You are absolutely correct that *the Haskell Report alone* does not guarantee a useful implementation of `seq`, so if you are looking for something spelled out in a standard that mandates the behavior we all rely on `seq` having, you will never find it. Sorry. But GHC is the de facto standard these days and has been for quite some time, and though it may not mention this guarantee explicitly anywhere, I can assure you, it does provide it. I am 100% certain that if `seq` stopped working for this, it would be considered a dire bug and promptly fixed. – Alexis King Apr 07 '21 at 07:32
    @Enlico As for “determinism land,” I’m not sure what you mean. The Haskell Report takes great pains to specify almost nothing about evaluation order, so compilers are free to make optimizations as they see fit. This does not compromise on semantic determinism or purity because evaluation order is not observable from within a Haskell program without using `unsafePerformIO` (which is, in fact, much of why it’s unsafe). `seq` is highly unusual in that it is supposed to affect evaluation order, but this is utterly unique, which is why it must be a magical primitive. – Alexis King Apr 07 '21 at 07:36
  • To your first comment: ok, this (I hope!) gives me a better perspective on the whole topic. Would I be correct in saying that with `= foldl' f acc' xs` GHC could/would decide to never evaluate `acc'` (well, not before reaching an empty `xs` and coming back), whereas in `= acc' `seq` foldl' f acc' xs`, it will have to, and so it will have a chance to say _oh, since I have to evaluate `acc'` as the LHS of `seq` and it is used in the RHS of `seq`, I will evaluate the LHS first, so I can reuse it in the RHS_? – Enlico Apr 07 '21 at 08:42
  • To your second comment: I meant that my understanding of rules is that they are a fact, they can't be bent or eluded, so what does "bend the rules" even mean? I understand that the standard leaves room (degrees of freedom, I'd say) for different implementations of the same thing, but doesn't that mean that two different implementations honor the rules in different ways? – Enlico Apr 07 '21 at 08:46
  • @Enlico (meta: you can enclose the code in comments in double backticks for proper highlighting, like \`\`this\`op\`that\`\`, which is then rendered ``this`op`that``). – Will Ness Apr 07 '21 at 13:41
    @Enlico re `foldl'`, yes. `where { acc' = f acc x }` _defines_ `acc'`. nothing in how `acc'` is used in the hypothetical ``foldl_ f acc (x : xs) = foldl_ f acc' xs where { acc' = f acc x }`` "forces" it though. so it's a thunk, and stays a thunk. i.e. ``foldl_ f acc [a,b,c] === let {x = f acc a; y = f x b; z = f y c} in z`` -- even _after_ reaching that end-of-list. Now imagine printing that `z` -- its definition refers to the _definition_ of `y`, which refers to the _definition_ of `x`, etc.. three thunks to force, instead of one value to immediately get. :) (or have I misunderstood your qn?) – Will Ness Apr 07 '21 at 13:56
  • @Enlico I agree with you that my use of the phrase “bend the rules” was needlessly confusing—there isn’t really any “rule” being bent here—so I’ve edited my answer to remove it. The intended meaning was just that ``a `seq` b`` gives the compiler the flexibility to sometimes evaluate (parts of) `b` before `a` so that it is allowed to make other optimizations that are particularly helpful, but don’t actually compromise on the overall thrust of using `seq`. – Alexis King Apr 07 '21 at 22:46
    @Enlico More generally, I think that’s something worth reiterating: the Haskell Report doesn’t say much about `seq` because its behavior is *remarkably* tricky to pin down without overspecifying it in such a way that many extremely useful optimizations are disallowed. But do not let this confuse you: `seq` is **intended to be used for this purpose**, in all Haskell compilers, not just GHC. So although its behavior is not formally guaranteed, it informally is, and you should not worry too much about the looseness in its specification for that reason. – Alexis King Apr 07 '21 at 22:48
  • if not the Report then at least GHC docs should add "the purpose of `seq a b` is to serve as a signal to a compiler to prevent thunk buildup in `a` when evaluating `b`". maybe something like that. – Will Ness Apr 08 '21 at 15:42
  • @AlexisKing, the only reason `seq` has to be magical is that it's overly polymorphic. I gather that in the early days there was something like `class Eval a where seq :: a -> b -> b`. Of course, that can be defined for any concrete datatype (not function) using pattern matching. They decided to make it magical for convenience, and maybe a little performance, and weakened parametricity another little bit. Then we also had to start getting into decisions like `-fpedantic-bottoms`, because `seq` on functions is a weird thing to do and did we really mean that? – dfeuer Aug 26 '21 at 14:30

seq introduces an artificial data dependency between two thunks. Normally, a thunk is forced to evaluate only when pattern-matching demands it. If the thunk a contains the expression case b of { … }, then forcing a also forces b. So there is a dependency between the two: in order to determine the value of a, we must evaluate b.

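The case-induced dependency described above can be written out directly (a tiny sketch; a and b are just local names for this illustration):

    -- Evaluating `a` pattern-matches on `b`, so forcing a forces b:
    -- a genuine data dependency, no seq required.
    b :: Int
    b = 2 + 3
    
    a :: Int
    a = case b of
          0 -> 0
          n -> n * 10
    
    main :: IO ()
    main = print a -- forcing a forces b first; prints 50
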
seq specifies this relationship between any two arbitrary thunks. When seq c d is forced, c is forced in addition to d. Note that I don’t say before: according to the standard, an implementation is free to force c before d or d before c or even some mixture thereof. It’s only required that if c does not halt, then seq c d also doesn’t halt. If you want to guarantee evaluation order, you can use pseq.

The diagrams below illustrate the difference. A black arrowhead (▼) indicates a real data dependency, the kind that you could express using case; a white arrowhead (▽) indicates an artificial dependency.

  • Forcing seq a b must force both a and b.

      │
    ┌─▼───────┐
    │ seq a b │
    └─┬─────┬─┘
      │     │  
    ┌─▽─┐ ┌─▼─┐
    │ a │ │ b │
    └───┘ └───┘
    
  • Forcing pseq a b must force b, which must first force a.

      │
    ┌─▼────────┐
    │ pseq a b │
    └─┬────────┘
      │
    ┌─▼─┐
    │ b │
    └─┬─┘
      │
    ┌─▽─┐
    │ a │
    └───┘
    

As it stands, it must be implemented as an intrinsic because its type, forall a b. a -> b -> b, claims that it works for any types a and b, without any constraint. It used to belong to a typeclass, but this was removed and made into a primitive because the typeclass version was considered to have poor ergonomics: adding seq to try to fix a performance issue in a deeply nested chain of function calls would require adding a boilerplate Seq a constraint on every function in the chain. (I would prefer the explicitness, but it would be hard to change now.)

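To see what “definable for any concrete datatype” means, here is a plain-Haskell seq for Int (seqInt is a hypothetical name invented for this sketch; only the fully polymorphic a -> b -> b version needs compiler support):

    import Control.Exception (SomeException, evaluate, try)
    
    -- Matching the first argument against a literal forces it to WHNF,
    -- reproducing seq's behavior at the concrete type Int.
    seqInt :: Int -> b -> b
    seqInt 0 b = b
    seqInt _ b = b
    
    main :: IO ()
    main = do
      print (seqInt 5 "forced")
      r <- try (evaluate (seqInt undefined "x"))
      putStrLn (case (r :: Either SomeException String) of
                  Left _  -> "bottom, as seq requires"
                  Right s -> s)

The old Eval typeclass would have collected exactly such per-type definitions; the magic primitive merely spares us the constraint.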
So seq, and syntactic sugar for it like strict fields in data types or BangPatterns in patterns, is about ensuring that something is evaluated by attaching it to the evaluation of something else that will be evaluated. The classic example is foldl'. Here, the seq ensures that when the recursive call is forced, the accumulator is also forced:

foldl' :: (a -> b -> a) -> a -> [b] -> a
foldl' f acc [] = acc
foldl' f acc (x : xs)
  = acc' `seq` foldl' f acc' xs
  where
    acc' = f acc x

This asks the compiler to ensure that if f is strict, such as (+) on a strict data type like Int, the accumulator is reduced to an Int at each step, rather than building a chain of thunks to be evaluated only at the end.

Jon Purdy
    The last example is precisely what confuses me about the whole `seq` vs `pseq`. Since `seq` does not guarantee the evaluation order, it seems that there is no guarantee that `acc'` is forced _before_ the recursive call, only that it is at some point. Hence, it might indeed happen that the chain of thunks is built anyway before each `acc'` is forced. GHC does evaluate `acc'` early (AFAIK), but it looks like one should use `pseq` here (?). I'd like to be proved wrong on this, though. If I am correct, though, I can't see how `seq` can be used to ensure performance. – chi Apr 04 '21 at 19:20
    @chi: Strictly (ha) I think you’re right: `pseq` is the right choice here for performance. The only reason I can really think of for using `seq` instead of `pseq` is in an expression like ``(a `seq` b) `pseq` c``, where I *do* care that `a` and `b` are evaluated before `c`, but I *don’t* care whether `a` or `b` comes first. The Report takes extreme care not to mandate an implementation strategy, but I think it should have mandated *allowing* the programmer to *demand* sequencing & sharing. As it stands, there’s an awful lot of code depending subtly on GHC’s semantics of `seq` and `let`. – Jon Purdy Apr 04 '21 at 20:01
    I'm indeed reasoning about strict guarantees since those are what makes `seq` and `pseq` different. In theory, one could define `seq x y = pseq y x` and respect the requirements of the Report, I think, but that would horribly break the performance of a lot of code which would then be no longer usable. Perhaps we need a cost model for Haskell (at least an asymptotic one), but I guess it's hard to define one without constraining the implementation too much. – chi Apr 04 '21 at 20:18
  • @chi, I think you're right, strictly speaking. But GHC tries to optimize code, not to deoptimize it. So in most cases, `seq` will actually give good results. In `foldl'`, once some transformations happen, we'll have something that looks like ``go [] acc = acc; go (x : xs) acc = acc `seq` go xs (f acc x)``. GHC has two practical options: ``acc `pseq` go xs (f acc x)`` and ``let r = go xs (f acc x) in r `pseq` acc `pseq` r``. The former code is nicely tail recursive, and therefore the "obvious" thing to do. The latter is a mess; who would do that? – dfeuer Apr 04 '21 at 20:45
  • Where you're more likely to run into reordering is in something like ``f x y = y `seq` x + y``. There's a very good chance that the ``y `seq` `` will be ignored altogether and GHC will decide for itself whether to force `x` or `y` first. But in typical cases, that's just not going to make any significant difference for performance. – dfeuer Apr 04 '21 at 20:48
  • @dfeuer what you said is what we hope for, but it's not a _guarantee_. this whole situation with `seq` doesn't make much sense. why have `seq` at all then if only `pseq` really does it? and if there's no guarantee with `seq` then there's none with bang patterns either. (!!?) – Will Ness Apr 04 '21 at 22:31
    @WillNess, I don't know full details, but I imagine that kind of extreme order control will interfere with demand analysis and the worker-wrapper transformation, which are really important for optimization. In code that will be compiled with optimization, it might be better to think of `seq` as *allowing* the compiler to evaluate something earlier (which it is probably very eager to do) rather than *forcing* it to do so. – dfeuer Apr 04 '21 at 22:42
    @WillNess, here's a small example: let's say I write ``cond :: Bool -> Int -> Int; cond True x y = x `seq` y; cond False x y = y `seq` x``. GHC is going to look at that and say that `cond` is strict in all three of its arguments, and therefore it's allowed to evaluate all three before calling `cond`. Among other things, this lets it use worker-wrapper (which it probably won't do here, but pretend the function is more complicated). `wcond :: Bool -> Int# -> Int# -> Int#; wcond True x _ = x; wcond False _ y = y; cond b (I# x) (I# y) = I# (wcond b x y)`. Lots of unboxing opportunities! – dfeuer Apr 04 '21 at 22:49
  • @dfeuer I see, thanks. ---- all I want is clarity. if `seq` *always* works when I use it to prevent the thunks blow-up, they should just say so in the "docs", in those very words. explicitly. if it sometimes does even better, that's a definite plus. but we need clarity, not vagueness. – Will Ness Apr 05 '21 at 07:35
  • @JonPurdy, I've clarified a bit my question. Would you mind giving a look at it, in case you see a chance to integrate your answer with other bits? – Enlico Apr 05 '21 at 14:39
  • @WillNess, there's just no "always" in code optimization. – dfeuer Apr 05 '21 at 15:00
  • @dfeuer but it's not an optimization. it is an essential part of practical Haskell. or rather, it ***should*** be. bang patterns are in the manual after all, it's not some secret lore. – Will Ness Apr 05 '21 at 15:03
  • @WillNess, let's discuss this elsewhere. Not the right forum/format. – dfeuer Apr 05 '21 at 19:42
    @WillNess You might find the discussion on https://stackoverflow.com/questions/48410245/always-guaranteed-evaluation-order-of-seq-with-strange-behavior-of-pseq-in elucidating. – Alexis King Apr 06 '21 at 06:49

Real World Haskell is mistaken, and all the other things you quoted are correct. If you care deeply about the evaluation order, use pseq instead.

Daniel Wagner