14

We could fuse two traversals over the list xs in the expression

(map f xs, map g xs)

like so

unzip (map (\x -> (f x, g x)) xs)

Is there any reasearch on performing this kind of fusion automatically?

(There's a risk to create a space leak here if one of the returned lists is consumed before the other. I'm more interested in preventing the extra traversal over xs than saving space.)

Edit: I'm actually not looking to apply the fusion to actual in-memory Haskell lists, where this transformation might not make sense depending on if the unzip can be fused with its consumer(s). I have a setting where I know unzip can fuse (see "FlumeJava: easy, efficient data-parallel pipelines").

tibbe
  • 8,809
  • 7
  • 36
  • 64
  • 2
    Not automatic, but quite nice anyway: http://squing.blogspot.com/2008/11/beautiful-folding.html – Daniel Wagner Sep 03 '12 at 04:31
  • 1
    Unless the result of this fuses with something else the overhead of creating the pairs and unzipping them will be bigger than cost of the extra traversal. – augustss Sep 03 '12 at 12:44
  • 1
    @augustss Not if the traversal is over a huge file! I'm not planning to apply this to actual lists. – tibbe Sep 03 '12 at 14:35
  • The unzip will traverse a list of the same length as the map, so you will not be saving traversals. Now if you care about saving the space that the fusion gives you then it can make a huge difference. I inferred from your comment that you were not interested in space, but I see that's not exactly what you said. :) – augustss Sep 03 '12 at 14:46
  • " preventing the extra traversal over xs " can also be accomplished by most of the iterator-style packages. – Chris Kuklewicz Sep 03 '12 at 15:41
  • @augustss I have a consumer that can fuse with the unzip. I'm trying to cast sibling fusion, as described in "FlumeJava: easy, efficient data-parallel pipelines" (http://pages.cs.wisc.edu/~akella/CS838/F12/838.../FlumeJava.pdf), in one of the Haskell fusion frameworks. – tibbe Sep 03 '12 at 16:07

2 Answers2

4

Also not fully automatic, but you can give GHC a list of rewrite rules like that. See 7.14 Rewrite rules and Using rules. Then the compiler uses these rules to optimize your program when compiling. (Note that the compiler in no way checks if the rules make any sense.)

Edit: To give an example for this particular problem, we can write:

{-# OPTIONS_GHC -fenable-rewrite-rules -ddump-rule-firings -ddump-rule-rewrites #-}

import Data.Char

{-# RULES
"map/zip" forall f g xs. (,) (map f xs) (map g xs) = unzip (map (\x -> (f x, g x)) xs)
   #-}

main :: IO ()
main = let x = "abCD" in
        print $ (,) (map toUpper x) (map toLower x)

(the top-level function name in the rule is (,) :: a -> b -> (a, b)). When compiling, you'll see how the rules are applied. Option dump-rule-firings shows a message when a rule is applied and -ddump-rule-rewrites displays each rule application in detail - see 7.14.6. Controlling what's going on in rewrite rules.

Petr
  • 62,528
  • 13
  • 153
  • 317
  • I don't think we can write a rule to match these kind of expressions. GHC rules must start with a function name. – tibbe Sep 03 '12 at 23:16
3

I've managed to find two resources that mentions fusion (un-)zip like functions, at least briefly:

Josef Svenningsson. "Shortcut Fusion for Accumulating Parameters & Zip-like Functions" http://www.cse.chalmers.se/~josefs/publications/fusion.pdf

Duncan Coutts. "Stream Fusion: Practical shortcut fusion for coinductive sequence types" https://community.haskell.org/~duncan/thesis.pdf

Neither resources mentions this kind of "sibling fusion" explicitly though.

tibbe
  • 8,809
  • 7
  • 36
  • 64
  • 1
    I didn't see this presentation, but here are Josef's slides about [TupleFusion](http://wiki.portal.chalmers.se/cse/uploads/FP/Josef_TupleFusion.pdf). – danr Sep 03 '12 at 17:13
  • [Towards an automated tupling strategy](http://dl.acm.org/citation.cfm?id=154643) might be interesting. – Jan Christiansen Sep 04 '12 at 07:44