3

I'm playing with some Data Parallel Haskell code and found myself in need of a prefix sum. However I didn't see any basic operator in the dph package for prefix sum.

I rolled my own, but, since I'm new to dph, I'm not sure if it's properly taking advantage of parallelization:

{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module PrefixSum ( scanP ) where
import Data.Array.Parallel (lengthP, indexedP, mapP, zipWithP, concatP, filterP, singletonP, sliceP, (+:+), (!:))
import Data.Array.Parallel.Prelude.Int ((<=), (-), (==), Int, mod)
-- hide prelude
import qualified Prelude 

-- assuming zipWithP (a -> b -> c) given 
-- [:a:] of length n and
-- [:b:] of length m, n /= m
-- will return
-- [:c:] of length min n m

scanP :: (a -> a -> a) -> [:a:] -> [:a:]
scanP f xs = if lengthP xs <= 1
                then xs
                else head +:+ tail
  where -- [: x_0, x_2, ..., x_2n :]
        evens = mapP snd . filterP (even . fst) $ indexedP xs
        -- [: x_1, x_3 ... :]
        odds = mapP snd . filterP (odd . fst)  $ indexedP xs
        lenEvens = lengthP evens
        lenOdds = lengthP odds
        -- calculate the prefix sums [:w:] of the pair sums [:z:]
        psums = scanP f $ zipWithP f evens odds
        -- calculate the total prefix sums as 
        -- [: x_0, w_0, f w_0 x_2, w_1, f w_1 x_4, ..., 
        head = singletonP (evens !: 0)
        body = concatP . zipWithP (\p e -> [: p, f p e :]) psums $ sliceP 1 lenOdds evens
        -- ending at either
        --    ... w_{n-1}, f w_{n-1} x_2n :]
        -- or
        --    ... w_{n-1}, f w_{n-1} x_2n, w_n :]
        -- depending on whether the length of [:x:] is 2n+1 or 2n+2
        tail = if lenEvens == lenOdds then body +:+ singletonP (psums !: (lenEvens - 1)) else body

-- reimplement some of Prelude so it can be vectorised
f $ x = f x
infixr 0 $
(.) f g y = f (g y)

snd (a,b) = b
fst (a,b) = a

even n = n `mod` 2 == 0
odd n = n `mod` 2 == 1
Don Stewart
  • 137,316
  • 36
  • 365
  • 468
rampion
  • 87,131
  • 49
  • 199
  • 315
  • Hmm, is it even parallelizable? Seems like a pretty serial idea, but maybe I am missing something. – luqui Jan 31 '12 at 15:16
  • @luqui: The parallel prefix sum algorithm for an array of size `n` takes `O(log n)` parallel rounds of computation. There are two phases. In the forward phase, given `{a_i | i \in [0,2n-1] }` you calculate `{ a_2i + a_{2i+1} | i \in [0,n-1] }` using `n/2` parallel additions. In the backward phase, given `{ \sum_0^{2i+1} a_j | i \in [0,n-1] }` and `{ a_i | i \in [0,2n-1] }`, you calculate `{ \sum_0^i a_j | i \in [0, 2n-1 }` using `n/2` parallel additions. – rampion Jan 31 '12 at 16:24
  • @luqui: naturally, this only works properly for associative `+`, since there's the inherent assumption that `(a_0 + a_1) + (a_2 + a_3) == ((a_0 + a_1) + a_2) + a_3` – rampion Jan 31 '12 at 16:27
  • For a normal array, this would be `scanl`, but I'm not seeing that in dph-par. – Louis Wasserman Jan 31 '12 at 23:23
  • right. scanl is O(n) work in O(n) time, scanP is meant to be O(n) work in O(log n) time. – rampion Feb 01 '12 at 10:03
  • Well... do the benchmarks agree with your expected numbers? That's not the best, but certainly a viable, way to determine if you are doing work in parallel. – Thomas M. DuBuisson Feb 01 '12 at 18:45

1 Answers1

2

Parallel prefix scans are supported, in fact, they're rather fundamental. So just pass (+) as your associative operator.

Don Stewart
  • 137,316
  • 36
  • 365
  • 468