Ever increasing CPU consumption with Haskell and stream-fusion

Question

Here is a short Haskell program that generates a 440 Hz sound. It uses pulseaudio as an audio backend.

import GHC.Float
import Control.Arrow
import Sound.Pulse.Simple
import qualified Data.List.Stream as S
import Data.List

type Time = Double
type Frequency = Double
type Sample = Double
type CV = Double

chunksize = 441 * 2
sampleRate :: (Fractional a) => a
sampleRate = 44100

integral :: [Double] -> [Double]
integral = scanl1  (\acc x -> acc + x / sampleRate)

chunks :: Int -> [a] -> [[a]]
chunks n = S.takeWhile (not . S.null) . S.unfoldr (Just . S.splitAt n)

pulseaudioOutput :: [Sample] -> IO ()
pulseaudioOutput sx = do

    pa <- simpleNew Nothing "Synths" Play Nothing "Synths PCM output"
         (SampleSpec (F32 LittleEndian) 44100 1) Nothing Nothing

    mapM_ (simpleWrite pa . S.map double2Float) $ chunks 1000 sx

    simpleDrain pa
    simpleFree pa

oscSine :: Frequency -> [CV] ->  [Sample]
oscSine f = S.map sin <<< integral <<< S.map ((2 * pi * f *) . (2**))

music ::[Sample]
music = oscSine 440 (S.repeat 0)

main = do
    pulseaudioOutput music

If I compile and run this, I see an ever growing CPU consumption.

If I change "S.splitAt" to "splitAt" in the definition of "chunks", everything is fine.

Can anyone guess why this can be?

Thank you.

Update

In the following code all three version of chunks can produce the aforementioned behaviour:

import GHC.Float
import Control.Arrow
import Sound.Pulse.Simple
import Data.List.Stream

import Prelude hiding ( unfoldr
                      , map
                      , null
                      , scanl1
                      , takeWhile
                      , repeat
                      , splitAt
                      , drop
                      , take
                      )

type Time = Double
type Frequency = Double
type Sample = Double
type CV = Double

chunksize = 441 * 2
sampleRate :: (Fractional a) => a
sampleRate = 44100

integral :: [Double] -> [Double]
integral = scanl1  (\acc x -> acc + x / sampleRate)

chunks :: Int -> [a] -> [[a]]
--chunks n = takeWhile (not . null) . unfoldr (Just . splitAt n)
--chunks n xs = take n xs : chunks n (drop n xs)
chunks n xs = h : chunks n t
    where
        (h, t) = splitAt n xs

pulseaudioOutput :: [Sample] -> IO ()
pulseaudioOutput sx = do

    pa <- simpleNew Nothing "Synths" Play Nothing "Synths PCM output"
         (SampleSpec (F32 LittleEndian) 44100 1) Nothing Nothing

    mapM_ (simpleWrite pa . map double2Float) $ chunks 1000 sx

    simpleDrain pa
    simpleFree pa

oscSine :: Frequency -> [CV] ->  [Sample]
oscSine f = map sin <<< integral <<< map ((2 * pi * f *) . (2**))

music ::[Sample]
music = oscSine 440 (repeat 0)

main = do
    pulseaudioOutput music

I cleaned up the code to avoid mixing plain old lists and stream-fusion lists. The memory / cpu leak is still there. To see that the code is working on old lists, just remove the Prelude import and ".Stream" after "Data.List".

Basically, lists are not suited for audio samples anyway. It kind of works, but never expect the performance to be good enough for anything nontrivial. (Though [the trivial stuff can already be quite exciting](http://codegolf.stackexchange.com/questions/11463/mozart-golf-mini-rondo/11514#11514)...) — leftaroundabout, Mar 11 '14 at 17:28
Indeed, but I'd like to cross that bridge when I came to it. If I use plain old lists, the performance is actually pretty OK for what it really does at the hardware level :) 1-2% CPU for a single oscillator is OK in a practical setting, since expensive synthesizer programs can eat up one or two cores for a simple synth patch. (Yes, they're poorly written. Whatever.) I can actually run a 128th order FIR with using only lists (ok, maybe a Sequence for the coefficients), so the performance is ok for me as long as this stuff is experimental. Thanks for the link, very nice! :) — netom, Mar 12 '14 at 06:59
Really, 128th order FIR? So you run an FFT directly on lists, like [user3407776 does](http://stackoverflow.com/questions/22335577/how-to-apply-data-parallelim-on-haskell-fast-fourier-transformation/22336721#22336721)? I mean, cool and everything, that GHC makes lists so fast it's possible, but I can understand _why_! You'd get yet much better performance with tight array chunks, and Haskell's type system is quite able to abstract them away so it looks just as nice as with lists. — leftaroundabout, Mar 12 '14 at 09:28
As I told so, I'll cross that bridge what I come to it. This is not about performance, it's just an experiment. This question in particular is about a possible bug in my code or in the stream-fusion library. Currently I'm not interested in other aspects. — netom, Mar 12 '14 at 11:17

sclv · Accepted Answer · 2015-02-24T19:08:11.807

The splitAt on streams that is substituted by the fusion rules (http://hackage.haskell.org/package/stream-fusion-0.1.2.5/docs/Data-Stream.html#g:12) has the following signature:

splitAt :: Int -> Stream a -> ([a], [a])

From this we can see that since it produces lists and not streams, that obstructs further fusion. The correct thing to do, I think, is to produce either a splitAt that generates streams, or better yet to write a chunks function directly on streams with the appropriate fusion rules from the list version.

Here is a splitAt on streams that I think should be good. You would of course need to pair it with the appropriate rewrite rules from a splitAt on lists, and if those rewrite rules get tricky, perhaps write the chunks function directly, though it seems a bit tricky to do so as well:

splitAt :: Int -> Stream a -> (Stream a, Stream a)
splitAt n0 (Stream next s0)
  | n0 < 0    = (nilStream, (Stream next s0))
  | otherwise = loop_splitAt n0 s0
  where
    nilStream = Stream (const Done) s0
    loop_splitAt  0 !s = (nilStream, (Stream next s))
    loop_splitAt !n !s = case next s of
      Done            -> (nilStream, nilStream)
      Skip    s'      -> loop_splitAt n s'
      Yield x s'      -> (cons x xs', xs'')
        where
          (xs', xs'') = loop_splitAt (n-1) s'

Thank you, even after this much time your explanation has enlightened me. I might give this project of mine a second chance ;) — netom, Feb 24 '15 at 10:09
@netom, you probably will need to add some inlining annotations here. Based on what `vector` does with similar things, I think probably `{-# INLINE [1] splitAt #-}` and `{-# INLINE [0] loop_splitAt #-}`. But any time you're dealing with this sort of fusion stuff, you should be sure to profile, profile, profile and spend some time groveling over the Core. I recommend `-ddump-simpl -dsuppress-all -dno-suppress-type-signatures`. If you've never seen Core before, it may look a little scary, but it's really fairly similar to Haskell. — dfeuer, Feb 24 '15 at 19:48

Ever increasing CPU consumption with Haskell and stream-fusion

Update

1 Answers1