13

Clearly Seq asymptotically performs the same or better as [] for all possible operations. But since its structure is more complicated than lists, for small sizes its constant overhead will probably make it slower. I'd like to know how much, in particular:

  1. How much slower is <| compared to :?
  2. How much slower is folding over/traversing Seq compared to folding over/traversing [] (excluding the cost of a folding/traversing function)?
  3. What is the size (approximately) for which \xs x -> xs ++ [x] becomes slower than |>?
  4. What is the size (approximately) for which ++ becomes slower than ><?
  5. What's the cost of calling viewl and pattern matching on the result compared to pattern matching on a list?
  6. How much memory does an n-element Seq occupy compared to an n-element list? (Not counting the memory occupied by the elements, only the structure.)

I know that it's difficult to measure, since with Seq we talk about amortized complexity, but I'd like to know at least some rough numbers.

Petr
  • 62,528
  • 13
  • 153
  • 317
  • 6
    No such comparison exists. I suggest generating it yourself using the criterion benchmarking library. – Don Stewart Feb 10 '13 at 14:35
  • 6
    If someone eventually decides to perform benchmarking on this one, it would make a whole lot of sense to also drop in a `Vector` for comparison. Would be very nice to see the results. Favoriting this one. – Nikita Volkov Feb 10 '13 at 14:38
  • 2
    This is also difficult (or not particularly useful) since in real code , with GHC, intermediate lists tend to get fused and re-written and disappeared, making them more like control structure than a data structure per se. – jberryman Feb 10 '13 at 21:51
  • @jberryman could the same thing be done with a `Sequence`? Couldn't we write the same-ish rules, and then get the same "control structure" effect? – Dan Burton Feb 13 '13 at 20:50
  • @DanBurton that's mostly black magic to me, but I guess you would want a brand of `build` from GHC.Exts which would work with `Foldable.foldr`? – jberryman Feb 13 '13 at 22:55

2 Answers2

14

This should be a start - http://www.haskell.org/haskellwiki/Performance#Data.Sequence_vs._lists

A sequence uses between 5/6 and 4/3 times as much space as the equivalent list (assuming an overhead of one word per node, as in GHC). If only deque operations are used, the space usage will be near the lower end of the range, because all internal nodes will be ternary. Heavy use of split and append will result in sequences using approximately the same space as lists. In detail:

  • a list of length n consists of n cons nodes, each occupying 3 words.
  • a sequence of length n has approximately n/(k-1) nodes, where k is the average arity of the internal nodes (each 2 or 3). There is a pointer, a size and overhead for each node, plus a pointer for each element, i.e. n(3/(k-1) + 1) words.

List is a non-trivial constant-factor faster for operations at the head (cons and head), making it a more efficient choice for stack-like and stream-like access patterns. Data.Sequence is faster for every other access pattern, such as queue and random access.

manojlds
  • 290,304
  • 63
  • 469
  • 417
1

I have one more concrete result to add to above answer. I am solving a Langevin equation. I used List and Data.Sequence. A lot of insertions at back of list/sequence are going on in this solution.

To sum up, I did not see any improvement in speed, in fact performance deteriorated with Sequences. Moreover with Data.Sequence, I need to increase the memory available for Haskell RTS.

Since I am definitely not an authority on optimizing; I post the both cases below. I'd be glad to know if this can be improved. Both codes were compiled with -O2 flag.

  1. Solution with List, takes approx 13.01 sec
  2. Solution with Data.Sequence, takes approx 15.13 sec
Dilawar
  • 5,438
  • 9
  • 45
  • 58
  • How long are your sequences/lists? – Petr Nov 02 '14 at 15:20
  • Both of them had 10^6 elements. – Dilawar Nov 04 '14 at 08:05
  • 2
    If anyone lands here I tested the linked code out of curiousity and on my machine seq was slightly faster. (~5%) It's also a bad example because here the two cases compared boil down to either building a list by appending at the front and reversing. Or building the Seq by appending at the end and then converting it to a list instead of using it directly. But even then on my Machine lists are slower so I imagine the speed differences where not caused by the choice of Data Structure but other changes in the code. – Andreas Klebinger Jul 27 '16 at 23:24