94

I understand that a list actually contains values, and a sequence is an alias for IEnumerable<T>. In practical F# development, when should I be using a sequence as opposed to a list?

Here's some reasons I can see when a sequence would be better:

  • When interacting with other .NET languages or libraries that require IEnumerable<T>.
  • Need to represent an infinite sequence (probably not really useful in practice).
  • Need lazy evaluation.

Are there any others?

dodgy_coder
  • 12,407
  • 10
  • 54
  • 67
  • 3
    I find infinite sequences very useful and common. System.Random.Next() is already an "infinite sequence" in disguise, for instance. Often I want something that generates as many elements as needed. I recently wrote a Tetris in F# and represented the block generation as an infinite sequence: it will create as many as required as the game goes on. – Asik May 30 '12 at 20:06
  • 2
    @Dr_Asik Note that a `seq` generated that way will produce *different* random numbers each time you look at it. That can obviously be a source of non-deterministic bugs... – J D Jun 01 '12 at 08:20

5 Answers5

111

I think your summary for when to choose Seq is pretty good. Here are some additional points:

  • Use Seq by default when writing functions, because then they work with any .NET collection
  • Use Seq if you need advanced functions like Seq.windowed or Seq.pairwise

I think choosing Seq by default is the best option, so when would I choose different type?

  • Use List when you need recursive processing using the head::tail patterns
    (to implement some functionality that's not available in standard library)

  • Use List when you need a simple immutable data structure that you can build step-by-step
    (for example, if you need to process the list on one thread - to show some statistics - and concurrently continue building the list on another thread as you receive more values i.e. from a network service)

  • Use List when you work with short lists - list is the best data structure to use if the value often represents an empty list, because it is very efficient in that scenario

  • Use Array when you need large collections of value types
    (arrays store data in a flat memory block, so they are more memory efficient in this case)

  • Use Array when you need random access or more performance (and cache locality)

Tomas Petricek
  • 240,744
  • 19
  • 378
  • 553
  • 1
    Great thanks - exactly what I was after. Its confusing when learning F# to work out why there's these two elements (list & seq) that give you similar functionality. – dodgy_coder May 30 '12 at 11:29
  • 3
    *"Use `List` [...] when you need a simple **immutable data structure** that you can build step-by-step [...] and concurrently **continue building the list** on another thread[...]"* Can you please elaborate on your meaning here/how that works? Thanks. – Noein Apr 07 '15 at 18:00
  • 2
    @Noein The idea is that you can always iterate over lists (they are immutable) but you can create new lists using `x::xs` without breaking any existing workers that might be in process of iterating over `xs` – Tomas Petricek Apr 08 '15 at 14:15
32

Also prefer seq when:

  • You don't want to hold all elements in memory at the same time.

  • Performance is not important.

  • You need to do something before and after enumeration, e.g. connect to a database and close connection.

  • You are not concatenating (repeated Seq.append will stack overflow).

Prefer list when:

  • There are few elements.

  • You'll be prepending and decapitating a lot.

Neither seq nor list are good for parallelism but that does not necessarily mean they are bad either. For example, you could use either to represent a small bunch of separate work items to be done in parallel.

J D
  • 48,105
  • 13
  • 171
  • 274
  • "Neither seq nor list are good for parallelism": could you expand on why seq is not good for parallelism? What's good for parallelism then, array only? – Asik May 30 '12 at 20:11
  • 5
    @Dr_Asik Arrays are best because you can recursively subdivide them and retain good locality of reference. Trees are next best because you can subdivide them too but locality of reference is not so good. Lists and sequences are bad because you cannot subdivide them. If you farm out alternate elements then you get the worst possible locality of reference. Guy Steele has discussed linear collections impeding parallelism although he only considers work and depth and not locality (aka *cache complexity*). http://labs.oracle.com/projects/plrg/Publications/ICFPAugust2009Steele.pdf – J D May 31 '12 at 22:40
13

Just one small point: Seq and Array are better than List for parallelism.

You have several options: PSeq from F# PowerPack, Array.Parallel module and Async.Parallel (asynchronous computation). List is awful for parallel execution due to its sequential nature (head::tail composition).

pad
  • 41,040
  • 7
  • 92
  • 166
  • That's a good point - the scenario I had in mind was when you need to build the collection on one thread (i.e. as you receive values from some service) and use it from another thread (i.e. to calculate statistics and display it). I agree that for parallel processing (when you already have all data in memory), having `Array` or `PSeq` are much better. – Tomas Petricek May 30 '12 at 11:29
  • 1
    Why do you say that `seq` is better than `list` for parallelism? `seq` are also awful for parallel execution due to their sequential nature... – J D Jun 01 '12 at 08:19
8

list is more functional, math-friendly. when each element is equal, 2 lists are equal.

sequence is not.

let list1 =  [1..3]
let list2 =  [1..3]
printfn "equal lists? %b" (list1=list2)

let seq1 = seq {1..3}
let seq2 = seq {1..3}
printfn "equal seqs? %b" (seq1=seq2)

enter image description here

Rm558
  • 4,621
  • 3
  • 38
  • 43
6

You should always expose Seq in your public APIs. Use List and Array in your internal implementations.

Aleš Roubíček
  • 5,198
  • 27
  • 29
  • Is that because they work nicely with other .NET languages? i.e. because a `Seq` is seen as an `IEnumerable`? – dodgy_coder May 31 '12 at 07:04
  • No it because of good design practices. Expose as much information as necessary, no more. – Aleš Roubíček May 31 '12 at 09:02
  • Ok fair comment, this is good practice for C# code as well - i.e. a function might best be defined as IEnumerable rather than the heavier List for example. – dodgy_coder May 31 '12 at 10:22
  • 1
    I agree in the contexts of mutable return structures (e.g. this design flaw in .NET: http://msdn.microsoft.com/en-us/library/afadtey7.aspx) or APIs that may be used from other .NET languages but I don't agree in general, partly because `seq` are so bad for parallelism. – J D Jun 01 '12 at 08:36