How to create a very large list in erlang

Question

we use lists:seq(1,100) to create a list with 100 element and the create speed is acceptable

but what if we create a list with 1000w element? lists:seq(1,10000000) is very slow.

do you have any idea to create a large list without any datacopy?

tkowal · Accepted Answer · 2014-09-11T10:22:40.517

5

Actually, lists:seq/2 does not perform any copying. Here is actual implementation:

seq(First, Last)
    when is_integer(First), is_integer(Last), First-1 =< Last -> 
    seq_loop(Last-First+1, Last, []).

seq_loop(N, X, L) when N >= 4 ->
    seq_loop(N-4, X-4, [X-3,X-2,X-1,X|L]);
seq_loop(N, X, L) when N >= 2 ->
    seq_loop(N-2, X-2, [X-1,X|L]);
seq_loop(1, X, L) ->
    [X|L];
seq_loop(0, _, L) ->
    L.

So, it calls itself recursively, adding up to 4 elements to the accumulator in each call (third argument). Prepending elements to list does not involve any copying and the calls are tail recursive, so the stack has constant size.

EDIT: I am pasting some clarifications from comments for the sake of completeness: Erlang lists are linked lists. Internally [H|T], creates element H and adds pointer to T. T is never copied. You can do that, because T is immutable and it will never change, even if you create [H1 | T], [H2 | T] and [H3 | T] - T is shared between them.

Calling function with list as argument also doesn't involve copying :) Bigger data structures including lists are stored on the process heap. You only store pointer to the first element on the stack. Only sending message to another process would actually copy the list.

Creating very large list may be slow, because you can run out of physical memory and start using swap. END OF EDIT

About iterating, I would like to build on Nicolas Talfer answer - you sometimes want to pass arguments to function called in loop and you can use higher order functions and closures to do that:

main(_) ->
    InitialArgs = 6,
    InitialFun = fun() -> do_stuff(InitialArgs) end,
    Ret = forloop(InitialFun, 5),
    io:format("~p", [Ret]).

forloop(Fun, 1) ->
    Fun();
forloop(Fun, Count) ->
    RetVal = Fun(),
    NewFun = fun() -> do_stuff(RetVal) end,
    forloop(NewFun, Count-1).

do_stuff(Args) ->
    Args + 2.

edited Sep 11 '14 at 10:22

answered Sep 11 '14 at 09:28

tkowal

9,129
1
27
51

it still copies [X-3,X-2,X-1,X|L] :( – CaTFooD Sep 11 '14 at 09:33
1

No, it doesn't. Erlang lists are linked lists. Internally [H|T], creates element H and adds pointer to T. T is never copied. You can do that, because T is immutable and it will never change, even if you create [H1 | T], [H2 | T] and [H3 | T] - T is shared between them. – tkowal Sep 11 '14 at 09:44
yes, [H3 |T ] doesn't copies any thing, but when you pass this list as a argument to the function, it copies the list :( – CaTFooD Sep 11 '14 at 10:00
1

@CaTFooD Calling function with list as argument also doesn't involve copying :) Bigger data structures including lists are stored on the process heap. You only store pointer to the first element on the stack. Only sending message to another process would actually copy the list. – tkowal Sep 11 '14 at 10:09
@tkowal this confuse me a lot :( when i call lists:seq(1,20000000). the process takes about 2G memory for beam.smp to generate the list, but when i store the generated list to ets, in observer saids it takes 305M memory. – CaTFooD Sep 11 '14 at 10:40
Check out this question about ETS memory: http://stackoverflow.com/questions/21973760/how-to-identify-the-exact-memory-size-of-an-ets-table Have you multiplied your result by word size? (probably 4). This would give 1GB for ETS and second for process heap. – tkowal Sep 11 '14 at 11:09
@CaTFooD passing a data structure as an argument to a function **NEVER** entails copying! Seeing all data is immutable I can always just pass a reference, if the called function modifies the data it will end up doing some copying but my structure will never be modified. Generally copying does NOT result in copying the whole structure just those bits which have been modified. – rvirding Sep 11 '14 at 12:11

score 4 · Answer 2 · answered Sep 11 '14 at 08:53

4

I assume you want to repeat an operation N times, independantly of the list content. Am I wrong? ie something like this:

[ fun() -> do_stuff() end || _X <- lists:seq(1,10000000) ].

Consider doing this instead:

foo(0) ->
    ok;
foo(Count) ->
    do_stuff(),
    foo(Count-1).

and just call foo(10000000).

answered Sep 11 '14 at 08:53

Nicolas Talfer

41
2

this works, but i still wants to know how to create such large list effective – CaTFooD Sep 11 '14 at 09:30

score 2 · Answer 3 · answered Sep 11 '14 at 12:12

2

In fact you could try to use lazy lists: It's look like this:

seq(M, N) when M =< N ->
    fun() -> [M | seq(M+1, N)] end;
seq(_, _) ->
    fun () -> [] end.

E = seq(1, 1000000000000000).
[1|NextElem] =  E().
[2|NextElem2] = NextElem().

answered Sep 11 '14 at 12:12

Elzor

804
8
8

How to create a very large list in erlang

3 Answers3