6

Any easy question for the Mathematica experts here:

Given a list, say

Clear[a, b, c];
data = {a, b, c};

and I want to get back all lists of length 1,2,3,...Length[data] starting from the start to the end, so that I get the following for the above

out = {{a}, {a, b}, {a, b, c}}

I looked at the commands in M to find a ready one to use, and I could (looked at all the Map's and Nest* functions, but not that I can see how to use for this). I am sure it is there, but I am not seeing it now.

now I do this silly Do loop to build it

m=Length[data];
First@Reap[Do[Sow[data[[1;;i]]],{i,1,m}]][[2]]

{{a},{a,b},{a,b,c}}

question is: does Mathematica have a build-in command to do the above?

update 8 am

I've deleted the tests I've done an hr ago and will be reposting them again soon. I need to run them few times and take an average as that is the better way to do this performance test.

update 9 am

Ok, I've re-run the performance tests on all solutions shown below. 8 methods. For each method, I run it 5 times and took the mean. I did this for n={1000, 5000, 10000, 15000, 25000, 30000} where n is the length of the original list to process.

can't go much over 30,000, will run out of ram. I only have 4 GB ram.

I made a small function called makeTable[n, methods] which generate performance table for specific n. test code is below (written quickly so not the most clean code, not very functional, as I have to go :), but it is below and any one can change/clean it, etc... if they want

conclusion: Kguler method was the fastest, with Thies method almost the same for large n, (30,000), so for all practical purposes, may be Thies and Kguler methods can be declared as the winners for large n? But since Kguler is also fastest for small n, so far, he gets the clear edge.

Again, test code is below for any one to check and run to see if I might made an error somewhere. As correctly predicted by Leonid, the linked list method did not fare too well for large n.

I think more tests are needed, as only taking the mean of 5 might not be enough, also other considerations I might have missed. This is not an exact test, just a rough one to get an idea.

I tried not to use the pc much while running the tests. I used AbsoluteTiming[] to measure cpu.

Here is screen shot of the tables generated

enter image description here

Here is the test code:

methods = {nasser, wizard1, wizard2, wizard3, kguler, leonid1, 
   leonid2, thies};
AppendTo[$ContextPath, "Internal`"];
ClearAll[linkedList, leonid2];
SetAttributes[linkedList, HoldAllComplete];

nasser[lst_] := Module[{m = Length[lst]},
   First@Reap[Do[Sow[lst[[1 ;; i]]], {i, 1, m}]][[2]]
   ];

wizard1[lst_] := Module[{},
   Take[lst, #] & /@ Range@Length@lst
   ];

wizard2[lst_] := Module[{},
   Table[Take[#, i], {i, Length@#}] & @lst
   ];

wizard3[lst_] := Module[{},
   Rest@FoldList[Append, {}, #] & @lst
   ];

kguler[lst_] := Module[{},
   Reverse@NestList[Most, #, Length[#] - 1] & @lst

   ];

leonid1[lst_] := Module[{b = Bag[{}]},
   Map[(StuffBag[b, #]; BagPart[b, All]) &, lst]
   ];

leonid2[lst_] := Module[{},
   Map[List @@ Flatten[#, Infinity, linkedList] &, 
    FoldList[linkedList, linkedList[First@lst], Rest@lst]]
   ];

thies[lst_] := 
  Module[{}, 
   Drop[Reverse@
     FixedPointList[If[Length[#] > 0, Most, Identity][#] &, lst], 2]
   ];

makeTable[n_, methods_] := 
  Module[{nTests = Length[methods], nTries = 5, i, j, tests, lst},
   lst = Table[RandomReal[], {n}];

   tests = Table[0, {nTests}, {nTries}];

   For[i = 1, i <= nTests, i++,
    For[j = 1, j <= nTries, j++,
      tests[[i, j]] = First@AbsoluteTiming[methods[[i]][lst] ]
     ]
    ];

   tbl = Table[{ToString[methods[[i]]], Mean[ tests[[i, All]] ]}, {i, 
      nTests}] ;

   Grid[Join[{{"method", "cpu"}}, tbl],
    Frame -> All, FrameStyle -> Directive[Thickness[.005], Gray], 
    Spacings -> {0.5, 1}
    ]
   ];

Now to run, do

makeTable[1000, methods]

Warning, do not try something over 30,000 unless you have zillion GB, else M might not return. It happened to me, and had to reboot the PC.

update 12/26/11 3:30PM

I see that Thies has a newer version of this algorithm (I called it thies2 in the methods table), so I re-run everything again, here is the updated table, I removed the linked list version since it is known in advance not to be fast for large n, and this time, I run them each for 10 times (not 5 as above) and then took the mean). I also started M fresh using factory setting (restarted it holding alt-shift keys so that all setting are back to original settings just in case)

conclusion so far

Kugler is fastest for smaller n, i.e. n<20,000. For larger n, now Thies second version is faster than Thies version 1 and now it edges ahead ever so slightly ahead of Kugler method for large n. Congratulation to Thies, the current lead in this performance test. But for all practical purposes, I would say both Thies and Kugler methods are the fastest for large n, and Kugler remain the fastest for smaller n.

Below are tables and the updated test code below them. Any one is free to run the tests for themselves, just in case I might overlooked something.

enter image description here

The current test code:

$MinPrecision = $MachinePrecision;
$MaxPrecision = $MachinePrecision;
methods = {nasser, wizard1, wizard2, wizard3, kguler, leonid, thies1, 
   thies2};
AppendTo[$ContextPath, "Internal`"];

nasser[lst_] := Module[{m = Length[lst]},
   First@Reap[Do[Sow[lst[[1 ;; i]]], {i, 1, m}]][[2]]
   ];

wizard1[lst_] := Module[{},
   Take[lst, #] & /@ Range@Length@lst
   ];

wizard2[lst_] := Module[{},
   Table[Take[#, i], {i, Length@#}] & @lst
   ];

wizard3[lst_] := Module[{},
   Rest@FoldList[Append, {}, #] & @lst
   ];

kguler[lst_] := Module[{},
   Reverse@NestList[Most, #, Length[#] - 1] & @lst

   ];

leonid[lst_] := Module[{b = Bag[{}]},
   Map[(StuffBag[b, #]; BagPart[b, All]) &, lst]
   ];

thies1[lst_] := 
  Module[{}, 
   Drop[Reverse@
     FixedPointList[If[Length[#] > 0, Most, Identity][#] &, lst], 2]
   ];

thies2[lst_] := 
  Module[{}, 
   Drop[Reverse@
     FixedPointList[If[# =!= {}, Most, Identity][#] &, lst], 2]
   ];

makeTable[n_Integer, methods_List] := 
  Module[{nTests = Length[methods], nTries = 10, i, j, tests, lst},
   lst = Table[RandomReal[], {n}];

   tests = Table[0, {nTests}, {nTries}];

   For[i = 1, i <= nTests, i++,
    For[j = 1, j <= nTries, j++,
      tests[[i, j]] = First@AbsoluteTiming[methods[[i]][lst] ]
     ]
    ];

   tbl = Table[{ToString[methods[[i]]], Mean[ tests[[i, All]] ]}, {i, 
      nTests}] ;

   Grid[Join[{{"method", "cpu"}}, tbl],
    Frame -> All, FrameStyle -> Directive[Thickness[.005], Gray], 
    Spacings -> {0.5, 1}
    ]
   ];

To run type

n=1000
makeTable[n, methods]

Thanks for everyone for their answers, I learned from all of them.

Nasser
  • 12,849
  • 6
  • 52
  • 104
  • Weird, when my Mathematica 8 is out of memory, it dies and sometimes kills Google Chrome but I don't have to reboot PC. – Nakilon Dec 26 '11 at 17:09
  • @Nakilon, I did NOT have to reboot, if I waited long enough, but I was not able to even activate the windows task manager to kill the process, as memory was so low. The whole desktop was not responsive. Virtual ram was thrashing and nothing was responsive. Easier to reboot than wait for who knows how long. M did not die, but was still running when I rebooted. – Nasser Dec 26 '11 at 20:30
  • 1
    Nasser, have you tried [this method](http://stackoverflow.com/a/7862223/618728) to prevent that? I cannot as I use v7. – Mr.Wizard Dec 26 '11 at 22:53
  • @Mr.Wizard, thanks for the link. No, I did not try that before, I did not know about it, and Yes, that is the problem I was seeing. M was taking all the ram, and the OS was left with so little ram, and unresponsive. The info in the link are useful. I should do something like that next time I need to. I am using windows 7, 64 bit, with 4 GB ram. – Nasser Dec 26 '11 at 23:30

5 Answers5

7

You can use

f = Reverse@NestList[Most, #, Length[#] - 1] &

f@{a,b,c,d,e} gives {{a}, {a, b}, {a, b, c}, {a, b, c, d}, {a, b, c, d, e}}.

An alternative using ReplaceList -- much slower than f, but ... why not?:

g = ReplaceList[#, {x__, ___} -> {x}] &
kglr
  • 1,438
  • 1
  • 9
  • 15
  • Your 'alternative' solution did not do as well as your earlier one. Slower by large amount. You could try it yourself for say n=20,000 and you'll see. I get 6.6 seconds cpu time, vs. 0.9 for your first one. – Nasser Dec 27 '11 at 10:25
  • 1
    @Nasser, this was not meant for the speed contest -- it is very slow indeed. I thought it was an interesting application of a built-in function and patterns. BTW, thank you so much for all your effort on evalaution of suggested methods. – kglr Dec 27 '11 at 10:40
4

I propose this:

runs[lst_] := Take[lst, #] & /@ Range@Length@lst

Or this:

runs2 = Table[Take[#, i], {i, Length@#}] &;

kguler's answer inspired me to write this:

Rest@FoldList[Append, {}, #] &

But this is slower than his method because of Mathematica's slow appends.

Mr.Wizard
  • 24,179
  • 5
  • 44
  • 125
  • +1, nice and straight forward solution. I did not think of Take. – Nasser Dec 26 '11 at 06:48
  • @kguler which method are you referring to? your answer, which I already voted for, is over twice as fast as my `FoldList` method due to *Mathematica's* slow appends. It's also marginally faster than the `Take` methods, which I did not anticipate. – Mr.Wizard Dec 26 '11 at 07:46
  • Thank you for the vote. I was referring to `runs` b/c it is made of the simplest elements and it is a clean translation of @Nasser's question. Surprised to learn that `NestList[Most ..]` combination is faster than `runs`. BTW, you can save a character by using `lst[[;;#]]` instead of `Take[lst,#]` in `runs`. :) – kglr Dec 26 '11 at 08:18
  • @kguler Thank you. Indeed one can, but I favored Take because I think it is slightly clearer and slightly more efficient. And, if you really want to save a character, you can still use `lst~Take~#` but be prepared for flak. ;-) – Mr.Wizard Dec 26 '11 at 08:20
4

Here is another method which is roughly as efficient as the one involving Take, but uses the Internal`Bag functionality:

AppendTo[$ContextPath, "Internal`"];
runsB[lst_] :=
   Module[{b = Bag[{}]}, Map[(StuffBag[b, #]; BagPart[b, All]) &, lst]];

I don't claim that it is simpler than the one based on Take, but it seems to be a simple example of Internal`Bag at work - since this is exactly the type of problem for which these can be successfully used (and there might be cases where lists of explicit positions would either not be available or expensive to compute).

Just to compare, the solution based on linked lists:

ClearAll[linkedList, runsLL];
SetAttributes[linkedList, HoldAllComplete];
runsLL[lst_] :=
  Map[List @@ Flatten[#, Infinity, linkedList] &,
    FoldList[linkedList, linkedList[First@lst], Rest@lst]]

will be an order of magnitude slower on large lists.

Leonid Shifrin
  • 22,449
  • 4
  • 68
  • 100
3

Another idea:

Inits[l_] := Drop[Reverse@FixedPointList[
               If[Length[#] > 0, Most, Identity][#] &,
               l
             ], 2];

Update:

This version is a bit faster by omitting computing the length each time:

Inits2[l_] := Drop[Reverse@FixedPointList[
                If[# =!= {}, Most, Identity][#] &,
                l
              ], 2];
Thies Heidecke
  • 2,497
  • 1
  • 23
  • 25
0

Probably not the most efficient, but another approach:

dow[lst_] :=  lst[[1 ;; #]] & /@ Range@Length@lst

For example:

dow[{a, b, c, d, ee}]

gives:

{{a}, {a, b}, {a, b, c}, {a, b, c, d}, {a, b, c, d, ee}}

681234
  • 4,214
  • 2
  • 35
  • 42