8

I'm pretty new to Mathematica and am stumped by this problem. I have a list that looks like this:

{{1, 1, 1}, {0}, {1}}

I want to replace each sublist with its first element. So, the above list should be converted to:

{1,0,1}

I've looked through the documentation repeatedly and Googled for hours. I'm sure that this is fairly simple, but I can't figure it out. I started with this list:

{1, 1, 1, 0, 1}

I need to know how many runs of 1's there are, which is obviously 2. So, I used Split to separate the list into groups of consecutive 1's and 0's. By using Length on this list I can get the total number of runs, which is 3. Now, I just need to calculate the number of runs of 1's. If I can convert the list as mentioned above, I can just sum the items in the list to get the answer.

I hope that makes sense. Thanks for any help!

Tim Mayes
  • 554
  • 3
  • 9

5 Answers5

12

The proposed solutions are pretty fast, however if you want extreme efficiency (huge lists), here is another one which would be order of magnitude faster (formulated as a pure function):

Total[Clip[Differences@#,{0, 1}]] + First[#] &

For example:

In[86]:= 
largeTestList = RandomInteger[{0,1},{10^6}];
Count[Split[largeTestList],{1..}]//Timing
Count[Split[largeTestList][[All,1]],1]//Timing
Total[Clip[Differences@#,{0, 1}]] + First[#] &@largeTestList//Timing

Out[87]= {0.328,249887}
Out[88]= {0.203,249887}
Out[89]= {0.015,249887}

EDIT

I did not indend to initiate the "big shootout", but while we are at it, let me pull the biggest gun - compilation to C:

runsOf1C = 
 Compile[{{lst, _Integer, 1}},
   Module[{r = Table[0, {Length[lst] - 1}], i = 1, ctr = First[lst]},
    For[i = 2, i <= Length[lst], i++,
      If[lst[[i]] == 1 && lst[[i - 1]] == 0, ctr++]];
      ctr],
  CompilationTarget -> "C", RuntimeOptions -> "Speed"]

Now,

In[157]:= 
hugeTestList=RandomInteger[{0,1},{10^7}];
Total[Clip[ListCorrelate[{-1,1},#],{0,1}]]+First[#]&@hugeTestList//AbsoluteTiming
runsOf1C[hugeTestList]//AbsoluteTiming

Out[158]= {0.1872000,2499650}
Out[159]= {0.0780000,2499650}

Of course, this is not an elegant solution, but it is straightforward.

EDIT 2

Improving on the optimization of @Sjoerd, this one will be about 1.5 faster than runsOf1C still:

runsOf1CAlt = 
Compile[{{lst, _Integer, 1}},
  Module[{r = Table[0, {Length[lst] - 1}], i = 1, ctr = First[lst]},
    For[i = 2, i <= Length[lst], i++,
     If[lst[[i]] == 1,
      If[lst[[i - 1]] == 0, ctr++];
      i++
     ]];
    ctr],
  CompilationTarget -> "C", RuntimeOptions -> "Speed"]
Brett Champion
  • 8,497
  • 1
  • 27
  • 44
Leonid Shifrin
  • 22,449
  • 4
  • 68
  • 100
  • 2
    Oooh, clever... You can get a bit more speed by using `Tr` instead of `Total`. (20% on my Mac.) – Brett Champion Nov 26 '11 at 04:39
  • That is impressive. So much so that I'll have to study it to figure out how it works. Fortunately, my lists won't likely exceed several thousand items. – Tim Mayes Nov 26 '11 at 07:00
  • @Brett Thanks for the hint! I keep forgetting about `Tr`, should move it to my "active" dictionary for Mathematica. – Leonid Shifrin Nov 26 '11 at 13:48
  • @TimMayes, don't feel bad, it took me some time to figure it out, also. I'd suggest looking at what `Differences` returns, first, then determining why `Clip` is needed. – rcollyer Nov 26 '11 at 14:16
  • Leonid I still think C is cheating. If we wanted to write procedural code, why would we be using Mathematica? (yes, this is officially sour grapes.) – Mr.Wizard Nov 26 '11 at 15:40
  • @Mr.Wizard Yes, I agree, this is cheating. But if someone goes as far as to be interested in playing with optimizations introduced by my, Sjoerd's or your solutions, one might as well be in need of the fastest one possible in Mathematica. The C one has an advantage that it is straightforward. It took me less than 5 minutes to write and test it, while to come up with other versions would be a lot harder - and this is counting my Mathematica experience. For most people, the C version is simpler. Won't win a beauty contest, that's for sure :) – Leonid Shifrin Nov 26 '11 at 15:47
8

You have actually two questions, the one from the title and the question lurking behind it. The first one is answered by:

First/@ list

The second one, counting the number of runs of 1's, has been answered many times, but this solution

Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &

is about 50% faster than Leonid's solution. Note I increased the length of the test list for better timing:

largeTestList = RandomInteger[{0, 1}, {10000000}];
Count[Split[largeTestList], {1 ..}] // AbsoluteTiming
Count[Split[largeTestList][[All, 1]], 1] // AbsoluteTiming
Total[Clip[Differences@#, {0, 1}]] + First[#] &@ largeTestList // AbsoluteTiming
(Tr@Unitize@Differences@# + Tr@#[[{1, -1}]])/2 &@ largeTestList // AbsoluteTiming
Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &@
  largeTestList // AbsoluteTiming


Out[680]= {3.4361965, 2498095}

Out[681]= {2.4531403, 2498095}

Out[682]= {0.2710155, 2498095}

Out[683]= {0.2530145, 2498095}

Out[684]= {0.1710097, 2498095}

After Leonid's compilation attack I was about to throw in the towel, but I spotted a possible optimization, so onwards goes the battle... [Mr.Wizard, Leonid and I should be thrown in jail for disturbing the peace on SO]

runsOf1Cbis = 
 Compile[{{lst, _Integer, 1}}, 
  Module[{r = Table[0, {Length[lst] - 1}], i = 1, ctr = First[lst]}, 
   For[i = 2, i <= Length[lst], i++, 
    If[lst[[i]] == 1 && lst[[i - 1]] == 0, ctr++; i++]];
   ctr], CompilationTarget -> "C", RuntimeOptions -> "Speed"]

largeTestList = RandomInteger[{0, 1}, {10000000}]; 
Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &@
    largeTestList // AbsoluteTiming
runsOf1C[largeTestList] // AbsoluteTiming
runsOf1Cbis[largeTestList] // AbsoluteTiming


Out[869]= {0.1770101, 2500910}

Out[870]= {0.0960055, 2500910}

Out[871]= {0.0810046, 2500910}

The results vary, but I get an improvement between 10 and 30%.

The optimization may be hard to spot, but it's the extra i++ if the {0,1} test succeeds. You can't have two of these in successive locations.


And, here, an optimization of Leonid's optimization of my optimization of his optimization (I hope this isn't going to drag on, or I'm going to suffer a stack overflow):

runsOf1CDitto = 
 Compile[{{lst, _Integer, 1}}, 
  Module[{i = 1, ctr = First[lst]}, 
   For[i = 2, i <= Length[lst], i++, 
    If[lst[[i]] == 1, If[lst[[i - 1]] == 0, ctr++];
     i++]];
   ctr], CompilationTarget -> "C", RuntimeOptions -> "Speed"]

largeTestList = RandomInteger[{0, 1}, {10000000}]; 
Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &@
  largeTestList // AbsoluteTiming
runsOf1C[largeTestList] // AbsoluteTiming
runsOf1Cbis[largeTestList] // AbsoluteTiming
runsOf1CAlt[largeTestList] // AbsoluteTiming
runsOf1CDitto[largeTestList] // AbsoluteTiming


Out[907]= {0.1760101, 2501382}

Out[908]= {0.0990056, 2501382}

Out[909]= {0.0780045, 2501382}

Out[910]= {0.0670038, 2501382}

Out[911]= {0.0600034, 2501382}

Lucky for me, Leonid had a superfluous initialization in his code that could be removed.

Sjoerd C. de Vries
  • 16,122
  • 3
  • 42
  • 94
  • 1
    +1. I had no idea that something can be faster than `Differences`, the latter being essentially `Rest[#]-Most[#]&`. Learned something new today :) – Leonid Shifrin Nov 26 '11 at 13:52
  • Ditto what Leonid said. I would have to get really clever to beat that. :-) (BTW, I am surprised there is not more difference between the two `Differences` methods; there is on my system.) – Mr.Wizard Nov 26 '11 at 14:34
  • 1
    Actually, your method is much slower on v7. I am going to add timings to my own post for reference. – Mr.Wizard Nov 26 '11 at 14:37
  • @Mr.wizard Interesting. My timings are consistently giving me this 50% gain. Another reason to upgrade I'd say (v9 isn't anywhere near in sight). – Sjoerd C. de Vries Nov 26 '11 at 14:46
  • On the other hand, it's slower in places too: http://stackoverflow.com/q/8243627/618728 – Mr.Wizard Nov 26 '11 at 14:55
  • @Sjoerd, Mr.Wizard I updated my answer with a compiled-to-C version. Sorry, Sjoerd :). But you still hold a record for the most elegant of the fastest solutions. – Leonid Shifrin Nov 26 '11 at 15:25
  • @Mr.Wizard, Sjoerd Oops, I did not notice Mr.Wizard's solution based on `BitXor`. Sorry for not mentioning that one. – Leonid Shifrin Nov 26 '11 at 15:27
  • @Leonid Close, but no cigar ;-) See my latest update. I hope we aren't overstaying our welcome... – Sjoerd C. de Vries Nov 26 '11 at 15:46
  • Have you tried creating a C version of my Xor method? I think it should be the fastest of all. – Mr.Wizard Nov 26 '11 at 15:52
  • @Sjoerd Don't be so sure. You also missed one optimization, check my latest update :) – Leonid Shifrin Nov 26 '11 at 16:05
  • @Mr.Wizard I tried, but it looks like `BitXor` does not compile to bytecode (or C). I did not try very hard though. Ultimately, we can write it in pure C and use the LibraryLink. In any case, I also expect that it should be the fastest. – Leonid Shifrin Nov 26 '11 at 16:07
  • @Sjoerd Damn! I've got that from some older code! Ok, you got me man, I give up for now. But, I will be back :) – Leonid Shifrin Nov 26 '11 at 16:25
  • @Leonid, Sjoerd, I don't permit either of you to give up until you've tried a Xor method in C. (grin) – Mr.Wizard Nov 26 '11 at 16:36
  • Thanks, I enjoyed this hilarious, over the top, battle-of-the-nerds. – Sjoerd C. de Vries Nov 26 '11 at 16:37
  • @Mr.Wizard Just pasting in your code in a Compile only made in take longer. Need to adapt it like Leonid's code. – Sjoerd C. de Vries Nov 26 '11 at 16:45
  • @Sjoerd like Leonid suggested, it may take direct implementation in C. But don't let that stop you... :-> – Mr.Wizard Nov 26 '11 at 16:48
  • @Brett Actually, this seems like a perfect problem for those - since we only look at two adjacent list elements at a time, it is easy to parallelize. Too bad my card isn't powerful enough to have real gains. Would be nice to measure on some new Tesla-s with 500 multiprocessors. But, OTOH, this is likely to be data-transfer bound. CUDA works best when the math-to-data-transfer ratio is high, which is not the case here. – Leonid Shifrin Nov 26 '11 at 21:22
  • @LeonidShifrin did you see this: http://weaklyreachable.blogspot.com/2007/06/mathematica-pure-function-scope-problem_2182.html ? – Sjoerd C. de Vries Nov 27 '11 at 00:51
  • @Sjoerd Oh yes, sure, I read parts of WReach's blog and found it very interesting. Regarding that particular problem, I have run into it independently several times, particularly expressed here:http://groups.google.com/group/comp.soft-sys.math.mathematica/browse_thread/thread/115b073d95999ae8. His example is exposing the same effect, but is easier to grasp. I also discussed this here: http://stackoverflow.com/questions/4920194/using-nested-slots/4922776#4922776 – Leonid Shifrin Nov 27 '11 at 00:58
7

Here is a variation of Leonid's Differences method that is slightly faster:

(Tr@Unitize@Differences@# + Tr@#[[{1,-1}]])/2 &

Compared (using Tr for both):

list = RandomInteger[1, 1*^7];

Tr[Clip[Differences@#, {0,1}]] + First[#] & @ list //timeAvg

(Tr@Unitize@Differences@# + Tr@#[[{1,-1}]])/2 & @ list //timeAvg
0.1186
0.0904

Since this has become a code efficiency competition, here is my next effort:

(Tr@BitXor[Most@#, Rest@#] + Tr@#[[{1, -1}]])/2 &

Also, I am getting very different results using Mathematica 7, so I am including them here for reference:

largeTestList = RandomInteger[{0, 1}, {10000000}];
Count[Split[largeTestList], {1 ..}] // AbsoluteTiming
Count[Split[largeTestList][[All, 1]], 1] // AbsoluteTiming
Total[Clip[Differences@#, {0, 1}]] + First[#] &@largeTestList // AbsoluteTiming
(Tr@Unitize@Differences@# + Tr@#[[{1, -1}]])/2 &@largeTestList // AbsoluteTiming
Total[Clip[ListCorrelate[{-1, 1}, #], {0, 1}]] + First[#] &@largeTestList // AbsoluteTiming
(Tr@BitXor[Most@#, Rest@#] + Tr@#[[{1, -1}]])/2 &@largeTestList // AbsoluteTiming

{1.3400766, 2499840}

{0.9670553, 2499840}

{0.1460084, 2499840}

{0.1070061, 2499840}

{0.3710213, 2499840}

{0.0480028, 2499840}
Mr.Wizard
  • 24,179
  • 5
  • 44
  • 125
6

I'd do this:

Count[Split[{1, 1, 1, 0, 1}][[All, 1]], 1]

or

Total[First /@ Split[{1, 1, 1, 0, 1}]]
Chris Degnen
  • 8,443
  • 2
  • 23
  • 40
6

Another approach, using Count to look for lists containing some number of repetitions of 1:

In[20]:= Count[Split[{1, 1, 1, 0, 1}], {1 ..}]

Out[20]= 2
Brett Champion
  • 8,497
  • 1
  • 27
  • 44
  • Thanks Brett. I swear that I tried something very close to that, but I kept getting errors on the pattern. I obviously had something wrong because your solution works. – Tim Mayes Nov 26 '11 at 06:56