
I came across an old problem that you Mathematica/StackOverflow folks will probably like and that seems valuable to have on StackOverflow for posterity.

Suppose you have a list of lists and you want to pick one element from each and put them in a new list so that the number of elements that are identical to their next neighbor is maximized. In other words, for the resulting list l, minimize Length@Split[l]. In yet other words, we want the list with the fewest interruptions of identical contiguous elements.

For example:

pick[{ {1,2,3}, {2,3}, {1}, {1,3,4}, {4,1} }]
 --> {    2,      2,    1,     1,      1   }

(Or {3,3,1,1,1} is equally good.)
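For concreteness, here is the objective applied to those two picks, plus a deliberately worse (but still valid) pick made up just for contrast:

Length@Split[{2, 2, 1, 1, 1}]  (* 2 *)
Length@Split[{3, 3, 1, 1, 1}]  (* 2 *)
Length@Split[{1, 2, 1, 1, 4}]  (* 4 -- one element per sublist, but four runs *)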

Here's a preposterously brute force solution:

pick[x_] := argMax[-Length@Split[#]&, Tuples[x]]

where argMax is as described here:
posmax: like argmax but gives the position(s) of the element x for which f[x] is maximal
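For reference, one possible argMax in that spirit (the same memoized version Sjoerd quotes in an answer below; a sketch, not necessarily the exact definition from that link):

argMax[f_, dom_List] :=
  Module[{g}, g[e___] := g[e] = f[e]; (* memoize *) dom[[Ordering[g /@ dom, -1]]]]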

Can you come up with something better? The legendary Carl Woll nailed this for me and I'll reveal his solution in a week.

dreeves
  • Someone added the puzzle tag, which is fine, but I should mention that this was a perfectly real problem I had and for which solutions might be useful for others in the future. – dreeves Apr 13 '11 at 07:01
  • Can I assume that Carl Woll did not post this to MathGroup? – Mr.Wizard Apr 13 '11 at 07:33
  • For which purposes could such a function be useful? – Alexey Popkov Apr 13 '11 at 08:04
  • @Mr.Wizard It is indeed on MathGroup and more power to you if you want to dig it up and post it! Unless you want to treat it as a puzzle in which case I guess that would be cheating. I don't know, I just wanted to record the problem and solutions here for posterity. So much better than MathGroup! – dreeves Apr 13 '11 at 08:08
  • @Alexey I had piecewise functions, autogenerated, with an insane number of pieces that needed to be merged to minimize the number of pieces. Something like that. It's in chapter 2 of my phd thesis: http://dreev.es/thesis – dreeves Apr 13 '11 at 08:19
  • @dreeves I think the method you are asking for is probably not optimal in the sense of memory consumption. – Alexey Popkov Apr 13 '11 at 08:39
  • I agree with the motivation to transcribe solutions from MathGroup to StackOverflow, as they are more easily found, and later extended or refined. – Mr.Wizard Apr 13 '11 at 08:48
  • by the way, should I take this challenge, is {3,3,1,1,1} a valid solution? – Mr.Wizard Apr 13 '11 at 08:56
  • @Mr.Wizard and @Sjoerd: Yes, I'll point that out in the question. Thanks! – dreeves Apr 13 '11 at 17:25
  • @dreeves A very nice problem to work on. Thanks, and thanks for giving us a week to play with it. – DavidC Apr 16 '11 at 11:59
  • Hah! I figured it out! Or at least I have a mma one liner that is superfast. – Timo Apr 16 '11 at 22:00

8 Answers


Not an answer, but a comparison of the methods proposed here. I generated test sets with a variable number of subsets, this number varying from 5 to 100. Each test set was generated with this code:

Table[RandomSample[Range[10], RandomInteger[{1, 7}]], {rl}]

with rl the number of subsets involved.

For every test set that was generated this way I had all the algorithms do their thing. I did this 10 times (with the same test set), with the algorithms operating in a random order so as to level out order effects and the effects of random background processes on my laptop. This yields a mean timing for the given data set. The above line was used 20 times for each rl length, from which a mean (of means) and a standard deviation were calculated.
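To make the procedure concrete, here is a minimal sketch of such a harness; the names timeOnce, meanTimings and algs are illustrative, and this is not the exact code that was run:

(* Time each algorithm once on the same data set, running them in random
   order, then restore the original algorithm order. *)
timeOnce[algs_List, data_] :=
  Module[{order = RandomSample[Range[Length[algs]]], t},
    t = First[AbsoluteTiming[algs[[#]][data]]] & /@ order;
    t[[Ordering[order]]]];

(* Mean timings over 10 runs on a single data set of rl sublists. *)
meanTimings[algs_List, rl_Integer] :=
  Module[{data = Table[RandomSample[Range[10], RandomInteger[{1, 7}]], {rl}]},
    Mean[Table[timeOnce[algs, data], {10}]]];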

The results are below (horizontally the number of subsets and vertically the mean AbsoluteTiming):

[Plot: mean AbsoluteTiming of each method versus the number of subsets]

It seems that Mr.Wizard is the (not so clear) winner. Congrats!


Update
As requested by Timo, here are the timings as a function of the number of distinct subset elements that can be chosen from, as well as the maximum number of elements in each subset. The data sets are generated for a fixed number of subsets (50) according to this line of code:

lst = Table[RandomSample[Range[ch], RandomInteger[{1, ch}]], {50}];

I also increased the number of datasets I tried for each value from 20 to 40.

[Plot: mean AbsoluteTiming of each method versus the number of distinct elements, for 50 subsets]


And here for 5 subsets:

[Plot: mean AbsoluteTiming of each method versus the number of distinct elements, for 5 subsets]

Sjoerd C. de Vries
  • @dreeves @Mr.Wizard @Sasha @Timo @David Carrahar In case you're not following this thread anymore: full scale timing results are in! – Sjoerd C. de Vries Apr 20 '11 at 12:47
  • Nice work! Care to make a second run where you increase the length of the sublists and the number of disparate elements instead of/in addition to the number of lists, for a fixed number of lists? – Timo Apr 20 '11 at 14:44
  • @Timo Done. I'll see whether I can come up with a few more. – Sjoerd C. de Vries Apr 20 '11 at 15:20
  • @Sjoerd Nice. Especially the way your and David's routines switch places, and the way Mr.W's, mine, and Carl's routines are essentially flat wrt sublist length. – Timo Apr 20 '11 at 16:00
  • Thanks for the graphs! May I add my own? – Mr.Wizard Apr 20 '11 at 17:32
  • By the way, the @ notifications do not work the way you think. See: http://meta.stackexchange.com/questions/43019/how-do-comment-replies-work/43020#43020 – Mr.Wizard Apr 20 '11 at 17:50
  • @Mr.Wizard Thanks! I'll edit some 'Kick me' notes in your next few texts ;-) – Sjoerd C. de Vries Apr 20 '11 at 21:44
  • @Sjoerd, this is awesome. I was thinking of making this the accepted answer but maybe in that case it should actually include what seems to you the best answer at the top. That way people in the future who search for this can grab something and go without reading through all the answers and comparisons here, awesome as it is. – dreeves Apr 24 '11 at 17:13
  • @dreeves Thanks, but I'd say that would be too much honor. Making these graphs was a nice learning experience, as they show that superficially similar algorithms may have quite different scaling behaviors. But the work is not on equal footing with the work on the algorithms itself. – Sjoerd C. de Vries Apr 25 '11 at 07:58
  • @Sjoerd Very nicely done. It's great to have this sort of comparison. – DavidC Apr 29 '11 at 00:09

I'll toss this into the ring. I am not certain it always gives an optimal solution, but it appears to work on the same logic as some other answers given, and it is fast.

f@{} := (Sow[m]; m = {i, 1})
f@x_ := m = {x, m[[2]] + 1}

findruns[lst_] :=
  Reap[m = {{}, 0}; f[m[[1]] ⋂ i] ~Do~ {i, lst}; Sow@m][[2, 1, 2 ;;]]

findruns gives run-length-encoded output, including parallel answers. If output as strictly specified is required, use:

Flatten[First[#]~ConstantArray~#2 & @@@ #] &
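For example, on the list from the question this should give:

runs = findruns[{{1, 2, 3}, {2, 3}, {1}, {1, 3, 4}, {4, 1}}]
(* {{{2, 3}, 2}, {{1}, 3}} -- pick 2 or 3 twice, then 1 three times *)

Flatten[First[#]~ConstantArray~#2 & @@@ #] &[runs]
(* {2, 2, 1, 1, 1} *)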

Here is a variation using Fold. It is faster on some set shapes, but a little slower on others.

f2[{}, m_, i_] := (Sow[m]; {i, 1})
f2[x_, m_, _] := {x, m[[2]] + 1}

findruns2[lst_] :=
  Reap[Sow@Fold[f2[#[[1]] ⋂ #2, ##] &, {{}, 0}, lst]][[2, 1, 2 ;;]]
Mr.Wizard
  • +1 It worked fine with my 30 sublist example. You might want to place an @ at the end of your `Flatten`...code. Now to play a bit with `Reap` and `Sow` to see what they do. – DavidC Apr 16 '11 at 17:23
  • @David, yes, this seems to work, but I am not convinced that it will *always* give the optimal solution. It is significantly faster than the other answers I tried however, should it prove valid. --- I don't know what you mean regarding `@` and `Flatten`; care to try explaining? --- Sow and Reap are very useful. – Mr.Wizard Apr 16 '11 at 17:39
  • Addendum: I read Sjoerd's explanation now, and he seems to have used the same logic, so maybe this is a good answer after all! I should clean it up a little in that case. – Mr.Wizard Apr 16 '11 at 17:42
  • +1 Your solution is the fastest. It's 2 orders of magnitude faster than mine and about 23 times faster than Sjoerd's. The test case used `BlockRandom[SeedRandom[11]; lst = Table[RandomSample[Range[80], RandomInteger[{55, 60}]], {72}];]` – Sasha Apr 19 '11 at 04:56

My solution is based on the observation that 'greed is good' here. If I have the choice between interrupting a chain and beginning a new, potentially long chain, picking the new one to continue doesn't do me any good: the new chain gets longer by the same amount as the old chain gets shorter.

So, what the algorithm basically does is start at the first sublist, find for each of its members the number of subsequent sublists that contain the same member, and choose the member with the most neighboring twins. This process then continues at the sublist just past the end of this first chain, and so on.

Combining this into a recursive algorithm, we end up with:

pickPath[lst_] :=
 Module[{lengthChoices, bestElement},
  lengthChoices = 
   LengthWhile[lst, Function[{lstMember}, MemberQ[lstMember, #]]] & /@First[lst];
  bestElement = Ordering[lengthChoices][[-1]];
  If[ Length[lst] == lengthChoices[[bestElement]],
   ConstantArray[lst[[1, bestElement]], lengthChoices[[bestElement]]],
   {
    ConstantArray[lst[[1, bestElement]], lengthChoices[[bestElement]]],
    pickPath[lst[[lengthChoices[[bestElement]] + 1 ;; -1]]]
    }
   ]
  ]

Test

In[12]:= lst = 
 Table[RandomSample[Range[10], RandomInteger[{1, 7}]], {8}]

Out[12]= {{3, 10, 6}, {8, 2, 10, 5, 9, 3, 6}, {3, 7, 10, 2, 8, 5, 
  9}, {6, 9, 1, 8, 3, 10}, {1}, {2, 9, 4}, {9, 5, 2, 6, 8, 7}, {6, 9, 
  4, 5}}

In[13]:= pickPath[lst] // Flatten // AbsoluteTiming

Out[13]= {0.0020001, {10, 10, 10, 10, 1, 9, 9, 9}}

Dreeves' Brute Force approach

argMax[f_, dom_List] := 
Module[{g}, g[e___] := g[e] = f[e];(*memoize*) dom[[Ordering[g /@ dom, -1]]]]
pick[x_] := argMax[-Length@Split[#] &, Tuples[x]]

In[14]:= pick[lst] // AbsoluteTiming

Out[14]= {0.7340420, {{10, 10, 10, 10, 1, 9, 9, 9}}}

The first time, I used a slightly longer test list. The brute-force approach brought my computer to a virtual standstill, claiming all the memory it had. Pretty bad. I had to restart after 10 minutes. Restarting took me another quarter of an hour, due to the PC becoming extremely unresponsive.

Sjoerd C. de Vries
  • I feel your pain at the restart - had to do that a fair bit last year. Mathematica is not very graceful when it runs out of memory... – Simon Apr 13 '11 at 10:47
  • It's hungry too. I've used up to around 42GB of RAM with mma... (then our sysadmin mailed me saying "did you really mean to do that?" and I killed it) – acl Apr 13 '11 at 21:01

This is my take on it; it does pretty much the same thing as Sjoerd's, just with less code.

LongestRuns[list_List] := 
 Block[{gr, f = Intersection}, 
  ReplaceRepeated[
    list, {a___gr, Longest[e__List] /; f[e] =!= {}, b___} :> {a, 
      gr[e], b}] /. 
   gr[e__] :> ConstantArray[First[f[e]], Length[{e}]]]

A small gallery:

In[497]:= LongestRuns[{{1, 2, 3}, {2, 3}, {1}, {1, 3, 4}, {4, 1}}]

Out[497]= {{2, 2}, {1, 1, 1}}

In[498]:= LongestRuns[{{3, 10, 6}, {8, 2, 10, 5, 9, 3, 6}, {3, 7, 10, 
   2, 8, 5, 9}, {6, 9, 1, 8, 3, 10}, {1}, {2, 9, 4}, {9, 5, 2, 6, 8, 
   7}, {6, 9, 4, 5}}]

Out[498]= {{3, 3, 3, 3}, {1}, {9, 9, 9}}

In[499]:= pickPath[{{3, 10, 6}, {8, 2, 10, 5, 9, 3, 6}, {3, 7, 10, 2, 
   8, 5, 9}, {6, 9, 1, 8, 3, 10}, {1}, {2, 9, 4}, {9, 5, 2, 6, 8, 
   7}, {6, 9, 4, 5}}]

Out[499]= {{10, 10, 10, 10}, {{1}, {9, 9, 9}}}

In[500]:= LongestRuns[{{2, 8}, {4, 2}, {3}, {9, 4, 6, 8, 2}, {5}, {8, 
   10, 6, 2, 3}, {9, 4, 6, 3, 10, 1}, {9}}]

Out[500]= {{2, 2}, {3}, {2}, {5}, {3, 3}, {9}}

In[501]:= LongestRuns[{{4, 6, 18, 15}, {1, 20, 16, 7, 14, 2, 9}, {12, 
   3, 15}, {17, 6, 13, 10, 3, 19}, {1, 15, 2, 19}, {5, 17, 3, 6, 
   14}, {5, 17, 9}, {15, 9, 19, 13, 8, 20}, {18, 13, 5}, {11, 5, 1, 
   12, 2}, {10, 4, 7}, {1, 2, 14, 9, 12, 3}, {9, 5, 19, 8}, {14, 1, 3,
    4, 9}, {11, 13, 5, 1}, {16, 3, 7, 12, 14, 9}, {7, 4, 17, 18, 
   6}, {17, 19, 9}, {7, 15, 3, 12}, {19, 12, 5, 14, 8}, {1, 10, 12, 
   8}, {18, 16, 14, 19}, {2, 7, 10}, {19, 2, 5, 3}, {16, 17, 3}, {16, 
   2, 6, 20, 1, 3}, {12, 18, 11, 19, 17}, {12, 16, 9, 20, 4}, {19, 20,
    10, 12, 9, 11}, {10, 12, 6, 19, 17, 5}}]

Out[501]= {{4}, {1}, {3, 3}, {1}, {5, 5}, {13, 13}, {1}, {4}, {9, 9, 
  9}, {1}, {7, 7}, {9}, {12, 12, 12}, {14}, {2, 2}, {3, 3}, {12, 12, 
  12, 12}}

EDIT Given that Dreeves's brute-force approach (quoted in Sjoerd's answer) fails on large samples due to the inability to generate all Tuples at once, here is another brute-force approach:

bfBestPick[e_List] := Block[{splits, gr, f = Intersection},
  splits[{}] = {{}};
  splits[list_List] := 
   ReplaceList[
    list, {a___gr, el__List /; f[el] =!= {}, 
      b___} :> (Join[{a, gr[el]}, #] & /@ splits[{b}])]; 
  Module[{sp = 
     Cases[splits[
        e] //. {seq__gr, 
         re__List} :> (Join[{seq}, #] & /@ {re}), {__gr}, Infinity]}, 
   sp[[First@Ordering[Length /@ sp, 1]]] /. 
    gr[args__] :> ConstantArray[First[f[args]], Length[{args}]]]]

This brute-force best pick might generate a different splitting, but it is the length that matters, according to the original question.

test = {{4, 6, 18, 15}, {1, 20, 16, 7, 14, 2, 9}, {12, 3, 15}, {17, 6,
     13, 10, 3, 19}, {1, 15, 2, 19}, {5, 17, 3, 6, 14}, {5, 17, 
    9}, {15, 9, 19, 13, 8, 20}, {18, 13, 5}, {11, 5, 1, 12, 2}, {10, 
    4, 7}, {1, 2, 14, 9, 12, 3}, {9, 5, 19, 8}, {14, 1, 3, 4, 9}, {11,
     13, 5, 1}, {16, 3, 7, 12, 14, 9}, {7, 4, 17, 18, 6}, {17, 19, 
    9}, {7, 15, 3, 12}, {19, 12, 5, 14, 8}, {1, 10, 12, 8}, {18, 16, 
    14, 19}, {2, 7, 10}, {19, 2, 5, 3}, {16, 17, 3}, {16, 2, 6, 20, 1,
     3}, {12, 18, 11, 19, 17}, {12, 16, 9, 20, 4}, {19, 20, 10, 12, 9,
     11}, {10, 12, 6, 19, 17, 5}};

pick fails on this example.

In[637]:= Length[bfBestPick[test]] // Timing

Out[637]= {58.407, 17}

In[638]:= Length[LongestRuns[test]] // Timing

Out[638]= {0., 17}

In[639]:= 
Length[Cases[pickPath[test], {__Integer}, Infinity]] // Timing

Out[639]= {0., 17}

I am posting this in case somebody wants to search for counterexamples, i.e. to check whether code like pickPath or LongestRuns does indeed generate a sequence with the smallest number of interruptions.
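A small sketch of such a randomized search, using the definitions above (sizes and loop count are arbitrary, kept small so that bfBestPick stays tractable):

(* Compare the number of runs from LongestRuns with the brute-force optimum
   on small random instances; print any instance where they disagree. *)
Do[
 Module[{sample = Table[RandomSample[Range[6], RandomInteger[{1, 4}]], {6}]},
  If[Length[LongestRuns[sample]] =!= Length[bfBestPick[sample]],
   Print["Counterexample: ", sample]]],
 {100}]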

Sasha
  • +1 Nice approach. It took me some time to let it sink in (it's a candidate for the job-security award), but it has several nice ideas. The dual use of a head to mark sequences that have been processed and as a post-processing tool is just great. 'Intersection' to find the longest run is a cool idea too. – Sjoerd C. de Vries Apr 13 '11 at 16:49
  • The second test in my answer, with the brute force method, is Dreeves' method, not mine. He deserves to have the credit for it ;-) – Sjoerd C. de Vries Apr 13 '11 at 17:10

So here is my "one liner" with improvements by Mr.Wizard:

 pickPath[lst_List] :=
 Module[{M = Fold[{#2, #} &, {{}}, Reverse@lst]},
   Reap[While[M != {{}},
      Do[Sow@#[[-2,1]], {Length@# - 1}] &@
       NestWhileList[# ⋂ First[M = Last@M] &, M[[1]], # != {} &]
   ]][[2, 1]]
 ]

It basically uses Intersection repeatedly on consecutive lists until the intersection comes up empty, and then does it again and again. In a humongous torture test case with

M = Table[RandomSample[Range[1000], RandomInteger[{1, 200}]], {1000}];

I get Timing[] consistently around 0.032 on my 2GHz Core 2 Duo.
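And on the small example from the question it should return the picks directly:

pickPath[{{1, 2, 3}, {2, 3}, {1}, {1, 3, 4}, {4, 1}}]
(* {2, 2, 1, 1, 1} *)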


Below this point is my first attempt, which I'll leave for your perusal.

For a given list of lists of elements M we count the different elements and the number of lists, list the different elements in canonical order, and construct a matrix K[i,j] detailing the presence of element i in list j:

elements = Length@(Union @@ M);
lists = Length@M;
eList = Union @@ M;
positions = Flatten@Table[{i, Sequence @@ First@Position[eList, M[[i, j]]]} -> 1,
                          {i, lists},
                          {j, Length@M[[i]]}];
K = Transpose@Normal@SparseArray@positions;

The problem is now equivalent to traversing this matrix from left to right, by only stepping on 1's, and changing rows as few times as possible.

To achieve this I Sort the rows, take the one with the most consecutive 1's at the start, keep track of what element I picked, Drop that many columns from K and repeat:

R = {};
While[Length@K[[1]] > 0,
   len = LengthWhile[K[[row = Last@Ordering@K]], # == 1 &];
   Do[AppendTo[R, eList[[row]]], {len}];
   K = Drop[#, len] & /@ K;
]

This has an AbsoluteTiming of approximately three times that of Sjoerd's approach.

Timo
  • @Timo Would you provide an example? I figured out that M is the input list (of lists) but I didn't quickly see how to use the second batch of code. – DavidC Apr 16 '11 at 17:31
  • Ah fudge! I copy pasted code in the middle of making improvements so `eList` is not defined. I'll post a correction and hopefully the second batch of code will start to make sense. – Timo Apr 16 '11 at 19:44
  • Timo, your new method will be much faster on long sets if you use linked lists. Basically: start with `M = Fold[{#2, #} &, {}, Reverse@M]` and change `Rest` to `Last`. – Mr.Wizard Apr 17 '11 at 16:00
  • That doesn't straight up work since you need to also insert the result of the Intersection back into the linked list. Thanks for the suggestion though, I'll try to get it working and compare timings. Currently my Timing grows approximately like n log n, with n being number of lists and elements per list. – Timo Apr 17 '11 at 16:23
  • Timo, it is easier to show you than explain. May I edit your post, or should I put it in mine? – Mr.Wizard Apr 17 '11 at 23:58
  • @Mr.W. Feel free to append your version of the code in my post. – Timo Apr 18 '11 at 04:29
  • After some testing, I get up to an order of magnitude faster times when using your test list and a still decent 20% improvement for the torture list I am using. Thanks for the suggestion, now I have to wade through all my old projects and see where I can apply this improvement ;-). – Timo Apr 19 '11 at 06:42
  • Mathematica is really slow to change the length of long lists (actually arrays, I believe). It doesn't make much difference if lists are short. When creating a long list I either use Sow & Reap, a fixed length array created with `ConstantArray` and then filled with `Part` assignments, or linked lists. If reading rather than creating a long list (as in the case of `M` above) it is better to step through the list with an index (rather than using a `M = Drop[M]` since the latter changes the list length each time) or to pre-convert to a linked list. – Mr.Wizard Apr 19 '11 at 15:24
  • Yeah I noticed all the speedup was due to the replacement of `Rest` to `Last`. – Timo Apr 19 '11 at 17:17

Here's a go at it...

runsByN: For each number, show whether it appears or not in each sublist

list= {{4, 2, 7, 5, 1, 9, 10}, {10, 1, 8, 3, 2, 7}, {9, 2, 7, 3, 6, 4,  5}, {10, 3, 6, 4, 8, 7}, {7}, {3, 1, 8, 2, 4, 7, 10, 6}, {7, 6}, {10, 2, 8, 5, 6, 9, 7, 3}, {1, 4, 8}, {5, 6, 1}, {3, 2, 1}, {10,6, 4}, {10, 7, 3}, {10, 2, 4}, {1, 3, 5, 9, 7, 4, 2, 8}, {7, 1, 3}, {5, 7, 1, 10, 2, 3, 6, 8}, {10, 8, 3, 6, 9, 4, 5, 7}, {3, 10, 5}, {1}, {7, 9, 1, 6, 2, 4}, {9, 7, 6, 2}, {5, 6, 9, 7}, {1, 5}, {1,9, 7, 5, 4}, {5, 4, 9, 3, 1, 7, 6, 8}, {6}, {10}, {6}, {7, 9}};
runsByN = Transpose[Table[If[MemberQ[#, n], n, 0], {n, Max[list]}] & /@ list]
Out = {{1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0,1, 1, 1, 0, 0, 0, 0}, {2, 2, 2, 0, 0, 2, 0, 2, 0, 0, 2, 0, 0, 2, 2,0, 2, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0}, {0, 3, 3, 3, 0, 3, 0,3, 0, 0, 3, 0, 3, 0, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0,0}, {4, 0, 4, 4, 0, 4, 0, 0, 4, 0, 0, 4, 0, 4, 4, 0, 0, 4, 0, 0, 4, 0, 0, 0, 4, 4, 0, 0, 0, 0}, {5, 0, 5, 0, 0, 0, 0, 5, 0, 5, 0, 0, 0, 0, 5, 0, 5, 5, 5, 0, 0, 0, 5, 5, 5, 5, 0, 0, 0, 0}, {0, 0, 6, 6, 0, 6, 6, 6, 0, 6, 0, 6, 0, 0, 0, 0, 6, 6, 0, 0, 6, 6, 6, 0, 0, 6, 6, 0,6, 0}, {7, 7, 7, 7, 7, 7, 7, 7, 0, 0, 0, 0, 7, 0, 7, 7, 7, 7, 0, 0, 7, 7, 7, 0, 7, 7, 0, 0, 0, 7}, {0, 8, 0, 8, 0, 8, 0, 8, 8, 0, 0, 0, 0, 0, 8, 0, 8, 8, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 0}, {9, 0, 9, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 9, 0, 0, 9, 0, 0, 9, 9, 9, 0, 9, 9, 0, 0, 0, 9}, {10, 10, 0, 10, 0, 10, 0, 10, 0, 0, 0, 10, 10, 10, 0, 0, 10, 10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 10, 0, 0}};

runsByN is list transposed, with zeros inserted to represent missing numbers. Row n shows the sublists in which the number n appears.


myPick: Picking numbers that constitute an optimal path

myPick recursively builds a list of the longest runs. It doesn't look for all optimal solutions, but rather the first solution of minimal length.

myPick[{}, c_] := Flatten[c]
myPick[l_, c_: {}] := 
   Module[{r = Length /@ (l /. {x___, 0, ___} :> {x}), m}, m = Max[r];
   myPick[Cases[(Drop[#, m]) & /@ l, Except[{}]], 
   Append[c, Table[Position[r, m, 1, 1][[1, 1]], {m}]]]]
choices = myPick[runsByN]
(* Out= {7, 7, 7, 7, 7, 7, 7, 7, 1, 1, 1, 10, 10, 10, 3, 3, 3, 3, 3, 1, 1, 6, 6, 1, 1, 1, 6, 10, 6, 7} *)

Thanks to Mr.Wizard for suggesting the use of a replacement rule as an efficient alternative to TakeWhile.
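For illustration, both forms take the leading run of nonzeros from a row (a toy row, not taken from the code above); per the comments below, the rule makes the whole routine roughly 4x faster:

row = {7, 7, 0, 7, 0};
TakeWhile[row, # =!= 0 &]     (* {7, 7} *)
row /. {x___, 0, ___} :> {x}  (* {7, 7} *)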


Epilog: Visualizing the solution path

runsPlot[choices1_, runsN_] := 
  Module[{runs = {First[#], Length[#]} & /@ Split[choices1], myArrow,
          m = Max[runsN]},
  myArrow[runs1_] :=
     Module[{data1 = Reverse@First[runs1], data2 = Reverse[runs1[[2]]],
      deltaX},
      deltaX := data2[[1]] - 1;
      myA[{}, _, out_] := out;           
      myA[inL_, deltaX_, outL_] :=
        Module[{data3 = outL[[-1, 1, 2]]},
        myA[Drop[inL, 1], inL[[1, 2]] - 1, 
          Append[outL, Arrow[{{First[data3] + deltaX, 
           data3[[2]]}, {First[data3] + deltaX + 1, inL[[1, 1]]}}]]]];
        myA[Drop[runs1, 2], deltaX, {Thickness[.005], 
            Arrow[{data1, {First[data1] + 1, data2[[2]]}}]}]];

  ListPlot[runsN,
     Epilog -> myArrow[runs],
     PlotStyle -> PointSize[Large],
     Frame -> True,
     PlotRange -> {{1, Length[choices1]}, {1, m}},
     FrameTicks -> {All, Range[m]},
     PlotRangePadding -> .5,
     FrameLabel -> {"Sublist", "Number", "Sublist", "Number"},
     GridLines :>    {FoldList[Plus, 0, Length /@ Split[choices1]], None}
   ]];

runsPlot[choices, runsByN]

The chart below represents the data from list. Each plotted point corresponds to a number and the sublist in which it occurred.

[Plot: runs by number, with arrows tracing the chosen path through the sublists]

DavidC
  • @Mr.Wizard Apparently I'm still more comfortable with TakeWhile. But after everything is up and running ok, I'll revisit the efficiency issue. – DavidC Apr 17 '11 at 00:21
  • @David, of course, pardon me. In retrospect that comment comes across rudely. For what it's worth I tried the `/.` approach and it makes your code 4X faster. – Mr.Wizard Apr 17 '11 at 00:27
  • @Mr. No, that's quite ok. Thanks for the suggestion, which I have now implemented. – DavidC Apr 17 '11 at 01:33
  • Anyway, 20 edits makes it automatically become a community wiki. – DavidC Apr 17 '11 at 02:33
  • @Mr. @David Been there, done that :) http://stackoverflow.com/questions/3265986/an-algorithm-to-space-out-overlapping-rectangles/3279877#3279877 – Dr. belisarius Apr 29 '11 at 17:12

You could use integer linear programming. Here is code for that.

bestPick[lists_] := Module[
  {picks, span, diffs, v, dv, vars, diffvars, fvars,
    c1, c2, c3, c4, constraints, obj, res},
  span = Max[lists] - Min[lists];
  vars = MapIndexed[v[Sequence @@ #2] &, lists, {2}];
  picks = Total[vars*lists, {2}];
  diffs = Differences[picks];
  diffvars = Array[dv, Length[diffs]];
  fvars = Flatten[{vars, diffvars}];
  c1 = Map[Total[#] == 1 &, vars];
  c2 = Map[0 <= # <= 1 &, fvars];
  c3 = Thread[span*diffvars >= diffs];
  c4 = Thread[span*diffvars >= -diffs];
  constraints = Join[c1, c2, c3, c4];
  obj = Total[diffvars];
  res = Minimize[{obj, constraints}, fvars, Integers];
  {res[[1]], Flatten[vars*lists /. res[[2]] /. 0 :> Sequence[]]}
 ]

Your example:

lists = {{1, 2, 3}, {2, 3}, {1}, {1, 3, 4}, {4, 1}}

bestPick[lists]

Out[88]= {1, {2, 2, 1, 1, 1}}

For larger problems Minimize might run into trouble, since it uses exact methods for solving the relaxed LPs. In that case you might need to switch to NMinimize and change the domain argument to a constraint of the form Element[fvars, Integers], as sketched below.
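A sketch of that change, for illustration only: just the final lines of the Module would differ, and since NMinimize returns approximate values a Round is added to recover exact 0/1 values.

(* Sketch: the integrality requirement moves into the constraints. *)
res = NMinimize[{obj, Join[constraints, {Element[fvars, Integers]}]}, fvars];
sol = Thread[fvars -> Round[fvars /. res[[2]]]];
{Round[res[[1]]], Flatten[vars*lists /. sol /. 0 :> Sequence[]]}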

Daniel Lichtblau
  • Note to self: Carl's method is hugely better than this. – Daniel Lichtblau Apr 13 '11 at 16:32
  • It is about three orders of magnitude slower than Sasha's or my solution when tested on {{4, 9, 10, 8, 7, 5}, {7, 3, 6, 4, 9, 1}, {2, 4}, {1, 5, 10, 9, 4, 8}, {4, 3, 9}, {1, 4, 5, 3, 6, 2, 8}, {5, 1, 4, 2, 3}, {2, 6, 8, 5, 9}, {1, 3, 4, 6, 10, 5}, {8, 9, 7}} – Sjoerd C. de Vries Apr 13 '11 at 17:16
  • One OoM (6.9 sec vs .052 sec on my desktop) if you use NMinimize as I mentioned. But yeah, ILP is not the way to go about this. If one stores two elements per list it might be a different matter... – Daniel Lichtblau Apr 13 '11 at 18:35
  • I tried the 30 sublist example in my response, but the program seems to have locked up. – DavidC Apr 16 '11 at 17:17

A week is up! Here is the fabled solution from Carl Woll. (I tried to get him to post it himself. Carl, if you come across this and want to take official credit, just paste it in as a separate answer and I'll delete this one!)

pick[data_] := Module[{common,tmp}, 
  common = {};
  tmp = Reverse[If[(common = Intersection[common,#])=={}, common = #, common]& /@
                data];
  common =.;
  Reverse[If[MemberQ[#, common], common, common = First[#]]& /@ tmp]]

Still quoting Carl:

Basically, you start at the beginning, and find the element which gives you the longest string of common elements. Once the string can no longer be extended, start a new string. It seems to me that this algorithm ought to give you a correct answer (there are many correct answers).
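For the list from the question this should give one of the equally good answers:

pick[{{1, 2, 3}, {2, 3}, {1}, {1, 3, 4}, {4, 1}}]
(* {2, 2, 1, 1, 1} *)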

dreeves