How to find combination of segments to build a track: possible Subset Sum

Question

There are many Subset-sum questions and answers around at SO, but somehow I can't find a solution for my specific problem.

I need to find the quantity and length of track-segments to build a track of length n. The segments are of length: 8, 10, 12, 14, 16, 18, 20, 22, 24 ft The quantity can be up to 4 segments The track length is ~ between 20 and 100 ft. (and always an even number)

This is a real track. Order of segments does not matter. However, there are preferred size combinations. All of equal length or close to each other is preferred over Large/small combinations.

i.e:

70 => 20,20,20,10 would be an easy pick, but 16,18,18,18 would be preferred
60 => 20,20,20 is better then 14,14,14,18

I can trow on more examples if required.

I'm not looking for the one best solution, but a small set of possible best fit solution. Ultimately a person will choose, this is about suggestion best options.

What I did so far is the following. It is working, just seems way to complex.

I took the basic algorithm from this post Algorithm to find which numbers from a list of size n sum to another number. All I changed in this code is turn it into integer. It return all possible combinations. up to 5 or more tracks.

To further reduce result set, I do a few Linq's

    List<int> nums = new List<int>() { 8, 8, 8, 8, 10, 10, 10, 10, 12, 12, 12, 12, 14, 14, 14, 14, 16, 16, 16, 16, 18, 18, 18, 18, 20, 20, 20, 20, 22, 22, 22, 22, 24, 24, 24, 24 };
    int width = 60;
    Console.WriteLine("Total width: " + width);
    Solver solver = new Solver();
    List<List<int>> res = solver.Solve(width, nums.ToArray());

    Console.WriteLine("total :"+res.Count);
    var res1 = res.Distinct(new SequenceComparer<List<int>, int>()).ToList();
    Console.WriteLine("total res1:" + res1.Count);

    var res2 = res1.Where(l => l.Count == 4).ToList();
    Console.WriteLine("total res2:" + res2.Count); //reduce to 4 integer solutions

    var res3 = (from row in res2
             where row[0] == row[1] || row[0] == row[2] || row[0] == row[3] ||
                   row[1] == row[2] || row[1] == row[3] ||
                   row[2] == row[3]
             select row).ToList();
    Console.WriteLine("total res3:" + res3.Count); //reduce to at least two of equal length

    var res4 = (from row in res3
            where row[0] == row[1] && row[0] == row[2] || 
                  row[0] == row[1] && row[0] == row[3] ||
                  row[1] == row[2] && row[1] == row[3] 
            select row).ToList();
     Console.WriteLine("total res4:" + res4.Count); //reduce to  three of equal length

    var res5 = (from row in res4
            where row[0] == row[1] && row[0] == row[2] && row[0] == row[3]
           select row).ToList();

     Console.WriteLine("total res5:" + res5.Count); //reduce to 4 of equal length
     Console.WriteLine("-------------------------------------");
     Console.WriteLine("res4:");
     foreach (List<int> result in res4)
     {
         foreach (int value in result)
         {
              Console.Write("{0}\t", value);
         }
         Console.WriteLine();
      }
      Console.WriteLine("-------------------------------------");
      Console.WriteLine("res5:");
      foreach (List<int> result in res5)
      {
           foreach (int value in result)
           {
               Console.Write("{0}\t", value);
           }
           Console.WriteLine();
      }
      Console.ReadLine();

This code will produce the following outcome, run with 60

    Total width: 60
    total :10726
    total res1:74
    total res2:31
    total res3:20
    total res4:3
    total res5:0
    -------------------------------------
    res4:
    12      12      12      24
    12      16      16      16
    14      14      14      18
     -------------------------------------
    res5:

or with 80 this

    Total width: 80
    total :101560
    total res1:237
    total res2:15
    total res3:13
    total res4:3
    total res5:1
    ------------------------------------
    res4:
    8       24      24      24
    14      22      22      22
    20      20      20      20
    ------------------------------------
    res5:
    20      20      20      20

So my final results (4&5) are, in fact, close to what I need.

BUT I would have to code the same again for any possible 3 track solution, and maybe 2 tracks.

Then does results would need to be compared to each other (somehow, not sure how). ALL of this makes me feel like I am missing something. It feels to complex, it feels wrong. What am I missing?

AM I using the wrong algorithm to begin with? Are their better once out their for my problem?

Suppose that we have segment of 30 as well. What would be preferable for 60, 2*30 or 3*20, or something else? — Dialecticus, Jan 16 '13 at 09:53
Alright, thanks for all the answers. I'm a bit rusty in this kinda math, been a while ...I'll start reviewing right away. — Andreas, Jan 16 '13 at 15:23
@Dialecticus : in such a case, 3*20 would be chosen. Mostly because 20 ft segments are easier to handle. However:Both should be in the final set if possible. Why? Lets take 24 and assume 6' exists: 4x6 / 3x8 / 2x12 In such a case 2x12, (maybe 3x8) would be chosen, but not 4x6. Why not 4x6. 12' handle/transport well plus larger pieces provide more structural support. The aim is to stick in the middle 12-20. — Andreas, Jan 16 '13 at 15:40

Nuclearman · Answer 1 · 2013-01-17T03:05:41.753

You really have a special case of the sum-set. While it's NP-hard, the solution space is restricted enough that brute force would probably work fine, as there are only 10000 (10 ^ 4) possible solutions total (which is about equal to the actual number of time steps needed), since you also have to consider 0 as a possible length.

Here is the code, in psudo-Python. Thought about trying it in C#, but not actually familiar with it, and so it probably wouldn't work out well.

lengths = [0, 8, 10, 12, 14, 16, 18, 20, 22, 24]
solutions = []
for len1 in lengths:
    for len2 in lengths:
        for len3 in lengths:
            for len4 in lengths:
                if (len1 + len2 + len3 + len4) == n:
                    solutions.append([len1,len2,len3,len4])
return solutions

Once you have all the valid solutions, you can then simply determine which one(s) you want to show the user, or you can simply write an algorithm to pick the best one. Of course, you'll probably want to not actually include any lengths of size 0.

You can improve this algorithm somewhat using a greedy method that will only find all valid solutions. However, again, the problem as it's stated isn't likely to be complex enough to need it unless things are very constricted in terms of space or time.

As a bonus, if you are expecting multiple queries (for example user asks for n = 40 and later n = 50), you can remove the if statement and simply store all 10000 solutions in a hash-table keyed to the sum value, allowing for O(1) querying.

Narrowing down the solution set:

What you need here is a comparison algorithm that basically compares this solution to that solution and says, "this solution is better/worst than that solution". This allows you to write a sorting algorithm that you can use to sort the solutions to get the best so many, or you can simply just find the best solution.

You can resolve the vast majority of cases by simply calculating the standard deviation for each solution then compare the standard deviations. This will give a number that shows how much variance is in the lengths of the solution. If you use "lower standard deviation is better", then that'll give you "All of equal length or close to each other is preferred over Large/small combinations". The algorithm for standard deviation is fairly straightforward, so I'll leave it to you try to implement. Actually, there's a good chance that C# has the function built in. Just be sure not to include any zero lengths, actually they should probably be removed before you add them to the solution set to avoid issues, which requires a bit of tweaking to the code I gave.

However, you then have the tricky part, dealing with cases where different solutions have the same standard deviation. I'm thinking there are only two cases where this happens.

The first occurs only there are multiple ideal solutions. Such as if n = 24, then three of the solutions will be [8,8,8], [12,12], and [24].
The second occurs due to the brute force nature of the algorithm, and is why there are so many solutions. This is because for every solution like [8,10,12,14] (four unique lengths) there are 24 ways to arrange those lengths, such as [14,12,10,8] and [12,10,14,8]. So the best way to improve upon the brute force algorithm is to have an algorithm that multichooses 4 elements from [0, 8, 10, 12, 14, 16, 18, 20, 22, 24]. This narrows the solution set to only 715 solutions. Of course, if you actually want [8,10,12,14], [14,12,10,8] and [12,10,14,8] as different solutions, then there isn't much you can do.

The above two paragraphs fall squarely in the realm of "it depends". You'll have to decide what rules the algorithm should follow in each case, but I'm thinking those are the only two cases where you could find identical standard deviations.

@M C: I like it, mostly because I understand it. Codded it up, added code for 2 and 3 piece solutions and use it as algorithm for now. However, I'm not closer on how to minimize the solution set. So I'm not going to mark this as answer yet — Andreas, Jan 16 '13 at 16:16
If I may, and I by no means have checked nor understand all the math you throw at me here. That last algo, the updated version. It rejects any doubles. If I code it up, it returns no solutions for 32, none for 40 or 60. I'm not sure what it is you tried to archive with the !=, but I am actually looking specifically for as equal as possible solutions. — Andreas, Jan 17 '13 at 01:54
The standard deviation hint is just brilliant, Yes Sir, Thank you. The smaller the SD, the closer together the segment lengths. Coded up a comparer, using it. — Andreas, Jan 17 '13 at 01:56
Sorry sorry, seems I'm having a bad day. You're actually looking for multichoose, not simply choose (combination). Sadly, I'm not sure how you'd go about implementing that efficiently. I've also cleaned up my comments that are no longer valid, as well as cleaning up my answer a bit as I still had some bits leftover from a previous version of the answer. — Nuclearman, Jan 17 '13 at 03:06
no problem, I pointed it out mostly so others don't use and wonder why it fails. While I still like your brute force solution, I have to say I use Eran's algo below, simply because it is flexible in qty of segments. However, I use your standard deviation as second sort after the ZeroNorm sort and I get what I need — Andreas, Jan 17 '13 at 16:30
I'm not sure which one to select as answer doh. I'm using more code from @Eran directly, but I really use both in combination. DO I just pick one, or should I post a new answer, with a summary of what I ended up doing? — Andreas, Jan 17 '13 at 16:35
Eran's is fine for the answer, brute force solutions are limited to special cases, which is the only reason I suggested using it here. — Nuclearman, Jan 17 '13 at 22:46
Eran's is fine for the answer, brute force solutions are limited to special cases, which is the only reason I suggested using it here. Actually, I'm going to probably look more into Eran's answer, as it might be useful in other places. — Nuclearman, Jan 17 '13 at 23:11

score 2 · Answer 2 · answered Jan 16 '13 at 14:33

2

Let's divide everything by 2 since everything is even. We now have track pieces with length 4 to 12 for a total length of about 10 to 50.

Name n the length we have to achieve. For every possible number k of track pieces (1 to 4 in general, but 1 to 3 for n<16 or 3 to 4 for n>36 for example), suggest taking n%k pieces of length n/k+1 and k-n%k pieces of length n/k.

'/' designates integer division and '%' the remainder.

answered Jan 16 '13 at 14:33

Khaur

770
3
10

Obviously this only works if we have a contiguous range of lengths, but I haven't seen any evidence of the contrary. – Khaur Jan 16 '13 at 14:37
ah, this takes me back to advanced algorithms... the flashbacks, the pain ..hehe. Sorry, ! couldn't help myself. Thanx for your answer, I need to code this out first in order to make sense of it – Andreas Jan 16 '13 at 16:46

Eran · Accepted Answer · 2013-01-17T09:36:16.217

Math to the rescue!

You can check that every even number larger than 8 is a linear combination of elements from this set - ask on Math Overflow for a proof ;).

So let's rephrase the question in math:

We have an overcomplete dictionary of basis vectors (because 16, for example, is a multiple of 8),
in which we can represent every even number larger than 8 as a linear combination of these basis vectors, and
we are trying to minimize the zero "norm" of this input vector.

The good news: This is a very interesting problem, with many application domains, so it is pretty well researched.

The bad news: this is still a hard (NP-hard) problem.

But, hey, at least now you know.

edit: And just so I don't get accused of handwaving philosophical answer, here's a modified (completely non-optimized) version of Solver.recursiveSolve that exhaustively searches for a combination of segments that match the goal; and a zero norm comparer class with which you can sort your results:

    private void RecursiveSolve(int goal, int currentSum,
        List<int> included, List<int> segments, int startIndex)
    {

        for (int index = 0; index < segments.Count; index++)
        {
            int nextValue = segments[index];
            if (currentSum + nextValue == goal)
            {
                List<int> newResult = new List<int>(included);
                newResult.Add(nextValue);
                mResults.Add(newResult);
            }
            else if (currentSum + nextValue < goal)
            {
                List<int> nextIncluded = new List<int>(included);
                nextIncluded.Add(nextValue);
                RecursiveSolve(goal, currentSum + nextValue,
                    nextIncluded, segments, startIndex++);
            }
        }
    }

class ZeroNormAndSDComparer : IComparer<List<int>>
{
    private int l0_norm(List<int> v)
    {
        int norm = 0;
        HashSet<int> seen = new HashSet<int>();

        for (int i = 0; i < v.Count; ++i)
        {
            if (!seen.Contains(v[i]))
            {
                norm++;
                seen.Add(v[i]);
            }
        }
        return norm;
    }

    private static double StandardDeviation(List<int> v)
    {
        double M = 0.0;
        double S = 0.0;
        int k = 1;
        foreach (double value in v)
        {
            double tmpM = M;
            M += (value - tmpM) / k;
            S += (value - tmpM) * (value - M);
            k++;
        }
        return Math.Sqrt(S / (k - 1));
    }

    public int Compare(List<int> x, List<int> y)
    {
        int normComparison = l0_norm(x).CompareTo(l0_norm(y));
        if  (normComparison == 0)
        {
            return StandardDeviation(x).CompareTo(StandardDeviation(y));
        }
        return normComparison;
    }
}

And your modified Main (now sorting is done after results have been reduced to distinct, 4-segment results):

        List<int> nums = new List<int>() { 8, 10, 12, 14, 16, 18, 20, 22, 24 };

        int width = 80;
        Console.WriteLine("Total width: " + width);

        Solver solver = new Solver();
        List<List<int>> res = solver.Solve(width, nums.ToArray());
        Console.WriteLine("total :" + res.Count);
        var res1 = res.Distinct(new SequenceComparer<List<int>, int>()).ToList();
        Console.WriteLine("total res1:" + res1.Count);

        var res2 = res1.Where(l => l.Count == 4).ToList();
        Console.WriteLine("total res2:" + res2.Count); //reduce to 4 integer solutions

        res2.Sort(new ZeroNormAndSDComparer());

Nice answer but from what I read what we want to minimize is not the l0 pseudo-norm (see first example). — Khaur, Jan 16 '13 at 14:21
@Khaur, true. I also notice I didn't take into account the maximum 4 segments requirements. In my defense I'll say that the code Andreas supplied actually just checks the l0 norm. — Eran, Jan 16 '13 at 14:39
for the 4 segment requirement you can add a constraint on the l1 norm as well. I haven't read your code in detail so I don't know if this is easy to do or if constraints will mess it up, but conceptually it is simple. — Khaur, Jan 16 '13 at 14:44
I haven't gone over this solution yet, I get to it. In the mean time, mind explaining what this 'l0 norm' is/means you're talking about? Or is that the Zero Norm shortcut? — Andreas, Jan 16 '13 at 16:32
@Eran: I finally got to look / code up your solution. Let me put it this way. I only have a 4 year undergrad in comp-sci/Math. With some sugar-coding, I understand about half of what is discussed here. That Zero-Normal.. I'll just take your word for it. Point being: This is a very good answer. I am in no position to judge your algorithm against the "brute force" solution suggested by @M C, I will keep both at hand, I have other similar problem they will find a use. — Andreas, Jan 17 '13 at 02:54
Don't feel bad Andreas, I can't follow it either :-(. Perhaps not enough linear algebra. — Nuclearman, Jan 17 '13 at 03:09
Regarding the outcome of the solution, after sorting I get all the 'all equal' solutions first, perfect. If I filter out solutions where n>4, (using goal=40), I get [8,24,8] listed before [10,20,10] or [24,8,8] before [14,14,12]. Is this expected behavior? I can fix it if I have to using my SD sorter. I'm just wondering why it list all equals first, and then later lists sets with higher SD before other with lower SD. — Andreas, Jan 17 '13 at 03:14
@Andreas the zero "norm" / l0 pseudo norm is simply the number of different components in the solution, so all-equals are listed first (their norm value is 1), and [8,24,8] and [10,20,10] have norm 1 so the comparer considers them the same. Their order in the sorted list depends on the order in which they're found by the exhaustive search and the List sort function implementation (which I believe is unstable). I have not taken into account the SD and 4-segment results requirements in my original solution, but I have now edited my answer to reflect them. — Eran, Jan 17 '13 at 09:37
@Eran Thanx for the further explanation. I didn't mean to force you into further editing your solution as I already added the SD sorting and filtering myself. But, as I was wondering if I should post a answer summarizing my solution using your and @M C answers, to clear it up I now don't have to. I'm going to accept your answer. To be fair, M C algo works just fine for a small, fixed set, and he suggested the Standard Deviation Sort first. Thank you all. — Andreas, Jan 17 '13 at 21:30
@Andreas: It probably wouldn't hurt to post you're own solution for comparison. — Nuclearman, Jan 17 '13 at 23:13

How to find combination of segments to build a track: possible Subset Sum

3 Answers3