4

I've got a data table with with two collumns: a number and a date, for example :

125 | 2013/10/20
100 | 2013/10/21
150 | 2013/10/24
225 | 2013/10/24
250 | 2013/10/28
310 | 2013/10/30

now, I want to search for all records ordered by date where the sum of the number matches 500. I can easily see that the first, third and fourth record (125 + 150 + 225 = 500) provide a match, but to program such, I can only think of going through the data table a zillion of times until I found the right match.

Has anybody a smarter idea?

Chris
  • 8,527
  • 10
  • 34
  • 51
Menno
  • 55
  • 4
  • possible duplicate of [How to calculate sum of a DataTable's Column in LINQ (to Dataset)?](http://stackoverflow.com/questions/550187/how-to-calculate-sum-of-a-datatables-column-in-linq-to-dataset) – Liam Oct 31 '13 at 16:04
  • How many records do you have? – Aage Oct 31 '13 at 16:05
  • @Liam - no, not really unfortunately – Menno Oct 31 '13 at 16:08
  • @bump - it could be over a thousand – Menno Oct 31 '13 at 16:08
  • possible duplicate of [Finding all possible combinations of numbers to reach a given sum](http://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum) – mbeckish Oct 31 '13 at 16:19
  • As far as I understand your (125 + 150 + 225 = 500) example, there's no date ordering involved. Actually, I'm pretty sure a performant solution would involve ordering the rows by number. – Serge Oct 31 '13 at 16:25
  • Do you mean where the sum of the number BY DATE = 500? If so you want a group by statement, your question doesn't seem to indicate any grouping however. – Evan L Oct 31 '13 at 16:28
  • 2
    This is the [Subset Sum](http://en.wikipedia.org/wiki/Subset_sum_problem) problem. – mbeckish Oct 31 '13 at 16:44
  • Are you trying to find the solution that has the earliest set of dates or all solutions? – nycdan Oct 31 '13 at 17:23
  • It depends on the constraints you have on `Number`. If you assume that `Number` is non-negative, you can put a stopping condition into your recursive search that will give you decent performance. If you have an upper bound (or a positive lower bound) on `Number` then life gets even better because you can tighten up the stopping condition and achieve better performance. – Rob Lyndon Oct 31 '13 at 21:42

1 Answers1

2

In the worst case, you do have to go through all 2^n subsets of your dataset, but if all of your items are non-negative you can start by filtering on item.Number <= 500.

Here is a possible Subsets method (actually an answer to How to get all subsets of an array?, but don't tell them):

public static IEnumerable<IEnumerable<T>> Subsets(this IEnumerable<T> source)
{
    var first = source.FirstOrDefault();
    if (first == null) return new[] { Enumerable.Empty<T>() };

    var others = source.Skip(1).Subsets();
    return others.Concat(others.Select(s => s.Concat(new { first })));
}

Once you have your Subsets method, then you can filter the result as follows, though performance is still of the order of a gazillion (or 2^n if you want to be picky).

var sets = items.Where(i => i.Number <= 500)
    .Subsets().Where(s => s.Sum(i => i.Number) == 500);

However, if you do have the constraint on Number, that it is non-negative, you can combine the Subsets operation with a search for a target sum. That would mean you would define

public static IEnumerable<IEnumerable<T>> SubsetsAddingUpTo(this IEnumerable<T> source, int target)
{
    // This stopping condition ensures that you will not have to walk the rest of the tree when you have already hit or exceeded your target.
    // It assumes that the Number values are all non-negative.
    if (target < 0) return Enumerable.Empty<IEnumerable<T>>();

    var first = source.FirstOrDefault();
    if (first == null) return Enumerable.Empty<IEnumerable<T>>();

    var tail = source.Skip(1).Where(i => i.Number <= target).ToList();

    var othersIncludingFirst = tail.SubsetsAddingUpTo(target - first.Number);
    var othersExcludingFirst = tail.SubsetsAddingUpTo(target);

    return othersExcludingFirst.Concat(othersIncludingFirst.Select(s => s.Concat(new { first })));
}

Because the check for <= target happens inside the method, you don't have to do any pre-filtering. However, you can perform a sort before you do the search, to give you your sets in a hierarchical date order. The call will be

var sets = items.OrderByDescending(i => i.Date).SubsetsAddingUpTo(500);

This should actually give you decent performance. The worst case (every item has a number of 0 or 1) won't be very good (order 2^n), but if most of the values of Number are of a similar order of magnitude to your target sum, as is the case in your example, then the stopping condition will fly in to your rescue and save you a large number of unnecessary operations.

Community
  • 1
  • 1
Rob Lyndon
  • 12,089
  • 5
  • 49
  • 74
  • A way to mitigate this might be to delete the row from the DataTable once you've used it. That way you are filtering and searching through an ever-decreasing pool of remaining rows. – SimonGoldstone Oct 31 '13 at 17:08
  • 1
    My `SubsetsAddingUpTo` method basically does that. Even so, the worst case is still `2^n`. – Rob Lyndon Oct 31 '13 at 21:36
  • If there is an upper bound on the value of `Number`, then the stopping criterion can be expanded, because then you can hit a point in the chain where you know you've got no chance of hitting your target. – Rob Lyndon Oct 31 '13 at 21:38
  • Am I acting stupid? In my test program I get following error: 'System.Data.EnumerableRowCollection' does not contain a definition for 'Subsets' and no extension method 'Subsets' accepting a first argument of type 'System.Data.EnumerableRowCollection' could be found (are you missing a using directive or an assembly reference?) – Menno Nov 04 '13 at 10:00
  • You need to define `SubsetsAddingUpTo` in its own static class. It's an extension method. Make sure you have a reference to the namespace in which you've defined `SubsetsAddingUpTo`. – Rob Lyndon Nov 04 '13 at 10:33